
Brunix Assistance Engine — API Reference

Protocol: gRPC (proto3)
Port: 50052 (host) → 50051 (container)
Reflection: Enabled — service introspection available via grpcurl
Source of truth: Docker/protos/brunix.proto


Table of Contents

  1. Service Definition
  2. Methods
  3. Message Types
  4. Error Handling
  5. Client Examples
  6. OpenAI-Compatible Proxy

1. Service Definition

package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}

Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.


2. Methods

2.1 AskAgent

Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.

Use case: Clients that do not support streaming or need a single atomic response.

Request: See AgentRequest in §3.

Response stream:

| Message # | text | avap_code | is_final |
|---|---|---|---|
| 1 (only) | Full answer text | "AVAP-2026" | true |

Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.


2.2 AskAgentStream

Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.

Use case: Interactive clients (chat UIs, VS Code extension) that need progressive rendering.

Request: Same AgentRequest as AskAgent.

Response stream:

| Message # | text | avap_code | is_final |
|---|---|---|---|
| 1…N | Single token | "" | false |
| N+1 (final) | "" | "" | true |

Client contract:

  • Accumulate text from all messages where is_final == false to reconstruct the full answer.
  • The is_final == true message signals end-of-stream. Its text is always empty and should be discarded.
  • Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
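The contract above can be sketched as a small accumulator. `AgentResponse` here is a stand-in dataclass for the generated `brunix_pb2.AgentResponse`; with a live stub you would pass `stub.AskAgentStream(request)` directly:

```python
from dataclasses import dataclass

@dataclass
class AgentResponse:  # stand-in for brunix_pb2.AgentResponse
    text: str
    is_final: bool

def collect_answer(stream) -> str:
    """Client contract from §2.2: accumulate text from every message with
    is_final == False; the terminal message's text is empty and discarded."""
    parts = []
    for msg in stream:
        if msg.is_final:
            break  # end-of-stream marker; never close the stream early
        parts.append(msg.text)
    return "".join(parts)

# With real stubs: answer = collect_answer(stub.AskAgentStream(request))
```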

2.3 EvaluateRAG

Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.

Requirement: ANTHROPIC_API_KEY must be configured in the environment. This endpoint will return an error response if it is missing.

Request:

message EvalRequest {
  string category = 1;  // Optional. Filter golden dataset by category name.
                         // If empty, all categories are evaluated.
  int32  limit    = 2;  // Optional. Evaluate only the first N questions.
                         // If 0, all matching questions are evaluated.
  string index    = 3;  // Optional. Elasticsearch index to evaluate against.
                         // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}

Response (single, non-streaming):

message EvalResponse {
  string status               = 1;  // "ok" or error description
  int32  questions_evaluated  = 2;  // Number of questions actually processed
  float  elapsed_seconds      = 3;  // Total wall-clock time
  string judge_model          = 4;  // Claude model used as judge
  string index                = 5;  // Elasticsearch index evaluated

  // RAGAS metric scores (0.0 – 1.0)
  float  faithfulness         = 6;
  float  answer_relevancy     = 7;
  float  context_recall       = 8;
  float  context_precision    = 9;

  float  global_score         = 10; // Mean of non-zero metric scores
  string verdict              = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"

  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id             = 1;  // Question ID from golden dataset
  string category       = 2;  // Question category
  string question       = 3;  // Question text
  string answer_preview = 4;  // First 300 chars of generated answer
  int32  n_chunks       = 5;  // Number of context chunks retrieved
}

Verdict thresholds:

| Score | Verdict |
|---|---|
| ≥ 0.80 | EXCELLENT |
| ≥ 0.60 | ACCEPTABLE |
| < 0.60 | INSUFFICIENT |
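The thresholds reduce to two comparisons. The following is a sketch of the mapping as documented, not the engine's actual implementation:

```python
def verdict_for(global_score: float) -> str:
    """Map a RAGAS global score (0.0 – 1.0) to the EvalResponse verdict."""
    if global_score >= 0.80:
        return "EXCELLENT"
    if global_score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```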

3. Message Types

AgentRequest

message AgentRequest {
  string query          = 1;
  string session_id     = 2;
  string editor_content = 3;
  string selected_text  = 4;
  string extra_context  = 5;
  string user_info      = 6;
}
| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| query | string | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| session_id | string | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to "default" if empty. |
| editor_content | string | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| selected_text | string | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| extra_context | string | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| user_info | string | No | JSON string | Client identity metadata. Expected format: {"dev_id": <int>, "project_id": <int>, "org_id": <int>}. Available in graph state for future routing or personalisation — not yet consumed by the graph. |

Editor context behaviour:

Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without them, preserving full backward compatibility. When editor_content or selected_text is provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns EDITOR are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.

Base64 encoding:

editor_content, selected_text and extra_context must be Base64-encoded before sending. The server decodes them with UTF-8. Malformed Base64 is silently treated as empty string — no error is raised.

import base64
encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
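The tolerant server-side decode can be mirrored client-side when testing payloads. This is a sketch of the behaviour described above (malformed Base64 becomes an empty string), not the engine's actual code:

```python
import base64
import binascii

def decode_field(value: str) -> str:
    """Decode a Base64 request field; treat malformed input as "" (as the
    server does) instead of raising."""
    try:
        return base64.b64decode(value).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ""
```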

AgentResponse

| Field | Type | Description |
|---|---|---|
| text | string | Token text (streaming) or full answer text (non-streaming) |
| avap_code | string | Currently always "AVAP-2026" in non-streaming mode, empty in streaming |
| is_final | bool | true only on the last message of the stream |

EvalRequest

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| category | string | No | "" (all) | Filter golden dataset by category |
| limit | int32 | No | 0 (all) | Max questions to evaluate |
| index | string | No | $ELASTICSEARCH_INDEX | ES index to evaluate |

EvalResponse

See full definition in §2.3.


4. Error Handling

The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:

  • The stream will not be terminated with a non-OK gRPC status code on application-level errors.
  • Check for error strings in the text field that begin with [ENG] Error:.
  • The stream will still end with is_final = true.

Example error response:

{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}

EvaluateRAG error response: Returned as a single EvalResponse with status set to the error description:

{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}

5. Client Examples

Introspect the service

grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine

AskAgent — basic query

grpcurl -plaintext \
  -d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent

Expected response:

{
  "text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
  "avap_code": "AVAP-2026",
  "is_final": true
}

AskAgent — with editor context

import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc

def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

channel = grpc.insecure_channel("localhost:50052")
stub    = brunix_pb2_grpc.AssistanceEngineStub(channel)

editor_code = """
try()
    ormDirect("UPDATE users SET active=1", res)
exception(e)
    addVar(_status, 500)
    addResult("Error")
end()
"""

request = brunix_pb2.AgentRequest(
    query          = "why is this not catching the error?",
    session_id     = "dev-001",
    editor_content = encode(editor_code),
    selected_text  = encode(editor_code),   # same block selected
    extra_context  = encode("file: handler.avap"),
    user_info      = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)

for response in stub.AskAgent(request):
    if response.is_final:
        print(response.text)

AskAgentStream — token streaming

grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream

Expected response (truncated):

{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}

EvaluateRAG — run evaluation

grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG

Expected response:

{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-knowledge-v1",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}

Multi-turn conversation

# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

Regenerate gRPC stubs after modifying brunix.proto

python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto

6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — continue.dev, LiteLLM, Open WebUI, or any custom client.

Base URL: http://localhost:8000

Available endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI Chat Completions. Routes to AskAgent or AskAgentStream. |
| POST | /v1/completions | OpenAI Completions (legacy). |
| GET | /v1/models | Lists available models. Returns brunix. |
| POST | /api/chat | Ollama chat format (NDJSON streaming). |
| POST | /api/generate | Ollama generate format (NDJSON streaming). |
| GET | /api/tags | Ollama model list. |
| GET | /health | Health check. Returns {"status": "ok"}. |

POST /v1/chat/completions

Routing: stream: false → AskAgent (single response). stream: true → AskAgentStream (SSE token stream).

Request body:

{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "Que significa AVAP?"}
  ],
  "stream": false,
  "session_id": "uuid-per-conversation",
  "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}

The user field (editor context transport):

The standard OpenAI user field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit user or set it to a plain string — both are handled gracefully.

| Key in user JSON | Encoding | Description |
|---|---|---|
| editor_content | Base64 | Full content of the active editor file |
| selected_text | Base64 | Currently selected text in the editor |
| extra_context | Base64 | Free-form additional context |
| user_info | JSON object | {"dev_id": int, "project_id": int, "org_id": int} |

Important: session_id must be sent as a top-level field — never inside the user JSON. The proxy reads session_id exclusively from the dedicated field.

Example — general query (no editor context):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Que significa AVAP?"}],
    "stream": false,
    "session_id": "test-001"
  }'

Example — query with editor context (VS Code extension):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "que hace este codigo?"}],
    "stream": true,
    "session_id": "test-001",
    "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
  }'
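With stream: true the proxy emits an SSE token stream. Assuming standard OpenAI chunk framing (lines of `data: {json}` carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which has not been verified against openai_proxy.py, a minimal parser looks like:

```python
import json

def tokens_from_sse(lines):
    """Yield the delta content of each OpenAI-style SSE chunk.
    Stops at the 'data: [DONE]' sentinel; skips keep-alive/blank lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Usage sketch with requests (hypothetical):
# r = requests.post("http://localhost:8000/v1/chat/completions",
#                   json={"model": "brunix", "messages": [...], "stream": True},
#                   stream=True)
# answer = "".join(tokens_from_sse(r.iter_lines(decode_unicode=True)))
```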

Example — empty editor context fields:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "como funciona addVar?"}],
    "stream": false,
    "session_id": "test-002",
    "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
  }'