
Brunix Assistance Engine — API Reference

Protocol: gRPC (proto3)
Port: 50052 (host) → 50051 (container)
Reflection: Enabled — service introspection available via grpcurl
Source of truth: Docker/protos/brunix.proto


Table of Contents

  1. Service Definition
  2. Methods
  3. Message Types
  4. Error Handling
  5. Client Examples
  6. OpenAI-Compatible Proxy

1. Service Definition

package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}

Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.


2. Methods

2.1 AskAgent

Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.

Use case: Clients that do not support streaming or need a single atomic response.

Request: See AgentRequest in §3.

Response stream:

| Message # | text | avap_code | is_final |
|---|---|---|---|
| 1 (only) | Full answer text | "AVAP-2026" | true |

Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.


2.2 AskAgentStream

Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.

Use case: Interactive clients (chat UIs, VS Code extension) that need progressive rendering.

Request: Same AgentRequest as AskAgent.

Response stream:

| Message # | text | avap_code | is_final |
|---|---|---|---|
| 1…N | Single token | "" | false |
| N+1 (final) | "" | "" | true |

Client contract:

  • Accumulate text from all messages where is_final == false to reconstruct the full answer.
  • The is_final == true message signals end-of-stream. Its text is always empty and should be discarded.
  • Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
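The contract above can be sketched as a small accumulator. `AgentResponse` here is a stand-in dataclass for the generated `brunix_pb2.AgentResponse`; with a live stub you would pass `stub.AskAgentStream(request)` directly:

```python
from dataclasses import dataclass

@dataclass
class AgentResponse:  # stand-in for brunix_pb2.AgentResponse
    text: str
    is_final: bool

def collect_answer(stream) -> str:
    """Client contract from §2.2: accumulate text from every message with
    is_final == False; the terminal message's text is empty and discarded."""
    parts = []
    for msg in stream:
        if msg.is_final:
            break  # end-of-stream marker; never close the stream early
        parts.append(msg.text)
    return "".join(parts)

# With real stubs: answer = collect_answer(stub.AskAgentStream(request))
```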

2.3 EvaluateRAG

Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.

Requirement: ANTHROPIC_API_KEY must be configured in the environment. This endpoint will return an error response if it is missing.

Request:

message EvalRequest {
  string category = 1;  // Optional. Filter golden dataset by category name.
                         // If empty, all categories are evaluated.
  int32  limit    = 2;  // Optional. Evaluate only the first N questions.
                         // If 0, all matching questions are evaluated.
  string index    = 3;  // Optional. Elasticsearch index to evaluate against.
                         // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}

Response (single, non-streaming):

message EvalResponse {
  string status               = 1;  // "ok" or error description
  int32  questions_evaluated  = 2;  // Number of questions actually processed
  float  elapsed_seconds      = 3;  // Total wall-clock time
  string judge_model          = 4;  // Claude model used as judge
  string index                = 5;  // Elasticsearch index evaluated

  // RAGAS metric scores (0.0 – 1.0)
  float  faithfulness         = 6;
  float  answer_relevancy     = 7;
  float  context_recall       = 8;
  float  context_precision    = 9;

  float  global_score         = 10; // Mean of non-zero metric scores
  string verdict              = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"

  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id             = 1;  // Question ID from golden dataset
  string category       = 2;  // Question category
  string question       = 3;  // Question text
  string answer_preview = 4;  // First 300 chars of generated answer
  int32  n_chunks       = 5;  // Number of context chunks retrieved
}

Verdict thresholds:

| Score | Verdict |
|---|---|
| ≥ 0.80 | EXCELLENT |
| ≥ 0.60 | ACCEPTABLE |
| < 0.60 | INSUFFICIENT |
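The thresholds reduce to two comparisons. The following is a sketch of the mapping as documented, not the engine's actual implementation:

```python
def verdict_for(global_score: float) -> str:
    """Map a RAGAS global score (0.0 – 1.0) to the EvalResponse verdict."""
    if global_score >= 0.80:
        return "EXCELLENT"
    if global_score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```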

3. Message Types

AgentRequest

message AgentRequest {
  string query          = 1;
  string session_id     = 2;
  string editor_content = 3;
  string selected_text  = 4;
  string extra_context  = 5;
  string user_info      = 6;
}
| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| query | string | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| session_id | string | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to "default" if empty. |
| editor_content | string | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| selected_text | string | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| extra_context | string | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| user_info | string | No | JSON string | Client identity metadata. Expected format: {"dev_id": <int>, "project_id": <int>, "org_id": <int>}. Available in graph state for future routing or personalisation — not yet consumed by the graph. |

Editor context behaviour:

Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without them, preserving full backward compatibility. When editor_content or selected_text is provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns EDITOR are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.

Base64 encoding:

editor_content, selected_text and extra_context must be Base64-encoded before sending. The server decodes them with UTF-8. Malformed Base64 is silently treated as empty string — no error is raised.

import base64
encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
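The tolerant server-side decode can be mirrored client-side when testing payloads. This is a sketch of the behaviour described above (malformed Base64 becomes an empty string), not the engine's actual code:

```python
import base64
import binascii

def decode_field(value: str) -> str:
    """Decode a Base64 request field; treat malformed input as "" (as the
    server does) instead of raising."""
    try:
        return base64.b64decode(value).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ""
```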

AgentResponse

| Field | Type | Description |
|---|---|---|
| text | string | Token text (streaming) or full answer text (non-streaming) |
| avap_code | string | Currently always "AVAP-2026" in non-streaming mode, empty in streaming |
| is_final | bool | true only on the last message of the stream |

EvalRequest

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| category | string | No | "" (all) | Filter golden dataset by category |
| limit | int32 | No | 0 (all) | Max questions to evaluate |
| index | string | No | $ELASTICSEARCH_INDEX | ES index to evaluate |

EvalResponse

See full definition in §2.3.


4. Error Handling

The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:

  • The stream will not be terminated with a non-OK gRPC status code on application-level errors.
  • Check for error strings in the text field that begin with [ENG] Error:.
  • The stream will still end with is_final = true.

Example error response:

{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}

EvaluateRAG error response: Returned as a single EvalResponse with status set to the error description:

{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}

5. Client Examples

Introspect the service

grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine

AskAgent — basic query

grpcurl -plaintext \
  -d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent

Expected response:

{
  "text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
  "avap_code": "AVAP-2026",
  "is_final": true
}

AskAgent — with editor context

import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc

def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

channel = grpc.insecure_channel("localhost:50052")
stub    = brunix_pb2_grpc.AssistanceEngineStub(channel)

editor_code = """
try()
    ormDirect("UPDATE users SET active=1", res)
exception(e)
    addVar(_status, 500)
    addResult("Error")
end()
"""

request = brunix_pb2.AgentRequest(
    query          = "why is this not catching the error?",
    session_id     = "dev-001",
    editor_content = encode(editor_code),
    selected_text  = encode(editor_code),   # same block selected
    extra_context  = encode("file: handler.avap"),
    user_info      = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)

for response in stub.AskAgent(request):
    if response.is_final:
        print(response.text)

AskAgentStream — token streaming

grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream

Expected response (truncated):

{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}

EvaluateRAG — run evaluation

grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG

Expected response:

{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-knowledge-v1",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}

Multi-turn conversation

# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

Regenerate gRPC stubs after modifying brunix.proto

python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto

6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — continue.dev, LiteLLM, Open WebUI, or any custom client.

Base URL: http://localhost:8000

Available endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI Chat Completions. Routes to AskAgent or AskAgentStream. |
| POST | /v1/completions | OpenAI Completions (legacy). |
| GET | /v1/models | Lists available models. Returns brunix. |
| POST | /api/chat | Ollama chat format (NDJSON streaming). |
| POST | /api/generate | Ollama generate format (NDJSON streaming). |
| GET | /api/tags | Ollama model list. |
| GET | /health | Health check. Returns {"status": "ok"}. |

POST /v1/chat/completions

Routing: stream: false → AskAgent (single response). stream: true → AskAgentStream (SSE token stream).

Request body:

{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "Que significa AVAP?"}
  ],
  "stream": false,
  "session_id": "uuid-per-conversation",
  "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}

The user field (editor context transport):

The standard OpenAI user field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit user or set it to a plain string — both are handled gracefully.

| Key in user JSON | Encoding | Description |
|---|---|---|
| editor_content | Base64 | Full content of the active editor file |
| selected_text | Base64 | Currently selected text in the editor |
| extra_context | Base64 | Free-form additional context |
| user_info | JSON object | {"dev_id": int, "project_id": int, "org_id": int} |

Important: session_id must be sent as a top-level field — never inside the user JSON. The proxy reads session_id exclusively from the dedicated field.

Example — general query (no editor context):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Que significa AVAP?"}],
    "stream": false,
    "session_id": "test-001"
  }'

Example — query with editor context (VS Code extension):

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "que hace este codigo?"}],
    "stream": true,
    "session_id": "test-001",
    "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
  }'
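With stream: true the proxy emits an SSE token stream. Assuming standard OpenAI chunk framing (lines of `data: {json}` carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which has not been verified against openai_proxy.py, a minimal parser looks like:

```python
import json

def tokens_from_sse(lines):
    """Yield the delta content of each OpenAI-style SSE chunk.
    Stops at the 'data: [DONE]' sentinel; skips keep-alive/blank lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Usage sketch with requests (hypothetical):
# r = requests.post("http://localhost:8000/v1/chat/completions",
#                   json={"model": "brunix", "messages": [...], "stream": True},
#                   stream=True)
# answer = "".join(tokens_from_sse(r.iter_lines(decode_unicode=True)))
```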

Example — empty editor context fields:

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "como funciona addVar?"}],
    "stream": false,
    "session_id": "test-002",
    "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
  }'