Brunix Assistance Engine — API Reference
**Protocol:** gRPC (proto3)
**Port:** 50052 (host) → 50051 (container)
**Reflection:** Enabled — service introspection available via `grpcurl`
**Source of truth:** `Docker/protos/brunix.proto`
1. Service Definition
package brunix;
service AssistanceEngine {
rpc AskAgent (AgentRequest) returns (stream AgentResponse);
rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
}
Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.
2. Methods
2.1 AskAgent
Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.
Use case: Clients that do not support streaming or need a single atomic response.
Request: See AgentRequest in §3.
Response stream:
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |
Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.
2.2 AskAgentStream
Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.
Use case: Interactive clients (chat UIs, VS Code extension) that need progressive rendering.
Request: Same AgentRequest as AskAgent.
Response stream:
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1…N | Single token | `""` | `false` |
| N+1 (final) | `""` | `""` | `true` |
Client contract:
- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
2.3 EvaluateRAG
Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.
Requirement: `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint returns an error response if it is missing.
Request:
message EvalRequest {
string category = 1; // Optional. Filter golden dataset by category name.
// If empty, all categories are evaluated.
int32 limit = 2; // Optional. Evaluate only the first N questions.
// If 0, all matching questions are evaluated.
string index = 3; // Optional. Elasticsearch index to evaluate against.
// If empty, uses the server's configured ELASTICSEARCH_INDEX.
}
Response (single, non-streaming):
message EvalResponse {
string status = 1; // "ok" or error description
int32 questions_evaluated = 2; // Number of questions actually processed
float elapsed_seconds = 3; // Total wall-clock time
string judge_model = 4; // Claude model used as judge
string index = 5; // Elasticsearch index evaluated
// RAGAS metric scores (0.0 – 1.0)
float faithfulness = 6;
float answer_relevancy = 7;
float context_recall = 8;
float context_precision = 9;
float global_score = 10; // Mean of non-zero metric scores
string verdict = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"
repeated QuestionDetail details = 12;
}
message QuestionDetail {
string id = 1; // Question ID from golden dataset
string category = 2; // Question category
string question = 3; // Question text
string answer_preview = 4; // First 300 chars of generated answer
int32 n_chunks = 5; // Number of context chunks retrieved
}
Verdict thresholds:
| Score | Verdict |
|---|---|
| ≥ 0.80 | EXCELLENT |
| ≥ 0.60 | ACCEPTABLE |
| < 0.60 | INSUFFICIENT |
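Combining the thresholds above with the `global_score` definition from §2.3 (mean of the non-zero metric scores), the scoring logic can be sketched as follows. The function names are illustrative, not the engine's actual internals:

```python
def global_score(metrics):
    """Mean of the non-zero RAGAS metric scores (0.0 if all are zero)."""
    nonzero = [m for m in metrics if m > 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def verdict(score):
    """Map a global score in [0.0, 1.0] to the verdict string."""
    if score >= 0.80:
        return "EXCELLENT"
    if score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```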
3. Message Types
AgentRequest
message AgentRequest {
string query = 1;
string session_id = 2;
string editor_content = 3;
string selected_text = 4;
string extra_context = 5;
string user_info = 6;
}
| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| `query` | `string` | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. |
| `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| `selected_text` | `string` | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": <int>, "project_id": <int>, "org_id": <int>}`. Available in graph state for future routing or personalisation — not yet consumed by the graph. |
Editor context behaviour:
Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without them, preserving full backward compatibility. When `editor_content` or `selected_text` is provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.
Base64 encoding:
`editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them as UTF-8. Malformed Base64 is silently treated as an empty string — no error is raised.
import base64
encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
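On the decode side, the server's lenient behaviour (malformed Base64 becomes an empty string) can be mirrored in a client or test harness like this. This is a sketch; the server's actual implementation may differ:

```python
import base64
import binascii

def decode_field(value: str) -> str:
    """Decode a Base64 field as UTF-8; malformed input yields ''."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return ""
```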
AgentResponse
| Field | Type | Description |
|---|---|---|
| `text` | `string` | Token text (streaming) or full answer text (non-streaming) |
| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |
EvalRequest
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `category` | `string` | No | `""` (all) | Filter golden dataset by category |
| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
EvalResponse
See full definition in §2.3.
4. Error Handling
The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:
- The stream will not be terminated with a non-OK gRPC status code on application-level errors.
- Check for error strings in the `text` field that begin with `[ENG] Error:`.
- The stream will still end with `is_final = true`.
Example error response:
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
EvaluateRAG error response:
Returned as a single EvalResponse with status set to the error description:
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
5. Client Examples
Introspect the service
grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
AskAgent — basic query
grpcurl -plaintext \
-d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
Expected response:
{
"text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
"avap_code": "AVAP-2026",
"is_final": true
}
AskAgent — with editor context
import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc
def encode(text: str) -> str:
return base64.b64encode(text.encode("utf-8")).decode("utf-8")
channel = grpc.insecure_channel("localhost:50052")
stub = brunix_pb2_grpc.AssistanceEngineStub(channel)
editor_code = """
try()
ormDirect("UPDATE users SET active=1", res)
exception(e)
addVar(_status, 500)
addResult("Error")
end()
"""
request = brunix_pb2.AgentRequest(
query = "why is this not catching the error?",
session_id = "dev-001",
editor_content = encode(editor_code),
selected_text = encode(editor_code), # same block selected
extra_context = encode("file: handler.avap"),
user_info = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)
for response in stub.AskAgent(request):
if response.is_final:
print(response.text)
AskAgentStream — token streaming
grpcurl -plaintext \
-d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgentStream
Expected response (truncated):
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}
EvaluateRAG — run evaluation
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
Expected response:
{
"status": "ok",
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"index": "avap-knowledge-v1",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
"context_precision": 0.6891,
"global_score": 0.7615,
"verdict": "ACCEPTABLE",
"details": [...]
}
Multi-turn conversation
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
-d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
Regenerate gRPC stubs after modifying brunix.proto
python -m grpc_tools.protoc \
-I./Docker/protos \
--python_out=./Docker/src \
--grpc_python_out=./Docker/src \
./Docker/protos/brunix.proto
6. OpenAI-Compatible Proxy
The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — continue.dev, LiteLLM, Open WebUI, or any custom client.
Base URL: http://localhost:8000
Available endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/chat/completions` | OpenAI Chat Completions. Routes to AskAgent or AskAgentStream. |
| POST | `/v1/completions` | OpenAI Completions (legacy). |
| GET | `/v1/models` | Lists available models. Returns `brunix`. |
| POST | `/api/chat` | Ollama chat format (NDJSON streaming). |
| POST | `/api/generate` | Ollama generate format (NDJSON streaming). |
| GET | `/api/tags` | Ollama model list. |
| GET | `/health` | Health check. Returns `{"status": "ok"}`. |
POST /v1/chat/completions
Routing: stream: false → AskAgent (single response). stream: true → AskAgentStream (SSE token stream).
Request body:
{
"model": "brunix",
"messages": [
{"role": "user", "content": "Que significa AVAP?"}
],
"stream": false,
"session_id": "uuid-per-conversation",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
The user field (editor context transport):
The standard OpenAI user field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit user or set it to a plain string — both are handled gracefully.
| Key in `user` JSON | Encoding | Description |
|---|---|---|
| `editor_content` | Base64 | Full content of the active editor file |
| `selected_text` | Base64 | Currently selected text in the editor |
| `extra_context` | Base64 | Free-form additional context |
| `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` |
Important: session_id must be sent as a top-level field — never inside the user JSON. The proxy reads session_id exclusively from the dedicated field.
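A client can assemble the `user` field like this. This is a sketch assuming only the keys listed in the table above; the helper name `build_user_field` is illustrative:

```python
import base64
import json

def build_user_field(editor_content="", selected_text="",
                     extra_context="", user_info=None):
    """Serialise editor context into the OpenAI `user` field string.

    Base64-encodes the text fields (empty stays empty) and returns
    a JSON string. Remember: session_id goes in the top-level body,
    never inside this JSON.
    """
    def b64(s):
        return base64.b64encode(s.encode("utf-8")).decode("utf-8") if s else ""

    return json.dumps({
        "editor_content": b64(editor_content),
        "selected_text": b64(selected_text),
        "extra_context": b64(extra_context),
        "user_info": user_info or {},
    })
```

The resulting string is passed as the `user` value in the request body, alongside the top-level `session_id`.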
Example — general query (no editor context):
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Que significa AVAP?"}],
"stream": false,
"session_id": "test-001"
}'
Example — query with editor context (VS Code extension):
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este codigo?"}],
"stream": true,
"session_id": "test-001",
"user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}'
Example — empty editor context fields:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "como funciona addVar?"}],
"stream": false,
"session_id": "test-002",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
}'