Brunix Assistance Engine — API Reference
- Protocol: gRPC (proto3)
- Port: 50052 (host) → 50051 (container)
- Reflection: Enabled — service introspection available via `grpcurl`
- Source of truth: `Docker/protos/brunix.proto`
Table of Contents
1. Service Definition
2. Methods
3. Message Types
4. Error Handling
5. Client Examples
6. OpenAI-Compatible Proxy
1. Service Definition
```proto
package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}
```
Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.
2. Methods
2.1 AskAgent
Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.
Use case: Clients that do not support streaming or need a single atomic response.
Request:
```proto
message AgentRequest {
  string query = 1;      // The user's question. Required. Max recommended: 4096 chars.
  string session_id = 2; // Conversation session identifier. Optional.
                         // If empty, defaults to "default" (shared session).
                         // Use a UUID per user/conversation for isolation.
}
```
Response stream:
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |
Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.
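The single-message contract above can be exercised from Python. This is a sketch, not the project's own client: it assumes stubs generated with the protoc command in §5, and the module names `brunix_pb2` / `brunix_pb2_grpc` follow protoc's default naming, which may differ in your tree.

```python
"""Minimal AskAgent client sketch (stub module names are assumptions)."""

def final_text(responses):
    # AskAgent returns a stream containing a single message with
    # is_final == True that carries the complete answer.
    for msg in responses:
        if msg.is_final:
            return msg.text
    return ""

def main():
    # Imports kept local so the helper above stays dependency-free.
    import grpc
    import brunix_pb2
    import brunix_pb2_grpc

    with grpc.insecure_channel("localhost:50052") as channel:
        stub = brunix_pb2_grpc.AssistanceEngineStub(channel)
        request = brunix_pb2.AgentRequest(
            query="What is addVar in AVAP?", session_id="dev-001")
        print(final_text(stub.AskAgent(request)))
```

Call `main()` against a running engine; `final_text` itself only needs objects with `text` and `is_final` attributes.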
2.2 AskAgentStream
Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.
Use case: Interactive clients (chat UIs, terminal tools) that need progressive rendering.
Request: Same AgentRequest as AskAgent.
Response stream:
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1…N | Single token | `""` | `false` |
| N+1 (final) | `""` | `""` | `true` |
Client contract:
- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
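A minimal consumer implementing this contract, in plain Python; the message objects only need `text` and `is_final` attributes, as on the generated `AgentResponse`:

```python
def accumulate(responses):
    """Reconstruct the full answer from an AskAgentStream response stream.

    Concatenates `text` from every message where is_final is False,
    discards the empty-text final message, and drains the stream to the
    end so the engine can persist conversation history.
    """
    parts = []
    for msg in responses:  # iterate to exhaustion; never break early
        if not msg.is_final:
            parts.append(msg.text)
    return "".join(parts)
```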
2.3 EvaluateRAG
Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.
Requirement: `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing.
Request:
```proto
message EvalRequest {
  string category = 1; // Optional. Filter golden dataset by category name.
                       // If empty, all categories are evaluated.
  int32 limit = 2;     // Optional. Evaluate only the first N questions.
                       // If 0, all matching questions are evaluated.
  string index = 3;    // Optional. Elasticsearch index to evaluate against.
                       // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}
```
Response (single, non-streaming):
```proto
message EvalResponse {
  string status = 1;             // "ok" or error description
  int32 questions_evaluated = 2; // Number of questions actually processed
  float elapsed_seconds = 3;     // Total wall-clock time
  string judge_model = 4;        // Claude model used as judge
  string index = 5;              // Elasticsearch index evaluated

  // RAGAS metric scores (0.0 – 1.0)
  float faithfulness = 6;
  float answer_relevancy = 7;
  float context_recall = 8;
  float context_precision = 9;
  float global_score = 10;       // Mean of non-zero metric scores

  string verdict = 11;           // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"
  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id = 1;             // Question ID from golden dataset
  string category = 2;       // Question category
  string question = 3;       // Question text
  string answer_preview = 4; // First 300 chars of generated answer
  int32 n_chunks = 5;        // Number of context chunks retrieved
}
```
Verdict thresholds:
| Score | Verdict |
|---|---|
| ≥ 0.80 | EXCELLENT |
| ≥ 0.60 | ACCEPTABLE |
| < 0.60 | INSUFFICIENT |
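The scoring rules above can be reproduced in a few lines. This is a sketch of the documented behaviour (mean of non-zero metrics, then the threshold table), not the server's actual implementation:

```python
def global_score(metric_scores):
    """Mean of the non-zero RAGAS metric scores, mirroring EvalResponse.global_score."""
    nonzero = [s for s in metric_scores if s > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def verdict(score):
    """Map a global score onto the documented verdict thresholds."""
    if score >= 0.80:
        return "EXCELLENT"
    if score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```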
3. Message Types
AgentRequest
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | User's natural language question |
| `session_id` | string | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |
AgentResponse
| Field | Type | Description |
|---|---|---|
| `text` | string | Token text (streaming) or full answer text (non-streaming) |
| `avap_code` | string | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | bool | `true` only on the last message of the stream |
EvalRequest
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `category` | string | No | `""` (all) | Filter golden dataset by category |
| `limit` | int32 | No | `0` (all) | Max questions to evaluate |
| `index` | string | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
EvalResponse
See full definition in §2.3.
4. Error Handling
The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:
- The stream will not be terminated with a non-OK gRPC status code on application-level errors.
- Check for error strings in the `text` field that begin with `[ENG] Error:`.
- The stream will still end with `is_final = true`.
Example error response:
```json
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
```
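Clients can detect these application-level failures with a simple prefix check on the terminal message, for example:

```python
ENGINE_ERROR_PREFIX = "[ENG] Error:"

def is_engine_error(text):
    """True when a terminal AgentResponse carries an application-level error."""
    return text.startswith(ENGINE_ERROR_PREFIX)
```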
EvaluateRAG error response:
Returned as a single EvalResponse with status set to the error description:
```json
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
```
5. Client Examples
Introspect the service
```shell
grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```
AskAgent — full response
```shell
grpcurl -plaintext \
  -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```
Expected response:
```json
{
  "text": "addVar is an AVAP command that declares a new variable...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```
AskAgentStream — token streaming
```shell
grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream
```
Expected response (truncated):
```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}
```
EvaluateRAG — run evaluation
# Evaluate first 10 questions from the "core_syntax" category
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
Expected response:
```json
{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-docs-test",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}
```
Multi-turn conversation example
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — the engine has history from Turn 1
grpcurl -plaintext \
-d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
Regenerate gRPC stubs after modifying brunix.proto
```shell
python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto
```
6. OpenAI-Compatible Proxy
The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps AskAgentStream under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.
Base URL: http://localhost:8000
POST /v1/chat/completions
Request body:
```json
{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "What is addVar in AVAP?"}
  ],
  "stream": true
}
```
Notes:
- The `model` field is ignored; the engine always uses the configured `OLLAMA_MODEL_NAME`.
- Session management is handled internally by the proxy. Conversation continuity across separate HTTP requests is not guaranteed.
- Only `stream: true` is fully supported. Non-streaming mode may be available but is not the primary use case.
Example with curl:
```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Explain AVAP loops"}],
    "stream": true
  }'
```
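A Python client consuming the proxy's streaming response can extract tokens from the SSE body. The sketch below assumes the proxy emits standard OpenAI Chat Completions chunks (`data: {...}` lines terminated by `data: [DONE]`); that shape is inferred from the compatibility claim, not read from `openai_proxy.py`.

```python
import json

def stream_content(sse_lines):
    """Yield content deltas from an OpenAI-style SSE response body."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```

Pair it with `requests.post(..., stream=True)` and `response.iter_lines(decode_unicode=True)` to print tokens as they arrive.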