
Brunix Assistance Engine — API Reference

  • Protocol: gRPC (proto3)
  • Port: 50052 (host) → 50051 (container)
  • Reflection: Enabled — service introspection available via grpcurl
  • Source of truth: Docker/protos/brunix.proto


Table of Contents

  1. Service Definition
  2. Methods
  3. Message Types
  4. Error Handling
  5. Client Examples
  6. OpenAI-Compatible Proxy

1. Service Definition

package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}

Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.


2. Methods

2.1 AskAgent

Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.

Use case: Clients that do not support streaming or need a single atomic response.

Request:

message AgentRequest {
  string query      = 1;  // The user's question. Required. Max recommended: 4096 chars.
  string session_id = 2;  // Conversation session identifier. Optional.
                           // If empty, defaults to "default" (shared session).
                           // Use a UUID per user/conversation for isolation.
}
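Because an empty session_id falls back to the shared "default" session, clients should generate a stable UUID per conversation. A minimal sketch of building a request payload — the helper name `make_request_payload` is illustrative, not part of the engine's API:

```python
import uuid

def make_request_payload(query, session_id=None):
    """Build an AgentRequest payload.

    Field names mirror the AgentRequest message above. If no session_id
    is supplied, a fresh UUID is generated so the conversation is
    isolated from the shared "default" session.
    """
    return {
        "query": query,
        "session_id": session_id or str(uuid.uuid4()),
    }

payload = make_request_payload("What is addVar in AVAP?")
```

Reuse the same session_id across turns to keep multi-turn context (see the multi-turn example in §5).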

Response stream:

| Message # | text | avap_code | is_final |
|-----------|------|-----------|----------|
| 1 (only)  | Full answer text | "AVAP-2026" | true |

Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.


2.2 AskAgentStream

Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.

Use case: Interactive clients (chat UIs, terminal tools) that need progressive rendering.

Request: Same AgentRequest as AskAgent.

Response stream:

| Message # | text | avap_code | is_final |
|-----------|------|-----------|----------|
| 1…N | Single token | "" | false |
| N+1 (final) | "" | "" | true |

Client contract:

  • Accumulate text from all messages where is_final == false to reconstruct the full answer.
  • The is_final == true message signals end-of-stream. Its text is always empty and should be discarded.
  • Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
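The contract above can be sketched as a small client-side helper. It only assumes the text and is_final fields of AgentResponse; the function name `collect_answer` is hypothetical, not part of the API:

```python
def collect_answer(stream):
    """Reassemble the full answer from an AskAgentStream response stream.

    Accumulates `text` from every message with is_final == false and
    stops at the terminal is_final message, whose empty text is discarded.
    """
    parts = []
    for msg in stream:
        if msg.is_final:
            break  # end-of-stream marker; its text is always empty
        parts.append(msg.text)
    return "".join(parts)
```

With stubs generated from brunix.proto (see the regeneration command in §5), this would be called as `collect_answer(stub.AskAgentStream(request))`. Note that iterating the stream to completion, rather than closing it early, also satisfies the history-persistence requirement above.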

2.3 EvaluateRAG

Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.

Requirement: ANTHROPIC_API_KEY must be configured in the environment. This endpoint will return an error response if it is missing.

Request:

message EvalRequest {
  string category = 1;  // Optional. Filter golden dataset by category name.
                         // If empty, all categories are evaluated.
  int32  limit    = 2;  // Optional. Evaluate only the first N questions.
                         // If 0, all matching questions are evaluated.
  string index    = 3;  // Optional. Elasticsearch index to evaluate against.
                         // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}

Response (single, non-streaming):

message EvalResponse {
  string status               = 1;  // "ok" or error description
  int32  questions_evaluated  = 2;  // Number of questions actually processed
  float  elapsed_seconds      = 3;  // Total wall-clock time
  string judge_model          = 4;  // Claude model used as judge
  string index                = 5;  // Elasticsearch index evaluated

  // RAGAS metric scores (0.0–1.0)
  float  faithfulness         = 6;
  float  answer_relevancy     = 7;
  float  context_recall       = 8;
  float  context_precision    = 9;

  float  global_score         = 10; // Mean of non-zero metric scores
  string verdict              = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"

  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id             = 1;  // Question ID from golden dataset
  string category       = 2;  // Question category
  string question       = 3;  // Question text
  string answer_preview = 4;  // First 300 chars of generated answer
  int32  n_chunks       = 5;  // Number of context chunks retrieved
}

Verdict thresholds:

| Score  | Verdict      |
|--------|--------------|
| ≥ 0.80 | EXCELLENT    |
| ≥ 0.60 | ACCEPTABLE   |
| < 0.60 | INSUFFICIENT |
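The scoring rules above — global_score as the mean of the non-zero metric scores, then a threshold-based verdict — can be sketched as follows. This is a restatement of the documented rules, not the engine's actual implementation:

```python
def global_score(metric_scores):
    """Mean of the non-zero RAGAS metric scores (0.0 if all are zero)."""
    nonzero = [s for s in metric_scores if s > 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def verdict(score):
    """Map a global score onto the verdict thresholds above."""
    if score >= 0.80:
        return "EXCELLENT"
    if score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```

For the example response in §5 (scores 0.8421, 0.7913, 0.7234, 0.6891), the mean is ≈ 0.7615, which lands in the ACCEPTABLE band.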

3. Message Types

AgentRequest

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | User's natural language question |
| session_id | string | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |

AgentResponse

| Field | Type | Description |
|-------|------|-------------|
| text | string | Token text (streaming) or full answer text (non-streaming) |
| avap_code | string | Currently always "AVAP-2026" in non-streaming mode; empty in streaming |
| is_final | bool | true only on the last message of the stream |

EvalRequest

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| category | string | No | "" (all) | Filter golden dataset by category |
| limit | int32 | No | 0 (all) | Max questions to evaluate |
| index | string | No | $ELASTICSEARCH_INDEX | ES index to evaluate |

EvalResponse

See full definition in §2.3.


4. Error Handling

The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:

  • The stream will not be terminated with a non-OK gRPC status code on application-level errors.
  • Check for error strings in the text field that begin with [ENG] Error:.
  • The stream will still end with is_final = true.
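A client can apply these rules with a simple prefix check on each message's text — `is_engine_error` is a hypothetical helper name, not part of the API:

```python
ERROR_PREFIX = "[ENG] Error:"

def is_engine_error(response_text):
    """Detect an application-level error from the engine.

    Errors arrive as ordinary AgentResponse messages (typically with
    is_final = true) rather than as non-OK gRPC status codes, so the
    only reliable signal is the "[ENG] Error:" prefix in the text field.
    """
    return response_text.startswith(ERROR_PREFIX)
```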

Example error response:

{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}

EvaluateRAG error response:
Returned as a single EvalResponse with status set to the error description (here a Spanish message meaning "ANTHROPIC_API_KEY not configured in .env"):

{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}

5. Client Examples

Introspect the service

grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine

AskAgent — full response

grpcurl -plaintext \
  -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent

Expected response:

{
  "text": "addVar is an AVAP command that declares a new variable...",
  "avap_code": "AVAP-2026",
  "is_final": true
}

AskAgentStream — token streaming

grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream

Expected response (truncated):

{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}

EvaluateRAG — run evaluation

# Evaluate first 10 questions from the "core_syntax" category
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG

Expected response:

{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-docs-test",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}

Multi-turn conversation example

# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — the engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

Regenerate gRPC stubs after modifying brunix.proto

python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto

6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps AskAgentStream under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.

Base URL: http://localhost:8000

POST /v1/chat/completions

Request body:

{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "What is addVar in AVAP?"}
  ],
  "stream": true
}

Notes:

  • The model field is ignored; the engine always uses the configured OLLAMA_MODEL_NAME.
  • Session management is handled internally by the proxy. Conversation continuity across separate HTTP requests is not guaranteed.
  • Only stream: true is fully supported. Non-streaming mode may be available but is not the primary use case.
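Assuming the proxy emits standard OpenAI-style server-sent events (`data:` lines carrying Chat Completions chunks, terminated by `data: [DONE]`), the streamed text can be extracted from each line like this. This is a sketch of the conventional chunk shape; the exact format depends on openai_proxy.py:

```python
import json

def parse_sse_line(line):
    """Extract the delta text from one SSE line of a streaming
    Chat Completions response.

    Returns None for non-data lines (comments, keep-alives) and for
    the [DONE] sentinel that ends the stream.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")
```

Concatenating the non-None results reconstructs the full answer, mirroring the client contract of AskAgentStream in §2.2.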

Example with curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Explain AVAP loops"}],
    "stream": true
  }'