# Brunix Assistance Engine — API Reference > **Protocol:** gRPC (proto3) > **Port:** `50052` (host) → `50051` (container) > **Reflection:** Enabled — service introspection available via `grpcurl` > **Source of truth:** `Docker/protos/brunix.proto` --- ## Table of Contents 1. [Service Definition](#1-service-definition) 2. [Methods](#2-methods) - [AskAgent](#21-askagent) - [AskAgentStream](#22-askagentstream) - [EvaluateRAG](#23-evaluaterag) 3. [Message Types](#3-message-types) 4. [Error Handling](#4-error-handling) 5. [Client Examples](#5-client-examples) 6. [OpenAI-Compatible Proxy](#6-openai-compatible-proxy) --- ## 1. Service Definition ```protobuf package brunix; service AssistanceEngine { rpc AskAgent (AgentRequest) returns (stream AgentResponse); rpc AskAgentStream (AgentRequest) returns (stream AgentResponse); rpc EvaluateRAG (EvalRequest) returns (EvalResponse); } ``` Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentResponse` messages. They differ in how they produce and deliver the response — see [§2.1](#21-askagent) and [§2.2](#22-askagentstream). --- ## 2. Methods ### 2.1 `AskAgent` **Behaviour:** Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using `llm.invoke()`. Returns the complete answer as a **single** `AgentResponse` message with `is_final = true`. **Use case:** Clients that do not support streaming or need a single atomic response. **Request:** See [`AgentRequest`](#agentrequest) in §3. **Response stream:** | Message # | `text` | `avap_code` | `is_final` | |---|---|---|---| | 1 (only) | Full answer text | `"AVAP-2026"` | `true` | **Latency characteristics:** Depends on LLM generation time (non-streaming). Typically 3–15 seconds for `qwen2.5:1.5b` on the Devaron cluster. --- ### 2.2 `AskAgentStream` **Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message. 
**Use case:** Interactive clients (chat UIs, VS Code extension) that need progressive rendering. **Request:** Same `AgentRequest` as `AskAgent`. **Response stream:** | Message # | `text` | `avap_code` | `is_final` | |---|---|---|---| | 1…N | Single token | `""` | `false` | | N+1 (final) | `""` | `""` | `true` | **Client contract:** - Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer. - The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded. - Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted. --- ### 2.3 `EvaluateRAG` **Behaviour:** Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge. > **Requirement:** `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing. **Request:** ```protobuf message EvalRequest { string category = 1; // Optional. Filter golden dataset by category name. // If empty, all categories are evaluated. int32 limit = 2; // Optional. Evaluate only the first N questions. // If 0, all matching questions are evaluated. string index = 3; // Optional. Elasticsearch index to evaluate against. // If empty, uses the server's configured ELASTICSEARCH_INDEX. 
} ``` **Response (single, non-streaming):** ```protobuf message EvalResponse { string status = 1; // "ok" or error description int32 questions_evaluated = 2; // Number of questions actually processed float elapsed_seconds = 3; // Total wall-clock time string judge_model = 4; // Claude model used as judge string index = 5; // Elasticsearch index evaluated // RAGAS metric scores (0.0 – 1.0) float faithfulness = 6; float answer_relevancy = 7; float context_recall = 8; float context_precision = 9; float global_score = 10; // Mean of non-zero metric scores string verdict = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT" repeated QuestionDetail details = 12; } message QuestionDetail { string id = 1; // Question ID from golden dataset string category = 2; // Question category string question = 3; // Question text string answer_preview = 4; // First 300 chars of generated answer int32 n_chunks = 5; // Number of context chunks retrieved } ``` **Verdict thresholds:** | Score | Verdict | |---|---| | ≥ 0.80 | `EXCELLENT` | | ≥ 0.60 | `ACCEPTABLE` | | < 0.60 | `INSUFFICIENT` | --- ## 3. Message Types ### `AgentRequest` ```protobuf message AgentRequest { string query = 1; string session_id = 2; string editor_content = 3; string selected_text = 4; string extra_context = 5; string user_info = 6; } ``` | Field | Type | Required | Encoding | Description | |---|---|---|---|---| | `query` | `string` | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. | | `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. | | `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. | | `selected_text` | `string` | No | Base64 | Text currently selected in the editor. 
Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. | | `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). | | `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": int, "project_id": int, "org_id": int}`. Available in graph state for future routing or personalisation — not yet consumed by the graph. | **Editor context behaviour:** Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without editor context — full backward compatibility. When `editor_content` or `selected_text` is provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it. **Base64 encoding:** `editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them as UTF-8. Malformed Base64 is silently treated as an empty string — no error is raised. 
```python import base64 encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8") ``` --- ### `AgentResponse` | Field | Type | Description | |---|---|---| | `text` | `string` | Token text (streaming) or full answer text (non-streaming) | | `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming | | `is_final` | `bool` | `true` only on the last message of the stream | --- ### `EvalRequest` | Field | Type | Required | Default | Description | |---|---|---|---|---| | `category` | `string` | No | `""` (all) | Filter golden dataset by category | | `limit` | `int32` | No | `0` (all) | Max questions to evaluate | | `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate | --- ### `EvalResponse` See full definition in [§2.3](#23-evaluaterag). --- ## 4. Error Handling The engine catches all exceptions and returns them as terminal `AgentResponse` messages rather than gRPC status errors. This means: - The stream will **not** be terminated with a non-OK gRPC status code on application-level errors. - Check for error strings in the `text` field that begin with `[ENG] Error:`. - The stream will still end with `is_final = true`. **Example error response:** ```json {"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true} ``` **`EvaluateRAG` error response:** Returned as a single `EvalResponse` with `status` set to the error description: ```json {"status": "ANTHROPIC_API_KEY no configurada en .env", ...} ``` --- ## 5. 
Client Examples ### Introspect the service ```bash grpcurl -plaintext localhost:50052 list # Output: brunix.AssistanceEngine grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine ``` ### `AskAgent` — basic query ```bash grpcurl -plaintext \ -d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \ localhost:50052 \ brunix.AssistanceEngine/AskAgent ``` Expected response: ```json { "text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...", "avap_code": "AVAP-2026", "is_final": true } ``` ### `AskAgent` — with editor context ```python import base64, json, grpc import brunix_pb2, brunix_pb2_grpc def encode(text: str) -> str: return base64.b64encode(text.encode("utf-8")).decode("utf-8") channel = grpc.insecure_channel("localhost:50052") stub = brunix_pb2_grpc.AssistanceEngineStub(channel) editor_code = """ try() ormDirect("UPDATE users SET active=1", res) exception(e) addVar(_status, 500) addResult("Error") end() """ request = brunix_pb2.AgentRequest( query = "why is this not catching the error?", session_id = "dev-001", editor_content = encode(editor_code), selected_text = encode(editor_code), # same block selected extra_context = encode("file: handler.avap"), user_info = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}), ) for response in stub.AskAgent(request): if response.is_final: print(response.text) ``` ### `AskAgentStream` — token streaming ```bash grpcurl -plaintext \ -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \ localhost:50052 \ brunix.AssistanceEngine/AskAgentStream ``` Expected response (truncated): ```json {"text": "Here", "is_final": false} {"text": " is", "is_final": false} {"text": " a", "is_final": false} ... 
{"text": "", "is_final": true} ``` ### `EvaluateRAG` — run evaluation ```bash grpcurl -plaintext \ -d '{"category": "core_syntax", "limit": 10}' \ localhost:50052 \ brunix.AssistanceEngine/EvaluateRAG ``` Expected response: ```json { "status": "ok", "questions_evaluated": 10, "elapsed_seconds": 142.3, "judge_model": "claude-sonnet-4-20250514", "index": "avap-knowledge-v1", "faithfulness": 0.8421, "answer_relevancy": 0.7913, "context_recall": 0.7234, "context_precision": 0.6891, "global_score": 0.7615, "verdict": "ACCEPTABLE", "details": [...] } ``` ### Multi-turn conversation ```bash # Turn 1 grpcurl -plaintext \ -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \ localhost:50052 brunix.AssistanceEngine/AskAgentStream # Turn 2 — engine has history from Turn 1 grpcurl -plaintext \ -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \ localhost:50052 brunix.AssistanceEngine/AskAgentStream ``` ### Regenerate gRPC stubs after modifying `brunix.proto` ```bash python -m grpc_tools.protoc \ -I./Docker/protos \ --python_out=./Docker/src \ --grpc_python_out=./Docker/src \ ./Docker/protos/brunix.proto ``` --- ## 6. OpenAI-Compatible Proxy The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — `continue.dev`, LiteLLM, Open WebUI, or any custom client. **Base URL:** `http://localhost:8000` ### Available endpoints | Method | Endpoint | Description | |---|---|---| | `POST` | `/v1/chat/completions` | OpenAI Chat Completions. Routes to `AskAgent` or `AskAgentStream`. | | `POST` | `/v1/completions` | OpenAI Completions (legacy). | | `GET` | `/v1/models` | Lists available models. Returns `brunix`. | | `POST` | `/api/chat` | Ollama chat format (NDJSON streaming). | | `POST` | `/api/generate` | Ollama generate format (NDJSON streaming). 
| | `GET` | `/api/tags` | Ollama model list. | | `GET` | `/health` | Health check. Returns `{"status": "ok"}`. | ### `POST /v1/chat/completions` **Routing:** `stream: false` → `AskAgent` (single response). `stream: true` → `AskAgentStream` (SSE token stream). **Request body:** ```json { "model": "brunix", "messages": [ {"role": "user", "content": "Que significa AVAP?"} ], "stream": false, "session_id": "uuid-per-conversation", "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}" } ``` **The `user` field (editor context transport):** The standard OpenAI `user` field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit `user` or set it to a plain string — both are handled gracefully. | Key in `user` JSON | Encoding | Description | |---|---|---| | `editor_content` | Base64 | Full content of the active editor file | | `selected_text` | Base64 | Currently selected text in the editor | | `extra_context` | Base64 | Free-form additional context | | `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` | **Important:** `session_id` must be sent as a top-level field — never inside the `user` JSON. The proxy reads `session_id` exclusively from the dedicated field. 
**Example — general query (no editor context):** ```bash curl -X POST http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "brunix", "messages": [{"role": "user", "content": "Que significa AVAP?"}], "stream": false, "session_id": "test-001" }' ``` **Example — query with editor context (VS Code extension):** ```bash curl -X POST http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "brunix", "messages": [{"role": "user", "content": "que hace este codigo?"}], "stream": true, "session_id": "test-001", "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}" }' ``` **Example — empty editor context fields:** ```bash curl -X POST http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "brunix", "messages": [{"role": "user", "content": "como funciona addVar?"}], "stream": false, "session_id": "test-002", "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}" }' ```
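With `stream: true`, the proxy's SSE output can be consumed line by line. A minimal parsing sketch, assuming the standard Chat Completions chunk framing (`data: {...}` lines terminated by `data: [DONE]`) — the exact chunk shape emitted by `openai_proxy.py` is not specified in this document, so verify against its actual output:

```python
import json

def extract_delta(sse_line: str):
    """Return the token carried by one SSE line, or None.

    Assumes the standard Chat Completions chunk shape:
    data: {"choices":[{"delta":{"content":"..."}}]}
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Illustrative use with the third-party `requests` library:
# resp = requests.post("http://localhost:8000/v1/chat/completions",
#                      json={"model": "brunix", "messages": [...],
#                            "stream": True, "session_id": "test-001"},
#                      stream=True)
# for raw in resp.iter_lines(decode_unicode=True):
#     token = extract_delta(raw)
#     if token:
#         print(token, end="", flush=True)
```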