# Brunix Assistance Engine — API Reference

> **Protocol:** gRPC (proto3)
> **Port:** `50052` (host) → `50051` (container)
> **Reflection:** Enabled — service introspection available via `grpcurl`
> **Source of truth:** `Docker/protos/brunix.proto`

---

## Table of Contents

1. [Service Definition](#1-service-definition)
2. [Methods](#2-methods)
   - [AskAgent](#21-askagent)
   - [AskAgentStream](#22-askagentstream)
   - [EvaluateRAG](#23-evaluaterag)
3. [Message Types](#3-message-types)
4. [Error Handling](#4-error-handling)
5. [Client Examples](#5-client-examples)
6. [OpenAI-Compatible Proxy](#6-openai-compatible-proxy)

---

## 1. Service Definition

```protobuf
package brunix;

service AssistanceEngine {
  rpc AskAgent (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
}
```

Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentResponse` messages. They differ in how they produce and deliver the response — see [§2.1](#21-askagent) and [§2.2](#22-askagentstream).

---

## 2. Methods

### 2.1 `AskAgent`

**Behaviour:** Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using `llm.invoke()`. Returns the complete answer as a **single** `AgentResponse` message with `is_final = true`.

**Use case:** Clients that do not support streaming or need a single atomic response.

**Request:** See [`AgentRequest`](#agentrequest) in §3.

**Response stream:**

| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |

**Latency characteristics:** Depends on LLM generation time (non-streaming). Typically 3–15 seconds for `qwen2.5:1.5b` on the Devaron cluster.

---


### 2.2 `AskAgentStream`

**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.

**Use case:** Interactive clients (chat UIs, the VS Code extension) that need progressive rendering.

**Request:** Same `AgentRequest` as `AskAgent`.

**Response stream:**

| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1…N | Single token | `""` | `false` |
| N+1 (final) | `""` | `""` | `true` |

**Client contract:**

- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
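
The contract above can be exercised without a live server. A minimal client-side accumulator over the stream shape described in the table (the `AgentResponse` dataclass here is a stand-in for the generated `brunix_pb2.AgentResponse`):

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class AgentResponse:
    """Stand-in for the generated brunix_pb2.AgentResponse message."""
    text: str
    is_final: bool

def accumulate(stream: Iterable[AgentResponse]) -> str:
    """Rebuild the full answer from an AskAgentStream response stream."""
    chunks = []
    for msg in stream:
        if msg.is_final:
            break                 # terminal message: text is empty, discard it
        chunks.append(msg.text)   # one token per non-final message
    return "".join(chunks)

# Simulated stream matching the response table above
fake_stream = [
    AgentResponse("Hello", False),
    AgentResponse(" world", False),
    AgentResponse("", True),
]
print(accumulate(fake_stream))  # → Hello world
```

With the real stub, the same loop runs over `stub.AskAgentStream(request)` instead of the simulated list.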

---

### 2.3 `EvaluateRAG`

**Behaviour:** Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.

> **Requirement:** `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing.

**Request:**

```protobuf
message EvalRequest {
  string category = 1; // Optional. Filter golden dataset by category name.
                       // If empty, all categories are evaluated.
  int32 limit = 2;     // Optional. Evaluate only the first N questions.
                       // If 0, all matching questions are evaluated.
  string index = 3;    // Optional. Elasticsearch index to evaluate against.
                       // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}
```

**Response (single, non-streaming):**

```protobuf
message EvalResponse {
  string status = 1;             // "ok" or error description
  int32 questions_evaluated = 2; // Number of questions actually processed
  float elapsed_seconds = 3;     // Total wall-clock time
  string judge_model = 4;        // Claude model used as judge
  string index = 5;              // Elasticsearch index evaluated

  // RAGAS metric scores (0.0 – 1.0)
  float faithfulness = 6;
  float answer_relevancy = 7;
  float context_recall = 8;
  float context_precision = 9;

  float global_score = 10;       // Mean of non-zero metric scores
  string verdict = 11;           // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"

  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id = 1;             // Question ID from golden dataset
  string category = 2;       // Question category
  string question = 3;       // Question text
  string answer_preview = 4; // First 300 chars of generated answer
  int32 n_chunks = 5;        // Number of context chunks retrieved
}
```

**Verdict thresholds:**

| Score | Verdict |
|---|---|
| ≥ 0.80 | `EXCELLENT` |
| ≥ 0.60 | `ACCEPTABLE` |
| < 0.60 | `INSUFFICIENT` |
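
The scoring rule is fully specified by the field comments and the table above; a sketch of that rule (the server's actual implementation is not shown in this document):

```python
def global_score(metrics: list[float]) -> float:
    """Mean of the non-zero RAGAS metric scores, as documented for EvalResponse."""
    nonzero = [m for m in metrics if m != 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def verdict(score: float) -> str:
    """Map a global score onto the verdict thresholds above."""
    if score >= 0.80:
        return "EXCELLENT"
    if score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"

# The four metric values from the sample EvalResponse in §5
score = global_score([0.8421, 0.7913, 0.7234, 0.6891])
print(f"{score:.3f}", verdict(score))  # → 0.761 ACCEPTABLE
```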

---

## 3. Message Types

### `AgentRequest`

```protobuf
message AgentRequest {
  string query = 1;
  string session_id = 2;
  string editor_content = 3;
  string selected_text = 4;
  string extra_context = 5;
  string user_info = 6;
}
```

| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| `query` | `string` | Yes | Plain text | User's natural-language question. Recommended maximum: 4096 chars. |
| `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. |
| `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| `selected_text` | `string` | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": <int>, "project_id": <int>, "org_id": <int>}`. Available in graph state for future routing or personalisation — not yet consumed by the graph. |

**Editor context behaviour:**

Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without them — full backward compatibility. When `editor_content` or `selected_text` is provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.

**Base64 encoding:**

`editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them as UTF-8. Malformed Base64 is silently treated as an empty string — no error is raised.

```python
import base64

encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
```
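
The server-side decode step is not shown in this document; the sketch below mirrors the documented behaviour (malformed input becomes an empty string). The exact exception handling inside the engine is an assumption:

```python
import base64
import binascii

def decode_b64(field: str) -> str:
    """Decode a Base64 request field; malformed input yields "" per the docs."""
    try:
        return base64.b64decode(field).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ""

print(decode_b64(base64.b64encode("try()".encode("utf-8")).decode("utf-8")))  # → try()
print(repr(decode_b64("not-base64!")))                                        # → ''
```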

---

### `AgentResponse`

| Field | Type | Description |
|---|---|---|
| `text` | `string` | Token text (streaming) or full answer text (non-streaming) |
| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |

---

### `EvalRequest`

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `category` | `string` | No | `""` (all) | Filter golden dataset by category |
| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |

---

### `EvalResponse`

See the full definition in [§2.3](#23-evaluaterag).

---

## 4. Error Handling

The engine catches all exceptions and returns them as terminal `AgentResponse` messages rather than gRPC status errors. This means:

- The stream will **not** be terminated with a non-OK gRPC status code on application-level errors.
- Check for error strings in the `text` field that begin with `[ENG] Error:`.
- The stream will still end with `is_final = true`.

**Example error response:**

```json
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
```
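
Because errors travel in-band, clients must inspect the final text themselves. A minimal sketch of the documented convention (the `[ENG] Error:` prefix is the only error signal):

```python
ERROR_PREFIX = "[ENG] Error:"

def check_answer(text: str) -> str:
    """Raise if the engine returned an in-band error; otherwise pass the answer through."""
    if text.startswith(ERROR_PREFIX):
        raise RuntimeError(text[len(ERROR_PREFIX):].strip())
    return text

print(check_answer("AVAP is a DSL..."))  # → AVAP is a DSL...
try:
    check_answer("[ENG] Error: Connection refused connecting to Ollama")
except RuntimeError as e:
    print(e)  # → Connection refused connecting to Ollama
```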

**`EvaluateRAG` error response:**

Returned as a single `EvalResponse` with `status` set to the error description. The server message is returned verbatim (here in Spanish: "ANTHROPIC_API_KEY not configured in .env"):

```json
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
```

---
## 5. Client Examples

### Introspect the service

```bash
grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```

### `AskAgent` — basic query

```bash
grpcurl -plaintext \
  -d '{"query": "What does AVAP mean?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```

Expected response:

```json
{
  "text": "AVAP (Advanced Virtual API Programming) is a Turing-complete DSL...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```

### `AskAgent` — with editor context

```python
import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc

def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

channel = grpc.insecure_channel("localhost:50052")
stub = brunix_pb2_grpc.AssistanceEngineStub(channel)

editor_code = """
try()
    ormDirect("UPDATE users SET active=1", res)
exception(e)
    addVar(_status, 500)
    addResult("Error")
end()
"""

request = brunix_pb2.AgentRequest(
    query = "why is this not catching the error?",
    session_id = "dev-001",
    editor_content = encode(editor_code),
    selected_text = encode(editor_code),  # same block selected
    extra_context = encode("file: handler.avap"),
    user_info = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)

for response in stub.AskAgent(request):
    if response.is_final:
        print(response.text)
```

### `AskAgentStream` — token streaming

```bash
grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream
```

Expected response (truncated):

```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}
```

### `EvaluateRAG` — run evaluation

```bash
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG
```

Expected response:

```json
{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-knowledge-v1",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}
```

### Multi-turn conversation

```bash
# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — the engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream
```

### Regenerate gRPC stubs after modifying `brunix.proto`

```bash
python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto
```

---

## 6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps the gRPC interface in an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — `continue.dev`, LiteLLM, Open WebUI, or any custom client.

**Base URL:** `http://localhost:8000`

### Available endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions. Routes to `AskAgent` or `AskAgentStream`. |
| `POST` | `/v1/completions` | OpenAI Completions (legacy). |
| `GET` | `/v1/models` | Lists available models. Returns `brunix`. |
| `POST` | `/api/chat` | Ollama chat format (NDJSON streaming). |
| `POST` | `/api/generate` | Ollama generate format (NDJSON streaming). |
| `GET` | `/api/tags` | Ollama model list. |
| `GET` | `/health` | Health check. Returns `{"status": "ok"}`. |

### `POST /v1/chat/completions`

**Routing:** `stream: false` → `AskAgent` (single response). `stream: true` → `AskAgentStream` (SSE token stream).
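
With `stream: true`, tokens arrive as Server-Sent Events. Assuming the proxy emits standard OpenAI-style chunks (`choices[0].delta.content`, terminated by `data: [DONE]` — the exact chunk shape is an assumption based on OpenAI compatibility, not stated in this document), each SSE line can be parsed as:

```python
import json
from typing import Optional

def parse_sse_line(line: str) -> Optional[str]:
    """Extract the token from one OpenAI-style SSE chunk line, if any."""
    if not line.startswith("data: "):
        return None                      # comments, blank keep-alive lines, etc.
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None                      # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

line = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'
print(parse_sse_line(line))  # → Hello
```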

**Request body:**

```json
{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "What does AVAP mean?"}
  ],
  "stream": false,
  "session_id": "uuid-per-conversation",
  "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```

**The `user` field (editor context transport):**

The standard OpenAI `user` field is used to transport editor context as a JSON string. This lets the VS Code extension send context without requiring API changes. Non-Brunix clients can omit `user` or set it to a plain string — both are handled gracefully.

| Key in `user` JSON | Encoding | Description |
|---|---|---|
| `editor_content` | Base64 | Full content of the active editor file |
| `selected_text` | Base64 | Currently selected text in the editor |
| `extra_context` | Base64 | Free-form additional context |
| `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` |
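
Because `user` is a JSON *string* rather than a nested object, it must be serialised separately from the outer body. A sketch of assembling a request body with the keys above (the helper names are illustrative):

```python
import base64
import json

def b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

def build_body(question: str, session_id: str, selected: str = "") -> dict:
    """Assemble an OpenAI-compatible request body carrying Brunix editor context."""
    user_payload = {
        "editor_content": "",
        "selected_text": b64(selected),
        "extra_context": "",
        "user_info": {"dev_id": 1, "project_id": 2, "org_id": 3},
    }
    return {
        "model": "brunix",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "session_id": session_id,          # top-level, never inside "user"
        "user": json.dumps(user_payload),  # serialised as a JSON string
    }

body = build_body("what does this code do?", "test-001", selected="addVar(_status, 500)")
print(json.loads(body["user"])["selected_text"])  # Base64 of the selection
```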

**Important:** `session_id` must be sent as a top-level field — never inside the `user` JSON. The proxy reads `session_id` exclusively from the dedicated field.

**Example — general query (no editor context):**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "What does AVAP mean?"}],
    "stream": false,
    "session_id": "test-001"
  }'
```

**Example — query with editor context (VS Code extension):**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "what does this code do?"}],
    "stream": true,
    "session_id": "test-001",
    "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
  }'
```

**Example — empty editor context fields:**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "how does addVar work?"}],
    "stream": false,
    "session_id": "test-002",
    "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
  }'
```