# Brunix Assistance Engine — API Reference
> **Protocol:** gRPC (proto3)
> **Port:** `50052` (host) → `50051` (container)
> **Reflection:** Enabled — service introspection available via `grpcurl`
> **Source of truth:** `Docker/protos/brunix.proto`
---
## Table of Contents
1. [Service Definition](#1-service-definition)
2. [Methods](#2-methods)
- [AskAgent](#21-askagent)
- [AskAgentStream](#22-askagentstream)
- [EvaluateRAG](#23-evaluaterag)
3. [Message Types](#3-message-types)
4. [Error Handling](#4-error-handling)
5. [Client Examples](#5-client-examples)
6. [OpenAI-Compatible Proxy](#6-openai-compatible-proxy)
---
## 1. Service Definition
```protobuf
package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}
```
Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentResponse` messages. They differ in how they produce and deliver the response — see [§2.1](#21-askagent) and [§2.2](#22-askagentstream).
---
## 2. Methods
### 2.1 `AskAgent`
**Behaviour:** Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using `llm.invoke()`. Returns the complete answer as a **single** `AgentResponse` message with `is_final = true`.
**Use case:** Clients that do not support streaming or need a single atomic response.
**Request:** See [`AgentRequest`](#agentrequest) in §3.
**Response stream:**
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |
**Latency characteristics:** Depends on LLM generation time (non-streaming). Typically 3–15 seconds for `qwen2.5:1.5b` on the Devaron cluster.
---
### 2.2 `AskAgentStream`
**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.
**Use case:** Interactive clients (chat UIs, VS Code extension) that need progressive rendering.
**Request:** Same `AgentRequest` as `AskAgent`.
**Response stream:**
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1…N | Single token | `""` | `false` |
| N+1 (final) | `""` | `""` | `true` |
**Client contract:**
- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
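The contract above can be sketched as a small helper. The function name is hypothetical; in practice the iterable is the iterator returned by `stub.AskAgentStream(request)`:

```python
def accumulate_stream(responses) -> str:
    """Rebuild the full answer from an AskAgentStream response iterator."""
    parts = []
    for r in responses:      # each r is an AgentResponse (.text, .is_final)
        if r.is_final:
            break            # terminal message: text is empty, discard it
        parts.append(r.text)
    return "".join(parts)
```

Note that the loop still consumes the terminal message before returning, so the stream is fully drained and history persistence is not interrupted.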
---
### 2.3 `EvaluateRAG`
**Behaviour:** Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.
> **Requirement:** `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing.
**Request:**
```protobuf
message EvalRequest {
  string category = 1;  // Optional. Filter golden dataset by category name.
                        // If empty, all categories are evaluated.
  int32 limit = 2;      // Optional. Evaluate only the first N questions.
                        // If 0, all matching questions are evaluated.
  string index = 3;     // Optional. Elasticsearch index to evaluate against.
                        // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}
```
**Response (single, non-streaming):**
```protobuf
message EvalResponse {
  string status = 1;               // "ok" or error description
  int32  questions_evaluated = 2;  // Number of questions actually processed
  float  elapsed_seconds = 3;      // Total wall-clock time
  string judge_model = 4;          // Claude model used as judge
  string index = 5;                // Elasticsearch index evaluated

  // RAGAS metric scores (0.0–1.0)
  float faithfulness = 6;
  float answer_relevancy = 7;
  float context_recall = 8;
  float context_precision = 9;
  float global_score = 10;         // Mean of non-zero metric scores

  string verdict = 11;             // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"
  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id = 1;              // Question ID from golden dataset
  string category = 2;        // Question category
  string question = 3;        // Question text
  string answer_preview = 4;  // First 300 chars of generated answer
  int32  n_chunks = 5;        // Number of context chunks retrieved
}
```
**Verdict thresholds:**
| Score | Verdict |
|---|---|
| ≥ 0.80 | `EXCELLENT` |
| ≥ 0.60 | `ACCEPTABLE` |
| < 0.60 | `INSUFFICIENT` |
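The thresholds map onto `global_score` as a simple cascade, checked highest first. A sketch (the helper name is hypothetical):

```python
def verdict_for(global_score: float) -> str:
    # Thresholds from the table above, checked highest first.
    if global_score >= 0.80:
        return "EXCELLENT"
    if global_score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```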
---
## 3. Message Types
### `AgentRequest`
```protobuf
message AgentRequest {
  string query = 1;
  string session_id = 2;
  string editor_content = 3;
  string selected_text = 4;
  string extra_context = 5;
  string user_info = 6;
}
```
| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| `query` | `string` | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. |
| `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| `selected_text` | `string` | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": <int>, "project_id": <int>, "org_id": <int>}`. Available in graph state for future routing or personalisation; not yet consumed by the graph. |
**Editor context behaviour:**
Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as it would without them (full backward compatibility). When `editor_content` or `selected_text` are provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.
**Base64 encoding:**
`editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them as UTF-8. Malformed Base64 is silently treated as an empty string; no error is raised.
```python
import base64
encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
```
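For symmetry, here is a sketch of the documented decode behaviour (the actual server code may differ): malformed Base64 yields an empty string rather than an error.

```python
import base64
import binascii

def decode_field(value: str) -> str:
    # Mirrors the documented server behaviour: malformed Base64 -> "".
    try:
        return base64.b64decode(value).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return ""
```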
---
### `AgentResponse`
| Field | Type | Description |
|---|---|---|
| `text` | `string` | Token text (streaming) or full answer text (non-streaming) |
| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |
---
### `EvalRequest`
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `category` | `string` | No | `""` (all) | Filter golden dataset by category |
| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
---
### `EvalResponse`
See full definition in [§2.3](#23-evaluaterag).
---
## 4. Error Handling
The engine catches all exceptions and returns them as terminal `AgentResponse` messages rather than gRPC status errors. This means:
- The stream will **not** be terminated with a non-OK gRPC status code on application-level errors.
- Check for error strings in the `text` field that begin with `[ENG] Error:`.
- The stream will still end with `is_final = true`.
**Example error response:**
```json
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
```
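A client can therefore detect application-level failures with a prefix check rather than a gRPC status check. A sketch (field names as in `AgentResponse`):

```python
ERROR_PREFIX = "[ENG] Error:"

def is_engine_error(response) -> bool:
    # The gRPC status is OK even on failure, per the error-handling
    # contract above, so inspect the text payload instead.
    return response.is_final and response.text.startswith(ERROR_PREFIX)
```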
**`EvaluateRAG` error response:**
Returned as a single `EvalResponse` with `status` set to the error description:
```json
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
```
---
## 5. Client Examples
### Introspect the service
```bash
grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```
### `AskAgent` — basic query
```bash
grpcurl -plaintext \
-d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
```
Expected response:
```json
{
"text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
"avap_code": "AVAP-2026",
"is_final": true
}
```
### `AskAgent` — with editor context
```python
import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc

def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")

channel = grpc.insecure_channel("localhost:50052")
stub = brunix_pb2_grpc.AssistanceEngineStub(channel)

editor_code = """
try()
ormDirect("UPDATE users SET active=1", res)
exception(e)
addVar(_status, 500)
addResult("Error")
end()
"""

request = brunix_pb2.AgentRequest(
    query="why is this not catching the error?",
    session_id="dev-001",
    editor_content=encode(editor_code),
    selected_text=encode(editor_code),  # same block selected
    extra_context=encode("file: handler.avap"),
    user_info=json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)

for response in stub.AskAgent(request):
    if response.is_final:
        print(response.text)
```
### `AskAgentStream` — token streaming
```bash
grpcurl -plaintext \
-d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgentStream
```
Expected response (truncated):
```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}
```
### `EvaluateRAG` — run evaluation
```bash
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
```
Expected response:
```json
{
"status": "ok",
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"index": "avap-knowledge-v1",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
"context_precision": 0.6891,
"global_score": 0.7615,
"verdict": "ACCEPTABLE",
"details": [...]
}
```
### Multi-turn conversation
```bash
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
-d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
```
### Regenerate gRPC stubs after modifying `brunix.proto`
```bash
python -m grpc_tools.protoc \
-I./Docker/protos \
--python_out=./Docker/src \
--grpc_python_out=./Docker/src \
./Docker/protos/brunix.proto
```
---
## 6. OpenAI-Compatible Proxy
The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API: `continue.dev`, LiteLLM, Open WebUI, or any custom client.
**Base URL:** `http://localhost:8000`
### Available endpoints
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions. Routes to `AskAgent` or `AskAgentStream`. |
| `POST` | `/v1/completions` | OpenAI Completions (legacy). |
| `GET` | `/v1/models` | Lists available models. Returns `brunix`. |
| `POST` | `/api/chat` | Ollama chat format (NDJSON streaming). |
| `POST` | `/api/generate` | Ollama generate format (NDJSON streaming). |
| `GET` | `/api/tags` | Ollama model list. |
| `GET` | `/health` | Health check. Returns `{"status": "ok"}`. |
### `POST /v1/chat/completions`
**Routing:** `stream: false` → `AskAgent` (single response). `stream: true` → `AskAgentStream` (SSE token stream).
**Request body:**
```json
{
"model": "brunix",
"messages": [
{"role": "user", "content": "Que significa AVAP?"}
],
"stream": false,
"session_id": "uuid-per-conversation",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```
**The `user` field (editor context transport):**
The standard OpenAI `user` field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit `user` or set it to a plain string; both are handled gracefully.
| Key in `user` JSON | Encoding | Description |
|---|---|---|
| `editor_content` | Base64 | Full content of the active editor file |
| `selected_text` | Base64 | Currently selected text in the editor |
| `extra_context` | Base64 | Free-form additional context |
| `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` |
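Building the `user` payload from the table above can be sketched as follows (the helper name and defaults are hypothetical; `session_id` is deliberately absent, as it travels in its own top-level field):

```python
import base64
import json

def build_user_field(editor_content="", selected_text="",
                     extra_context="", user_info=None):
    """Serialise editor context into the OpenAI `user` field format."""
    def enc(s):
        # Base64-encode non-empty strings; keep empty fields as "".
        return base64.b64encode(s.encode("utf-8")).decode("utf-8") if s else ""
    return json.dumps({
        "editor_content": enc(editor_content),
        "selected_text": enc(selected_text),
        "extra_context": enc(extra_context),
        "user_info": user_info or {},
    })
```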
**Important:** `session_id` must be sent as a top-level field, never inside the `user` JSON. The proxy reads `session_id` exclusively from the dedicated field.
**Example — general query (no editor context):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Que significa AVAP?"}],
"stream": false,
"session_id": "test-001"
}'
```
**Example — query with editor context (VS Code extension):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este codigo?"}],
"stream": true,
"session_id": "test-001",
"user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}'
```
**Example — empty editor context fields:**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "como funciona addVar?"}],
"stream": false,
"session_id": "test-002",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
}'
```