
Brunix Assistance Engine — API Reference

  • Protocol: gRPC (proto3)
  • Port: 50052 (host) → 50051 (container)
  • Reflection: Enabled — service introspection available via grpcurl
  • Source of truth: Docker/protos/brunix.proto


Table of Contents

  1. Service Definition
  2. Methods
  3. Message Types
  4. Error Handling
  5. Client Examples
  6. OpenAI-Compatible Proxy

1. Service Definition

package brunix;

service AssistanceEngine {
  rpc AskAgent       (AgentRequest) returns (stream AgentResponse);
  rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
  rpc EvaluateRAG    (EvalRequest)  returns (EvalResponse);
}

Both AskAgent and AskAgentStream return a server-side stream of AgentResponse messages. They differ in how they produce and deliver the response — see §2.1 and §2.2.


2. Methods

2.1 AskAgent

Behaviour: Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using llm.invoke(). Returns the complete answer as a single AgentResponse message with is_final = true.

Use case: Clients that do not support streaming or need a single atomic response.

Request:

message AgentRequest {
  string query      = 1;  // The user's question. Required. Max recommended: 4096 chars.
  string session_id = 2;  // Conversation session identifier. Optional.
                           // If empty, defaults to "default" (shared session).
                           // Use a UUID per user/conversation for isolation.
}
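Because an empty session_id falls back to the shared "default" session, clients should generate a stable UUID per conversation. A minimal sketch of building a request payload — the helper name `make_request_payload` is illustrative, not part of the engine's API:

```python
import uuid

def make_request_payload(query, session_id=None):
    """Build an AgentRequest payload.

    Field names mirror the AgentRequest message above. If no session_id
    is supplied, a fresh UUID is generated so the conversation is
    isolated from the shared "default" session.
    """
    return {
        "query": query,
        "session_id": session_id or str(uuid.uuid4()),
    }

payload = make_request_payload("What is addVar in AVAP?")
```

Reuse the same session_id across turns to keep multi-turn context (see the multi-turn example in §5).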

Response stream:

| Message # | text | avap_code | is_final |
|-----------|------|-----------|----------|
| 1 (only)  | Full answer text | "AVAP-2026" | true |

Latency characteristics: Depends on LLM generation time (non-streaming). Typically 3–15 seconds for qwen2.5:1.5b on the Devaron cluster.


2.2 AskAgentStream

Behaviour: Runs prepare_graph (classify → reformulate → retrieve), then calls llm.stream() directly. Emits one AgentResponse per token from Ollama, followed by a terminal message.

Use case: Interactive clients (chat UIs, terminal tools) that need progressive rendering.

Request: Same AgentRequest as AskAgent.

Response stream:

| Message # | text | avap_code | is_final |
|-----------|------|-----------|----------|
| 1…N | Single token | "" | false |
| N+1 (final) | "" | "" | true |

Client contract:

  • Accumulate text from all messages where is_final == false to reconstruct the full answer.
  • The is_final == true message signals end-of-stream. Its text is always empty and should be discarded.
  • Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
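The contract above can be sketched as a small client-side helper. It only assumes the text and is_final fields of AgentResponse; the function name `collect_answer` is hypothetical, not part of the API:

```python
def collect_answer(stream):
    """Reassemble the full answer from an AskAgentStream response stream.

    Accumulates `text` from every message with is_final == false and
    stops at the terminal is_final message, whose empty text is discarded.
    """
    parts = []
    for msg in stream:
        if msg.is_final:
            break  # end-of-stream marker; its text is always empty
        parts.append(msg.text)
    return "".join(parts)
```

With stubs generated from brunix.proto (see the regeneration command in §5), this would be called as `collect_answer(stub.AskAgentStream(request))`. Note that iterating the stream to completion, rather than closing it early, also satisfies the history-persistence requirement above.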

2.3 EvaluateRAG

Behaviour: Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.

Requirement: ANTHROPIC_API_KEY must be configured in the environment. This endpoint will return an error response if it is missing.

Request:

message EvalRequest {
  string category = 1;  // Optional. Filter golden dataset by category name.
                         // If empty, all categories are evaluated.
  int32  limit    = 2;  // Optional. Evaluate only the first N questions.
                         // If 0, all matching questions are evaluated.
  string index    = 3;  // Optional. Elasticsearch index to evaluate against.
                         // If empty, uses the server's configured ELASTICSEARCH_INDEX.
}

Response (single, non-streaming):

message EvalResponse {
  string status               = 1;  // "ok" or error description
  int32  questions_evaluated  = 2;  // Number of questions actually processed
  float  elapsed_seconds      = 3;  // Total wall-clock time
  string judge_model          = 4;  // Claude model used as judge
  string index                = 5;  // Elasticsearch index evaluated

  // RAGAS metric scores (0.0–1.0)
  float  faithfulness         = 6;
  float  answer_relevancy     = 7;
  float  context_recall       = 8;
  float  context_precision    = 9;

  float  global_score         = 10; // Mean of non-zero metric scores
  string verdict              = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"

  repeated QuestionDetail details = 12;
}

message QuestionDetail {
  string id             = 1;  // Question ID from golden dataset
  string category       = 2;  // Question category
  string question       = 3;  // Question text
  string answer_preview = 4;  // First 300 chars of generated answer
  int32  n_chunks       = 5;  // Number of context chunks retrieved
}

Verdict thresholds:

| Score  | Verdict      |
|--------|--------------|
| ≥ 0.80 | EXCELLENT    |
| ≥ 0.60 | ACCEPTABLE   |
| < 0.60 | INSUFFICIENT |
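The scoring rules above — global_score as the mean of the non-zero metric scores, then a threshold-based verdict — can be sketched as follows. This is a restatement of the documented rules, not the engine's actual implementation:

```python
def global_score(metric_scores):
    """Mean of the non-zero RAGAS metric scores (0.0 if all are zero)."""
    nonzero = [s for s in metric_scores if s > 0.0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def verdict(score):
    """Map a global score onto the verdict thresholds above."""
    if score >= 0.80:
        return "EXCELLENT"
    if score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```

For the example response in §5 (scores 0.8421, 0.7913, 0.7234, 0.6891), the mean is ≈ 0.7615, which lands in the ACCEPTABLE band.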

3. Message Types

AgentRequest

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | User's natural language question |
| session_id | string | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |

AgentResponse

| Field | Type | Description |
|-------|------|-------------|
| text | string | Token text (streaming) or full answer text (non-streaming) |
| avap_code | string | Currently always "AVAP-2026" in non-streaming mode; empty in streaming |
| is_final | bool | true only on the last message of the stream |

EvalRequest

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| category | string | No | "" (all) | Filter golden dataset by category |
| limit | int32 | No | 0 (all) | Max questions to evaluate |
| index | string | No | $ELASTICSEARCH_INDEX | ES index to evaluate |

EvalResponse

See full definition in §2.3.


4. Error Handling

The engine catches all exceptions and returns them as terminal AgentResponse messages rather than gRPC status errors. This means:

  • The stream will not be terminated with a non-OK gRPC status code on application-level errors.
  • Check for error strings in the text field that begin with [ENG] Error:.
  • The stream will still end with is_final = true.
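A client can apply these rules with a simple prefix check on each message's text — `is_engine_error` is a hypothetical helper name, not part of the API:

```python
ERROR_PREFIX = "[ENG] Error:"

def is_engine_error(response_text):
    """Detect an application-level error from the engine.

    Errors arrive as ordinary AgentResponse messages (typically with
    is_final = true) rather than as non-OK gRPC status codes, so the
    only reliable signal is the "[ENG] Error:" prefix in the text field.
    """
    return response_text.startswith(ERROR_PREFIX)
```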

Example error response:

{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}

EvaluateRAG error response:
Returned as a single EvalResponse with status set to the error description (here a Spanish message meaning "ANTHROPIC_API_KEY not configured in .env"):

{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}

5. Client Examples

Introspect the service

grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine

grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine

AskAgent — full response

grpcurl -plaintext \
  -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent

Expected response:

{
  "text": "addVar is an AVAP command that declares a new variable...",
  "avap_code": "AVAP-2026",
  "is_final": true
}

AskAgentStream — token streaming

grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream

Expected response (truncated):

{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}

EvaluateRAG — run evaluation

# Evaluate first 10 questions from the "core_syntax" category
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG

Expected response:

{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-docs-test",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE",
  "details": [...]
}

Multi-turn conversation example

# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — the engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

Regenerate gRPC stubs after modifying brunix.proto

python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto

6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port 8000 (openai_proxy.py) that wraps AskAgentStream under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.

Base URL: http://localhost:8000

POST /v1/chat/completions

Request body:

{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "What is addVar in AVAP?"}
  ],
  "stream": true
}

Notes:

  • The model field is ignored; the engine always uses the configured OLLAMA_MODEL_NAME.
  • Session management is handled internally by the proxy. Conversation continuity across separate HTTP requests is not guaranteed.
  • Only stream: true is fully supported. Non-streaming mode may be available but is not the primary use case.
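Assuming the proxy emits standard OpenAI-style server-sent events (`data:` lines carrying Chat Completions chunks, terminated by `data: [DONE]`), the streamed text can be extracted from each line like this. This is a sketch of the conventional chunk shape; the exact format depends on openai_proxy.py:

```python
import json

def parse_sse_line(line):
    """Extract the delta text from one SSE line of a streaming
    Chat Completions response.

    Returns None for non-data lines (comments, keep-alives) and for
    the [DONE] sentinel that ends the stream.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")
```

Concatenating the non-None results reconstructs the full answer, mirroring the client contract of AskAgentStream in §2.2.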

Example with curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Explain AVAP loops"}],
    "stream": true
  }'