Brunix Assistance Engine
The Brunix Assistance Engine is a high-performance, gRPC-powered AI orchestration service. It serves as the core intelligence layer for the Brunix ecosystem, integrating advanced RAG (Retrieval-Augmented Generation) capabilities with real-time observability.
This project is a strategic joint development:
- 101OBEX Corp: Infrastructure, System Architecture, and the proprietary AVAP Technology stack.
- MrHouston: Advanced LLM Fine-tuning, Model Training, and Prompt Engineering.
System Architecture (Hybrid Dev Mode)
The engine runs locally for development but connects to the production-grade infrastructure in the Vultr Cloud (Devaron Cluster) via secure kubectl tunnels.
graph TD
subgraph Local_Workstation [Developer]
BE[Brunix Assistance Engine - Docker]
KT[Kubectl Port-Forward Tunnels]
end
subgraph Vultr_K8s_Cluster [Production - Devaron Cluster]
OL[Ollama Light Service - LLM]
EDB[(Elasticsearch Vector DB)]
PG[(Postgres - Langfuse Data)]
LF[Langfuse UI - Web]
end
BE -- localhost:11434 --> KT
BE -- localhost:9200 --> KT
BE -- localhost:5432 --> KT
KT -- Secure Link --> OL
KT -- Secure Link --> EDB
KT -- Secure Link --> PG
Developer -- Browser --> LF
Project Structure
├── README.md # Setup guide & dev reference (this file)
├── CONTRIBUTING.md # Contribution standards, GitFlow, PR process
├── SECURITY.md # Security policy and vulnerability reporting
├── changelog # Version tracking and release history
├── pyproject.toml # Python project configuration (uv)
├── uv.lock # Locked dependency graph
│
├── Docker/ # Production container
│ ├── protos/
│ │ └── brunix.proto # gRPC API contract (source of truth)
│ ├── src/
│ │ ├── server.py # gRPC server — AskAgent, AskAgentStream, EvaluateRAG
│ │ ├── openai_proxy.py # OpenAI & Ollama-compatible HTTP proxy (port 8000)
│ │ ├── graph.py # LangGraph orchestration — build_graph, build_prepare_graph
│ │ ├── prompts.py # Centralized prompt definitions (CLASSIFY, GENERATE, etc.)
│ │ ├── state.py # AgentState TypedDict (shared across graph nodes)
│ │ ├── evaluate.py # RAGAS evaluation pipeline (Claude as judge)
│ │ ├── golden_dataset.json # Ground-truth Q&A dataset for EvaluateRAG
│ │ └── utils/
│ │ ├── emb_factory.py # Provider-agnostic embedding model factory
│ │ └── llm_factory.py # Provider-agnostic LLM factory
│ ├── tests/
│ │ └── test_prd_0002.py # Unit tests — editor context, classifier, proxy parsing
│ ├── Dockerfile # Multi-stage container build
│ ├── docker-compose.yaml # Local dev orchestration
│ ├── entrypoint.sh # Starts gRPC server + HTTP proxy in parallel
│ ├── requirements.txt # Pinned production dependencies (exported by uv)
│ ├── .env # Local secrets (never commit — see .gitignore)
│ └── .dockerignore # Excludes dev artifacts from image build context
│
├── docs/ # Knowledge base & project documentation
│ ├── ARCHITECTURE.md # Deep technical architecture reference
│ ├── API_REFERENCE.md # Complete gRPC & HTTP API contract with examples
│ ├── RUNBOOK.md # Operational playbooks and incident response
│ ├── AVAP_CHUNKER_CONFIG.md # avap_config.json reference — blocks, statements, semantic tags
│ ├── ADR/ # Architecture Decision Records
│ │ ├── ADR-0001-grpc-primary-interface.md
│ │ ├── ADR-0002-two-phase-streaming.md
│ │ ├── ADR-0003-hybrid-retrieval-rrf.md
│ │ ├── ADR-0004-claude-eval-judge.md
│ │ └── ADR-0005-embedding-model-selection.md
│   ├── product/                      # Product Requirements Documents
│   │   ├── PRD-0001-openai-compatible-proxy.md
│   │   └── PRD-0002-editor-context-injection.md
│   ├── avap_language_github_docs/    # AVAP language reference docs (GitHub source)
│   ├── developer.avapframework.com/  # AVAP developer portal docs
│   ├── LRM/
│   │   └── avap.md                   # AVAP Language Reference Manual (LRM)
│   └── samples/                      # AVAP code samples (.avap) used for ingestion
│
├── LICENSE # Proprietary license — 101OBEX Corp, Delaware
│
├── research/ # Experiment results, benchmarks, datasets (MrHouston)
│ └── embeddings/ # Embedding model benchmark results (BEIR)
│
├── ingestion/
│ └── chunks.json # Last export of ingested chunks (ES bulk output)
│
├── scripts/
│ └── pipelines/
│ │
│ ├── flows/ # Executable pipeline entry points (Typer CLI)
│ │ ├── elasticsearch_ingestion.py # [PIPELINE A] Chonkie-based ingestion flow
│ │ ├── generate_mbap.py # Synthetic MBPP-AVAP dataset generator (Claude)
│ │ └── translate_mbpp.py # MBPP→AVAP dataset translation pipeline
│ │
│ ├── tasks/ # Reusable task modules for Pipeline A
│ │ ├── chunk.py # Document fetching, Chonkie chunking & ES bulk write
│ │ ├── embeddings.py # OllamaEmbeddings adapter (Chonkie-compatible)
│ │ └── prompts.py # Prompt templates for pipeline LLM calls
│ │
│ └── ingestion/ # [PIPELINE B] AVAP-native classic ingestion
│ ├── avap_chunker.py # Custom AVAP lexer + chunker (MinHash dedup, overlaps)
│ ├── avap_ingestor.py # Async ES ingestor with DLQ (producer/consumer pattern)
│ ├── avap_config.json # AVAP language config (blocks, statements, semantic tags)
│ └── ingestion/
│ └── chunks.jsonl # JSONL output from avap_chunker.py
│
└── src/ # Shared library (used by both Docker and scripts)
├── config.py # Pydantic settings — reads all environment variables
└── utils/
├── emb_factory.py # Embedding model factory
└── llm_factory.py # LLM model factory
Data Flow & RAG Orchestration
The following diagram illustrates the sequence of a single AskAgent request, detailing the retrieval and generation phases through the secure tunnel.
sequenceDiagram
participant U as External Client (gRPCurl/App)
participant E as Brunix Engine (Local Docker)
participant T as Kubectl Tunnel
participant V as Vector DB (Vultr)
participant O as Ollama Light (Vultr)
U->>E: AskAgent(query, session_id)
Note over E: Start Langfuse Trace
E->>T: Search Context (Embeddings)
T->>V: Query Index [avap_manuals]
V-->>T: Return Relevant Chunks
T-->>E: Contextual Data
E->>T: Generate Completion (Prompt + Context)
T->>O: Stream Tokens (qwen2.5:1.5b)
loop Token Streaming
O-->>T: Token
T-->>E: Token
E-->>U: gRPC Stream Response {text, avap_code}
end
Note over E: Close Langfuse Trace
Knowledge Base Ingestion
The Elasticsearch vector index is populated via one of two independent pipelines. Both pipelines require the Elasticsearch tunnel to be active (localhost:9200) and the Ollama embedding model (OLLAMA_EMB_MODEL_NAME) to be available.
Pipeline A — Chonkie (recommended for markdown + .avap)
Uses the Chonkie library for semantic chunking. Supports .md (via MarkdownChef) and .avap (via TextChef + TokenChunker). Chunks are embedded with Ollama and bulk-indexed into Elasticsearch via ElasticHandshakeWithMetadata.
Entry point: scripts/pipelines/flows/elasticsearch_ingestion.py
# Index all markdown and AVAP files from docs/LRM
python -m scripts.pipelines.flows.elasticsearch_ingestion \
--docs-folder-path docs/LRM \
--output ingestion/chunks.json \
--docs-extension .md .avap \
--es-index avap-docs-test \
--delete-es-index
# Index the AVAP code samples
python -m scripts.pipelines.flows.elasticsearch_ingestion \
--docs-folder-path docs/samples \
--output ingestion/chunks.json \
--docs-extension .avap \
--es-index avap-docs-test
How it works:
docs/**/*.md + docs/**/*.avap
│
▼ FileFetcher (Chonkie)
│
├─ .md → MarkdownChef → merge code blocks + tables into chunks
│ ↓
│ TokenChunker (HuggingFace tokenizer: HF_EMB_MODEL_NAME)
│
└─ .avap → TextChef → TokenChunker
│
▼ OllamaEmbeddings.embed_batch() (OLLAMA_EMB_MODEL_NAME)
│
▼ ElasticHandshakeWithMetadata.write()
bulk index → {text, embedding, file, start_index, end_index, token_count}
│
▼ export_documents() → ingestion/chunks.json
| Chunk field | Source |
|---|---|
| `text` | Raw chunk text |
| `embedding` | Ollama dense vector |
| `start_index` / `end_index` | Character offsets in source file |
| `token_count` | HuggingFace tokenizer count |
| `file` | Source filename |
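To make the bulk-index step concrete, here is a minimal sketch of how chunks carrying these fields could be shaped into Elasticsearch bulk actions. The helper name `to_bulk_actions` is illustrative; the actual logic lives in `ElasticHandshakeWithMetadata`.

```python
def to_bulk_actions(chunks, index_name):
    """Shape chunk dicts into Elasticsearch bulk-helper actions.

    Each chunk is expected to carry the fields from the table above:
    text, embedding, file, start_index, end_index, token_count.
    """
    for chunk in chunks:
        yield {
            "_index": index_name,
            "_source": {
                "text": chunk["text"],
                "embedding": chunk["embedding"],
                "file": chunk["file"],
                "start_index": chunk["start_index"],
                "end_index": chunk["end_index"],
                "token_count": chunk["token_count"],
            },
        }
```

The generator can be passed directly to `elasticsearch.helpers.bulk`.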
Pipeline B — AVAP Native (classic, for .avap files with full semantic analysis)
A custom lexer-based chunker purpose-built for the AVAP language using avap_config.json as its grammar definition. Produces richer metadata (block type, section, semantic tags, complexity score) and includes MinHash LSH deduplication and semantic overlap between chunks.
Entry point: scripts/pipelines/ingestion/avap_chunker.py
Grammar config: scripts/pipelines/ingestion/avap_config.json — see docs/AVAP_CHUNKER_CONFIG.md for the full reference on blocks, statements, semantic tags, and how to extend the grammar.
Step 1 — Chunk:
python scripts/pipelines/ingestion/avap_chunker.py \
--lang-config scripts/pipelines/ingestion/avap_config.json \
--docs-path docs/samples \
--output scripts/pipelines/ingestion/ingestion/chunks.jsonl \
--workers 4
Step 2 — Ingest: scripts/pipelines/ingestion/avap_ingestor.py
# Ingest from existing JSONL
python scripts/pipelines/ingestion/avap_ingestor.py \
--chunks scripts/pipelines/ingestion/ingestion/chunks.jsonl \
--index avap-knowledge-v1 \
--delete
# Check model embedding dimensions first
python scripts/pipelines/ingestion/avap_ingestor.py --probe-dim
How it works:
docs/**/*.avap + docs/**/*.md
│
▼ avap_chunker.py (GenericLexer + LanguageConfig)
│ ├─ .avap: block detection (function/if/startLoop/try), statement classification
│ │ semantic tags enrichment, function signature extraction
│ │ semantic overlap injection (OVERLAP_LINES=3)
│ └─ .md: H1/H2/H3 sectioning, fenced code extraction, table isolation,
│ narrative split by token budget (MAX_NARRATIVE_TOKENS=400)
│ ├─ MinHash LSH deduplication (threshold=0.85, 128 permutations)
│ └─ parallel workers (ProcessPoolExecutor)
│
▼ chunks.jsonl (one JSON per line)
│
▼ avap_ingestor.py (async producer/consumer)
│ ├─ OllamaAsyncEmbedder — batch embed (BATCH_SIZE_EMBED=8)
│ ├─ asyncio.Queue (backpressure, QUEUE_MAXSIZE=5)
│ ├─ ES async_bulk (BATCH_SIZE_ES=50)
│ └─ DeadLetterQueue — failed chunks saved to failed_chunks_<ts>.jsonl
│
▼ Elasticsearch index
{chunk_id, content, embedding, doc_type, block_type, section,
source_file, start_line, end_line, token_estimate, metadata{...}}
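The producer/consumer pattern used by `avap_ingestor.py` can be sketched with plain `asyncio` primitives. The batch size, sentinel convention, and function names below are illustrative, not the actual implementation:

```python
import asyncio

async def producer(chunks, queue, batch_size=8):
    """Batch chunks (mirroring BATCH_SIZE_EMBED) and push them onto the queue."""
    for i in range(0, len(chunks), batch_size):
        await queue.put(chunks[i:i + batch_size])
    await queue.put(None)  # sentinel: no more batches

async def consumer(queue, sink):
    """Drain batches from the queue; a real consumer would embed + bulk-index."""
    while True:
        batch = await queue.get()
        if batch is None:
            break
        sink.extend(batch)

async def run_pipeline(chunks, maxsize=5):
    # The bounded queue (QUEUE_MAXSIZE) provides backpressure: the producer
    # blocks when the consumer falls behind.
    queue = asyncio.Queue(maxsize=maxsize)
    sink = []
    await asyncio.gather(producer(chunks, queue), consumer(queue, sink))
    return sink
```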
Chunk types produced:
| doc_type | block_type | Description |
|---|---|---|
| `code` | `function` | Complete AVAP function block |
| `code` | `if` / `startLoop` / `try` | Control flow blocks |
| `function_signature` | `function_signature` | Extracted function signature only (for fast lookup) |
| `code` | `registerEndpoint` / `addVar` / … | Statement-level chunks by AVAP command category |
| `spec` | `narrative` | Markdown prose sections |
| `code_example` | language tag | Fenced code blocks from markdown |
| `bnf` | `bnf` | BNF grammar blocks from markdown |
| `spec` | `table` | Markdown tables |
Semantic tags (automatically detected, stored in metadata):
uses_orm · uses_http · uses_connector · uses_async · uses_crypto · uses_auth · uses_error_handling · uses_loop · uses_json · uses_list · uses_regex · uses_datetime · returns_result · registers_endpoint
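As an illustration of how such tags could be derived, the sketch below assigns tags from simple keyword patterns. The actual detection rules live in `avap_chunker.py`; the patterns here are hypothetical:

```python
# Hypothetical keyword-to-tag mapping; the real rules live in avap_chunker.py.
TAG_PATTERNS = {
    "uses_http": ("http", "request", "response"),
    "uses_loop": ("startLoop",),
    "uses_json": ("json",),
    "registers_endpoint": ("registerEndpoint",),
}

def detect_tags(chunk_text):
    """Return the semantic tags whose keywords appear in the chunk text."""
    lowered = chunk_text.lower()
    return sorted(
        tag
        for tag, keywords in TAG_PATTERNS.items()
        if any(kw.lower() in lowered for kw in keywords)
    )
```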
Ingestor environment variables:
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama base URL for embeddings |
| `OLLAMA_MODEL` | `qwen3-0.6B-emb:latest` | Embedding model name |
| `OLLAMA_EMBEDDING_DIM` | `1024` | Expected embedding dimension (must match model) |
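Because the index mapping must agree with the model's output size, a dimension check along these lines (the kind of validation `--probe-dim` is for) catches mismatches before ingestion; the function name is illustrative:

```python
import os

def check_embedding_dim(vector, expected_dim=None):
    """Raise if an embedding's length differs from OLLAMA_EMBEDDING_DIM."""
    if expected_dim is None:
        expected_dim = int(os.environ.get("OLLAMA_EMBEDDING_DIM", "1024"))
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding dimension {len(vector)} does not match expected {expected_dim}"
        )
    return expected_dim
```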
Development Setup
1. Prerequisites
- Docker & Docker Compose
- gRPCurl (`brew install grpcurl`)
- Access Credentials: ensure the Kubeconfig file `./ivar.yaml` is present in the root directory.
2. Observability Setup (Langfuse)
The engine utilizes Langfuse for end-to-end tracing and performance monitoring.
- Access the Dashboard: http://45.77.119.180
- Create a project and generate API Keys in Settings.
- Configure your local `.env` file using the reference table below.
3. Environment Variables Reference
Policy: Every environment variable used by the engine must be documented in this table. Any PR that introduces a new variable without a corresponding entry here will be rejected. See CONTRIBUTING.md for full details.
Create a .env file in the project root with the following variables:
PYTHONPATH=${PYTHONPATH}:/home/...
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_LOCAL_URL=http://localhost:9200
ELASTICSEARCH_INDEX=avap-docs-test
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_API_KEY=
POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
LANGFUSE_HOST=http://45.77.119.180
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_LOCAL_URL=http://localhost:11434
OLLAMA_MODEL_NAME=qwen2.5:1.5b
OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
HF_TOKEN=hf_...
HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
| Variable | Required | Description | Example |
|---|---|---|---|
| `PYTHONPATH` | No | Path pointing to the project root | `${PYTHONPATH}:/home/...` |
| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval in Docker | `http://host.docker.internal:9200` |
| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval locally | `http://localhost:9200` |
| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
| `ELASTICSEARCH_USER` | No | Elasticsearch username (used when API key is not set) | `elastic` |
| `ELASTICSEARCH_PASSWORD` | No | Elasticsearch password (used when API key is not set) | `changeme` |
| `ELASTICSEARCH_API_KEY` | No | Elasticsearch API key (takes precedence over user/password auth) | `abc123...` |
| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
| `OLLAMA_URL` | Yes | Ollama endpoint used for text generation/embeddings in Docker | `http://host.docker.internal:11434` |
| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint used for text generation/embeddings locally | `http://localhost:11434` |
| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
| `HF_TOKEN` | Yes | HuggingFace secret token | `hf_...` |
| `HF_EMB_MODEL_NAME` | Yes | HuggingFace embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key — required for the `EvaluateRAG` endpoint | `sk-ant-...` |
| `ANTHROPIC_MODEL` | No | Claude model used by the RAG evaluation suite | `claude-sonnet-4-20250514` |
Never commit real secret values. Use placeholder values when sharing configuration examples.
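`src/config.py` reads these variables through Pydantic settings. As a simplified, stdlib-only sketch of the same idea (field names and defaults below are chosen for illustration, not copied from the real module):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Simplified stand-in for the Pydantic settings in src/config.py."""
    elasticsearch_url: str
    ollama_url: str
    ollama_model_name: str

    @classmethod
    def from_env(cls):
        # Defaults mirror the local-tunnel values from the table above.
        return cls(
            elasticsearch_url=os.environ.get("ELASTICSEARCH_LOCAL_URL", "http://localhost:9200"),
            ollama_url=os.environ.get("OLLAMA_LOCAL_URL", "http://localhost:11434"),
            ollama_model_name=os.environ.get("OLLAMA_MODEL_NAME", "qwen2.5:1.5b"),
        )
```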
4. Infrastructure Tunnels
Open a terminal and establish the connection to the Devaron Cluster:
# 1. AI Model Tunnel (Ollama)
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
# 2. Knowledge Base Tunnel (Elasticsearch)
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
# 3. Observability DB Tunnel (PostgreSQL)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
5. Launch the Engine
docker-compose up -d --build
Testing & Debugging
The gRPC service is exposed on port 50052 with gRPC Reflection enabled — introspect it at any time without needing the .proto file.
# List available services
grpcurl -plaintext localhost:50052 list
# Describe the full service contract
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
AskAgent — complete response (non-streaming)
Returns the full answer as a single message with is_final: true. Suitable for clients that do not support streaming.
grpcurl -plaintext \
-d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
Expected response:
{
"text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
"avap_code": "AVAP-2026",
"is_final": true
}
AskAgentStream — real token streaming
Emits one AgentResponse per token from Ollama. The final message has is_final: true and empty text — it is a termination signal, not part of the answer.
grpcurl -plaintext \
-d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgentStream
Expected response stream:
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
...
{"text": "", "is_final": true}
Multi-turn conversation: send subsequent requests with the same session_id to maintain context.
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — engine has Turn 1 history
grpcurl -plaintext \
-d '{"query": "Show me a code example", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
EvaluateRAG — quality evaluation
Runs the RAGAS evaluation pipeline against the golden dataset using Claude as the judge. Requires ANTHROPIC_API_KEY to be set.
# Full evaluation
grpcurl -plaintext -d '{}' localhost:50052 brunix.AssistanceEngine/EvaluateRAG
# Filtered: first 10 questions of category "core_syntax"
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10, "index": "avap-docs-test"}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
Expected response:
{
"status": "ok",
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
"context_precision": 0.6891,
"global_score": 0.7615,
"verdict": "ACCEPTABLE"
}
Verdict thresholds: EXCELLENT ≥ 0.80 · ACCEPTABLE ≥ 0.60 · INSUFFICIENT < 0.60
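The verdict mapping follows directly from those thresholds; a small reference helper:

```python
def verdict(global_score):
    """Map a global RAGAS score to the documented verdict bands."""
    if global_score >= 0.80:
        return "EXCELLENT"
    if global_score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```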
HTTP Proxy (OpenAI & Ollama Compatible)
The container also runs an OpenAI-compatible HTTP proxy on port 8000 (openai_proxy.py). It wraps the gRPC engine transparently — stream: false routes to AskAgent, stream: true routes to AskAgentStream.
This enables integration with any tool that supports the OpenAI or Ollama API (continue.dev, LiteLLM, Open WebUI, etc.) without code changes.
OpenAI endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/v1/models` | List available models |
| POST | `/v1/chat/completions` | Chat completion — streaming and non-streaming |
| POST | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| GET | `/health` | Health check — returns gRPC target and status |
Non-streaming chat — general query:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Que significa AVAP?"}],
"stream": false,
"session_id": "dev-001"
}'
Non-streaming chat — with editor context (VS Code extension):
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este codigo?"}],
"stream": false,
"session_id": "dev-001",
"user": "{\"editor_content\":\"\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}'
Editor context transport: The `user` field carries editor context as a JSON string. `editor_content`, `selected_text`, and `extra_context` must be Base64-encoded. `user_info` is a JSON object with `dev_id`, `project_id`, and `org_id`. The engine only injects editor context into the response when the classifier detects the user is explicitly referring to their code. See `docs/API_REFERENCE.md` for full details.
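Building that `user` field by hand is error-prone; the sketch below shows one way a client could construct it (the helper name and empty defaults are illustrative):

```python
import base64
import json

def build_user_field(selected_text, dev_id, project_id, org_id):
    """Serialize editor context for the `user` field; text fields are Base64."""
    envelope = {
        "editor_content": "",
        "selected_text": base64.b64encode(selected_text.encode("utf-8")).decode("ascii"),
        "extra_context": "",
        "user_info": {"dev_id": dev_id, "project_id": project_id, "org_id": org_id},
    }
    return json.dumps(envelope)
```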
Streaming chat (SSE):
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Write an AVAP hello world API"}],
"stream": true,
"session_id": "user-xyz"
}'
Brunix extension: `session_id` is a non-standard field added to the OpenAI schema. Use it to maintain multi-turn conversation context across HTTP requests. If omitted, all requests share the `"default"` session.
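A client wrapper can make the extension explicit by always setting `session_id`, falling back to `"default"` to match the proxy's behavior; the function name here is illustrative:

```python
def build_chat_request(content, session_id=None, stream=False, model="brunix"):
    """Build a /v1/chat/completions payload with the session_id extension."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
        # Non-standard Brunix field; omitted requests share the "default" session.
        "session_id": session_id if session_id is not None else "default",
    }
```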
Ollama endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/tags` | List models (Ollama format) |
| POST | `/api/chat` | Chat — NDJSON stream, `stream: true` by default |
| POST | `/api/generate` | Text generation — NDJSON stream, `stream: true` by default |
curl http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Explain AVAP loops"}],
"stream": true
}'
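Ollama-style NDJSON streams emit one JSON object per line; a client can reassemble the reply like this (the `message`/`done` field layout follows the standard Ollama chat format, assumed here to match the proxy's output):

```python
import json

def collect_ndjson_chat(lines):
    """Concatenate message content from an NDJSON chat stream until done=true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if obj.get("done"):
            break
        parts.append(obj.get("message", {}).get("content", ""))
    return "".join(parts)
```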
Proxy environment variables
| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC engine address the proxy connects to |
| `PROXY_MODEL_ID` | `brunix` | Model name returned in API responses |
| `PROXY_THREAD_WORKERS` | `20` | Thread pool size for concurrent gRPC calls |
API Contract (Protobuf)
The source of truth for the gRPC interface is Docker/protos/brunix.proto. After modifying it, regenerate the stubs:
python -m grpc_tools.protoc \
-I./Docker/protos \
--python_out=./Docker/src \
--grpc_python_out=./Docker/src \
./Docker/protos/brunix.proto
For the full API reference — message types, field descriptions, error handling, and all client examples — see docs/API_REFERENCE.md.
Dataset Generation & Evaluation
The engine includes a specialized benchmarking suite to evaluate the model's proficiency in AVAP syntax. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored for the AVAP Language Reference Manual (LRM).
1. Synthetic Data Generator
The script scripts/pipelines/flows/generate_mbap.py leverages Claude to produce high-quality, executable code examples and validation tests.
Key Features:
- LRM Grounding: Uses the provided `avap.md` as the source of truth for syntax and logic.
- Validation Logic: Generates `test_list` with Python regex assertions to verify the state of the AVAP stack after execution.
- Balanced Categories: Covers 14 domains including ORM, Concurrency (`go`/`gather`), HTTP handling, and Cryptography.
2. Usage
Ensure you have the anthropic library installed and your API key configured:
pip install anthropic
export ANTHROPIC_API_KEY="your-sk-ant-key"
Run the generator specifying the path to your LRM and the desired output:
python scripts/pipelines/flows/generate_mbap.py \
--lrm docs/LRM/avap.md \
--output evaluation/mbpp_avap.json \
--problems 300
3. Dataset Schema
The generated JSON follows this structure:
| Field | Type | Description |
|---|---|---|
| `task_id` | Integer | Unique identifier for the benchmark. |
| `text` | String | Natural language description of the problem (Spanish). |
| `code` | String | The reference AVAP implementation. |
| `test_list` | Array | Python `re.match` expressions to validate execution results. |
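Since `test_list` holds `re.match` patterns, validating a candidate execution result reduces to one assertion per pattern; a minimal checker (the function name is illustrative):

```python
import re

def passes_tests(result, test_list):
    """True when every regex in test_list matches the execution result."""
    return all(re.match(pattern, result) for pattern in test_list)
```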
4. Integration in RAG
These generated examples are used to:
- Fine-tune the local models (`qwen2.5:1.5b`) or others via the MrHouston pipeline.
- Evaluate the "Zero-Shot" performance of the engine before deployment.
- Provide Few-Shot examples in the RAG prompt orchestration (`src/prompts.py`).
Repository Standards & Architecture
Docker & Build Context
To maintain production-grade security and image efficiency, this project enforces a strict separation between development files and the production runtime:
- Production Root: All executable code must reside in the `/app` directory within the container.
- Exclusions: The root `/workspace` directory is deprecated. No development artifacts, local logs, or non-essential source files (e.g., `.git`, `tests/`, `docs/`) should be bundled into the final image.
- Compliance: All Pull Requests must verify that the `Dockerfile` context is optimized using the provided `.dockerignore`.
Failure to comply with these architectural standards will result in PR rejection.
For the full set of contribution standards, see CONTRIBUTING.md.
Documentation Index
| Document | Purpose |
|---|---|
| README.md | Setup guide, env vars reference, quick start (this file) |
| CONTRIBUTING.md | Contribution standards, GitFlow, PR process |
| SECURITY.md | Security policy, vulnerability reporting, known limitations |
| docs/ARCHITECTURE.md | Deep technical architecture, component inventory, data flows |
| docs/API_REFERENCE.md | Complete gRPC API contract, message types, client examples |
| docs/RUNBOOK.md | Operational playbooks, health checks, incident response |
| docs/AVAP_CHUNKER_CONFIG.md | avap_config.json reference — blocks, statements, semantic tags, how to extend |
| docs/ADR/ | Architecture Decision Records |
| docs/product/ | Product Requirements Documents |
| research/ | Experiment results, benchmarks, and datasets |
Security & Intellectual Property
- Data Privacy: All LLM processing and vector searches are conducted within a private Kubernetes environment.
- Proprietary Technology: This repository contains the AVAP Technology stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.