# Brunix Assistance Engine

The **Brunix Assistance Engine** is a high-performance, gRPC-powered AI orchestration service. It serves as the core intelligence layer for the Brunix ecosystem, integrating advanced RAG (Retrieval-Augmented Generation) capabilities with real-time observability.

This project is a strategic joint development:

* **[101OBEX Corp](https://101obex.com):** Infrastructure, System Architecture, and the proprietary **AVAP Technology** stack.
* **[MrHouston](https://mrhouston.net):** Advanced LLM Fine-tuning, Model Training, and Prompt Engineering.

---

## System Architecture (Hybrid Dev Mode)

The engine runs locally for development but connects to the production-grade infrastructure in the **Vultr Cloud (Devaron Cluster)** via secure `kubectl` tunnels.

```mermaid
graph TD
    subgraph Local_Workstation [Developer]
        BE[Brunix Assistance Engine - Docker]
        KT[Kubectl Port-Forward Tunnels]
    end

    subgraph Vultr_K8s_Cluster [Production - Devaron Cluster]
        OL[Ollama Light Service - LLM]
        EDB[(Elasticsearch Vector DB)]
        PG[(Postgres - Langfuse Data)]
        LF[Langfuse UI - Web]
    end

    BE -- localhost:11434 --> KT
    BE -- localhost:9200 --> KT
    BE -- localhost:5432 --> KT
    KT -- Secure Link --> OL
    KT -- Secure Link --> EDB
    KT -- Secure Link --> PG
    Local_Workstation -- Browser --> LF
```

---

## Project Structure

```text
├── README.md                     # Setup guide & dev reference (this file)
├── CONTRIBUTING.md               # Contribution standards, GitFlow, PR process
├── SECURITY.md                   # Security policy and vulnerability reporting
├── changelog                     # Version tracking and release history
├── pyproject.toml                # Python project configuration (uv)
├── uv.lock                       # Locked dependency graph
│
├── Docker/                       # Production container
│   ├── protos/
│   │   └── brunix.proto          # gRPC API contract (source of truth)
│   ├── src/
│   │   ├── server.py             # gRPC server — AskAgent, AskAgentStream, EvaluateRAG
│   │   ├── openai_proxy.py       # OpenAI & Ollama-compatible HTTP proxy (port 8000)
│   │   ├── graph.py              # LangGraph orchestration — build_graph, build_prepare_graph
│   │   ├── prompts.py            # Centralized prompt definitions (CLASSIFY, GENERATE, etc.)
│   │   ├── state.py              # AgentState TypedDict (shared across graph nodes)
│   │   ├── evaluate.py           # RAGAS evaluation pipeline (Claude as judge)
│   │   ├── golden_dataset.json   # Ground-truth Q&A dataset for EvaluateRAG
│   │   └── utils/
│   │       ├── emb_factory.py    # Provider-agnostic embedding model factory
│   │       └── llm_factory.py    # Provider-agnostic LLM factory
│   ├── Dockerfile                # Multi-stage container build
│   ├── docker-compose.yaml       # Local dev orchestration
│   ├── entrypoint.sh             # Starts gRPC server + HTTP proxy in parallel
│   ├── requirements.txt          # Pinned production dependencies (exported by uv)
│   ├── .env                      # Local secrets (never commit — see .gitignore)
│   └── .dockerignore             # Excludes dev artifacts from image build context
│
├── docs/                         # Knowledge base & project documentation
│   ├── ARCHITECTURE.md           # Deep technical architecture reference
│   ├── API_REFERENCE.md          # Complete gRPC & HTTP API contract with examples
│   ├── RUNBOOK.md                # Operational playbooks and incident response
│   ├── AVAP_CHUNKER_CONFIG.md    # avap_config.json reference — blocks, statements, semantic tags
│   ├── adr/                      # Architecture Decision Records
│   │   ├── ADR-0001-grpc-primary-interface.md
│   │   ├── ADR-0002-two-phase-streaming.md
│   │   ├── ADR-0003-hybrid-retrieval-rrf.md
│   │   └── ADR-0004-claude-eval-judge.md
│   ├── avap_language_github_docs/   # AVAP language reference docs (GitHub source)
│   ├── developer.avapframework.com/ # AVAP developer portal docs
│   ├── LRM/
│   │   └── avap.md               # AVAP Language Reference Manual (LRM)
│   └── samples/                  # AVAP code samples (.avap) used for ingestion
│
├── ingestion/
│   └── chunks.json               # Last export of ingested chunks (ES bulk output)
│
├── scripts/
│   └── pipelines/
│       ├── flows/                # Executable pipeline entry points (Typer CLI)
│       │   ├── elasticsearch_ingestion.py # [PIPELINE A] Chonkie-based ingestion flow
│       │   ├── generate_mbap.py  # Synthetic MBPP-AVAP dataset generator (Claude)
│       │   └── translate_mbpp.py # MBPP→AVAP dataset translation pipeline
│       ├── tasks/                # Reusable task modules for Pipeline A
│       │   ├── chunk.py          # Document fetching, Chonkie chunking & ES bulk write
│       │   ├── embeddings.py     # OllamaEmbeddings adapter (Chonkie-compatible)
│       │   └── prompts.py        # Prompt templates for pipeline LLM calls
│       └── ingestion/            # [PIPELINE B] AVAP-native classic ingestion
│           ├── avap_chunker.py   # Custom AVAP lexer + chunker (MinHash dedup, overlaps)
│           ├── avap_ingestor.py  # Async ES ingestor with DLQ (producer/consumer pattern)
│           ├── avap_config.json  # AVAP language config (blocks, statements, semantic tags)
│           └── ingestion/
│               └── chunks.jsonl  # JSONL output from avap_chunker.py
│
└── src/                          # Shared library (used by both Docker and scripts)
    ├── config.py                 # Pydantic settings — reads all environment variables
    └── utils/
        ├── emb_factory.py        # Embedding model factory
        └── llm_factory.py        # LLM model factory
```
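Two entries in the tree above carry the orchestration logic: `graph.py` builds the LangGraph, and `state.py` defines the `AgentState` TypedDict its nodes share. As rough orientation only, here is a minimal sketch of such a state — the field names are hypothetical, inferred from the request/response fields documented below; `Docker/src/state.py` is the source of truth:

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical minimal shape — see Docker/src/state.py for the real fields."""

    query: str          # user question from the AskAgent request
    session_id: str     # conversation key used for multi-turn history
    context: list[str]  # chunks retrieved from Elasticsearch
    answer: str         # accumulated model output
    avap_code: str      # AVAP version tag returned alongside the answer
```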
---

## Data Flow & RAG Orchestration

The following diagram illustrates the sequence of a single `AskAgent` request, detailing the retrieval and generation phases through the secure tunnel.

```mermaid
sequenceDiagram
    participant U as External Client (gRPCurl/App)
    participant E as Brunix Engine (Local Docker)
    participant T as Kubectl Tunnel
    participant V as Vector DB (Vultr)
    participant O as Ollama Light (Vultr)

    U->>E: AskAgent(query, session_id)
    Note over E: Start Langfuse Trace
    E->>T: Search Context (Embeddings)
    T->>V: Query Index [avap_manuals]
    V-->>T: Return Relevant Chunks
    T-->>E: Contextual Data
    E->>T: Generate Completion (Prompt + Context)
    T->>O: Stream Tokens (qwen2.5:1.5b)
    loop Token Streaming
        O-->>T: Token
        T-->>E: Token
        E-->>U: gRPC Stream Response {text, avap_code}
    end
    Note over E: Close Langfuse Trace
```

---

## Knowledge Base Ingestion

The Elasticsearch vector index is populated via one of two independent pipelines. Both pipelines require the Elasticsearch tunnel to be active (`localhost:9200`) and the Ollama embedding model (`OLLAMA_EMB_MODEL_NAME`) to be available.

### Pipeline A — Chonkie (recommended for markdown + .avap)

Uses the [Chonkie](https://github.com/chonkie-ai/chonkie) library for semantic chunking. Supports `.md` (via `MarkdownChef`) and `.avap` (via `TextChef` + `TokenChunker`). Chunks are embedded with Ollama and bulk-indexed into Elasticsearch via `ElasticHandshakeWithMetadata`.

**Entry point:** `scripts/pipelines/flows/elasticsearch_ingestion.py`

```bash
# Index all markdown and AVAP files from docs/LRM
python -m scripts.pipelines.flows.elasticsearch_ingestion \
  --docs-folder-path docs/LRM \
  --output ingestion/chunks.json \
  --docs-extension .md .avap \
  --es-index avap-docs-test \
  --delete-es-index

# Index the AVAP code samples
python -m scripts.pipelines.flows.elasticsearch_ingestion \
  --docs-folder-path docs/samples \
  --output ingestion/chunks.json \
  --docs-extension .avap \
  --es-index avap-docs-test
```

**How it works:**

```
docs/**/*.md + docs/**/*.avap
        │
        ▼
FileFetcher (Chonkie)
        │
        ├─ .md  → MarkdownChef → merge code blocks + tables into chunks
        │            ↓
        │          TokenChunker (HuggingFace tokenizer: HF_EMB_MODEL_NAME)
        │
        └─ .avap → TextChef → TokenChunker
        │
        ▼
OllamaEmbeddings.embed_batch() (OLLAMA_EMB_MODEL_NAME)
        │
        ▼
ElasticHandshakeWithMetadata.write()
  bulk index → {text, embedding, file, start_index, end_index, token_count}
        │
        ▼
export_documents() → ingestion/chunks.json
```

| Chunk field | Source |
|---|---|
| `text` | Raw chunk text |
| `embedding` | Ollama dense vector |
| `start_index` / `end_index` | Character offsets in source file |
| `token_count` | HuggingFace tokenizer count |
| `file` | Source filename |

---

### Pipeline B — AVAP Native (classic, for .avap files with full semantic analysis)

A custom lexer-based chunker purpose-built for the AVAP language, using `avap_config.json` as its grammar definition. Produces richer metadata (block type, section, semantic tags, complexity score) and includes **MinHash LSH deduplication** and **semantic overlap** between chunks.

**Entry point:** `scripts/pipelines/ingestion/avap_chunker.py`

**Grammar config:** `scripts/pipelines/ingestion/avap_config.json` — see [`docs/AVAP_CHUNKER_CONFIG.md`](./docs/AVAP_CHUNKER_CONFIG.md) for the full reference on blocks, statements, semantic tags, and how to extend the grammar.
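The exact config schema is documented in `docs/AVAP_CHUNKER_CONFIG.md`; purely as orientation, here is a hypothetical fragment showing the three families of definitions the docs name (key names and values are illustrative, not copied from the real file):

```json
{
  "blocks": {
    "function":  {"start": "function",  "end": "end function"},
    "startLoop": {"start": "startLoop", "end": "endLoop"}
  },
  "statements": {
    "registerEndpoint": "endpoint_registration",
    "addVar": "variable_declaration"
  },
  "semantic_tags": {
    "uses_http": ["http", "request"],
    "uses_loop": ["startLoop"]
  }
}
```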
**Step 1 — Chunk:**

```bash
python scripts/pipelines/ingestion/avap_chunker.py \
  --lang-config scripts/pipelines/ingestion/avap_config.json \
  --docs-path docs/samples \
  --output scripts/pipelines/ingestion/ingestion/chunks.jsonl \
  --workers 4
```

**Step 2 — Ingest:** `scripts/pipelines/ingestion/avap_ingestor.py`

```bash
# Check model embedding dimensions first
python scripts/pipelines/ingestion/avap_ingestor.py --probe-dim

# Ingest from existing JSONL
python scripts/pipelines/ingestion/avap_ingestor.py \
  --chunks scripts/pipelines/ingestion/ingestion/chunks.jsonl \
  --index avap-knowledge-v1 \
  --delete
```

**How it works:**

```
docs/**/*.avap + docs/**/*.md
        │
        ▼
avap_chunker.py (GenericLexer + LanguageConfig)
        │
        ├─ .avap: block detection (function/if/startLoop/try), statement classification,
        │         semantic tags enrichment, function signature extraction,
        │         semantic overlap injection (OVERLAP_LINES=3)
        │
        ├─ .md:   H1/H2/H3 sectioning, fenced code extraction, table isolation,
        │         narrative split by token budget (MAX_NARRATIVE_TOKENS=400)
        │
        ├─ MinHash LSH deduplication (threshold=0.85, 128 permutations)
        └─ parallel workers (ProcessPoolExecutor)
        │
        ▼
chunks.jsonl (one JSON per line)
        │
        ▼
avap_ingestor.py (async producer/consumer)
        │
        ├─ OllamaAsyncEmbedder — batch embed (BATCH_SIZE_EMBED=8)
        ├─ asyncio.Queue (backpressure, QUEUE_MAXSIZE=5)
        ├─ ES async_bulk (BATCH_SIZE_ES=50)
        └─ DeadLetterQueue — failed chunks saved to failed_chunks_.jsonl
        │
        ▼
Elasticsearch index
  {chunk_id, content, embedding, doc_type, block_type, section, source_file,
   start_line, end_line, token_estimate, metadata{...}}
```

**Chunk types produced:**

| `doc_type` | `block_type` | Description |
|---|---|---|
| `code` | `function` | Complete AVAP function block |
| `code` | `if` / `startLoop` / `try` | Control flow blocks |
| `function_signature` | `function_signature` | Extracted function signature only (for fast lookup) |
| `code` | `registerEndpoint` / `addVar` / … | Statement-level chunks by AVAP command category |
| `spec` | `narrative` | Markdown prose sections |
| `code_example` | language tag | Fenced code blocks from markdown |
| `bnf` | `bnf` | BNF grammar blocks from markdown |
| `spec` | `table` | Markdown tables |

**Semantic tags** (automatically detected, stored in `metadata`): `uses_orm` · `uses_http` · `uses_connector` · `uses_async` · `uses_crypto` · `uses_auth` · `uses_error_handling` · `uses_loop` · `uses_json` · `uses_list` · `uses_regex` · `uses_datetime` · `returns_result` · `registers_endpoint`

**Ingestor environment variables:**

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama base URL for embeddings |
| `OLLAMA_MODEL` | `qwen3-0.6B-emb:latest` | Embedding model name |
| `OLLAMA_EMBEDDING_DIM` | `1024` | Expected embedding dimension (must match model) |
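For orientation, a single line of `chunks.jsonl` carries the fields listed in the index schema above. A hypothetical record (all values illustrative):

```json
{"chunk_id": "samples/orders.avap#12", "content": "function getOrders() ...", "doc_type": "code", "block_type": "function", "section": "orders", "source_file": "docs/samples/orders.avap", "start_line": 12, "end_line": 34, "token_estimate": 180, "metadata": {"semantic_tags": ["uses_http", "returns_result"]}}
```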
---

## Development Setup

### 1. Prerequisites

* **Docker & Docker Compose**
* **gRPCurl** (`brew install grpcurl`)
* **Access Credentials:** Ensure the file `./ivar.yaml` (Kubeconfig) is present in the root directory.

### 2. Observability Setup (Langfuse)

The engine uses Langfuse for end-to-end tracing and performance monitoring.

1. Access the dashboard: **http://45.77.119.180**
2. Create a project and generate API keys in **Settings**.
3. Configure your local `.env` file using the reference table below.

### 3. Environment Variables Reference

> **Policy:** Every environment variable used by the engine must be documented in this table. Any PR that introduces a new variable without a corresponding entry here will be rejected. See [CONTRIBUTING.md](./CONTRIBUTING.md#5-environment-variables-policy) for full details.

Create a `.env` file in the project root with the following variables:

```env
PYTHONPATH=${PYTHONPATH}:/home/...
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_LOCAL_URL=http://localhost:9200
ELASTICSEARCH_INDEX=avap-docs-test
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_API_KEY=
POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
LANGFUSE_HOST=http://45.77.119.180
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_LOCAL_URL=http://localhost:11434
OLLAMA_MODEL_NAME=qwen2.5:1.5b
OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
HF_TOKEN=hf_...
HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```

| Variable | Required | Description | Example |
|---|---|---|---|
| `PYTHONPATH` | No | Path pointing to the project root | `${PYTHONPATH}:/home/...` |
| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint for vector/context retrieval when running in Docker | `http://host.docker.internal:9200` |
| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint for vector/context retrieval when running locally | `http://localhost:9200` |
| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
| `ELASTICSEARCH_USER` | No | Elasticsearch username (used when API key is not set) | `elastic` |
| `ELASTICSEARCH_PASSWORD` | No | Elasticsearch password (used when API key is not set) | `changeme` |
| `ELASTICSEARCH_API_KEY` | No | Elasticsearch API key (takes precedence over user/password auth) | `abc123...` |
| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
| `OLLAMA_URL` | Yes | Ollama endpoint for text generation/embeddings when running in Docker | `http://host.docker.internal:11434` |
| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint for text generation/embeddings when running locally | `http://localhost:11434` |
| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
| `HF_TOKEN` | Yes | HuggingFace secret token | `hf_...` |
| `HF_EMB_MODEL_NAME` | Yes | HuggingFace embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key — required only for the `EvaluateRAG` endpoint | `sk-ant-...` |
| `ANTHROPIC_MODEL` | No | Claude model used by the RAG evaluation suite | `claude-sonnet-4-20250514` |

> Never commit real secret values. Use placeholder values when sharing configuration examples.
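These variables are loaded by the shared settings module (`src/config.py`, described in the project structure as Pydantic settings). The authoritative field names and defaults live in that file; as a minimal sketch of the pattern, assuming `pydantic-settings` and a subset of the variables above:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Sketch only — src/config.py is the source of truth for names and defaults."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Env var names map case-insensitively: ELASTICSEARCH_URL → elasticsearch_url
    elasticsearch_url: str = "http://localhost:9200"
    elasticsearch_index: str = "avap-docs-test"
    ollama_url: str = "http://localhost:11434"
    ollama_model_name: str = "qwen2.5:1.5b"
    langfuse_host: str = "http://45.77.119.180"
    langfuse_public_key: str = ""
    langfuse_secret_key: str = ""


settings = Settings()  # values resolve from the environment first, then .env
```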
### 4. Infrastructure Tunnels

Open a terminal and establish the connection to the Devaron Cluster:

```bash
# 1. AI Model Tunnel (Ollama)
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 2. Knowledge Base Tunnel (Elasticsearch)
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 3. Observability DB Tunnel (PostgreSQL)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
```

### 5. Launch the Engine

```bash
docker-compose up -d --build
```

---

## Testing & Debugging

The gRPC service is exposed on port `50052` with **gRPC Reflection** enabled — introspect it at any time without needing the `.proto` file.

```bash
# List available services
grpcurl -plaintext localhost:50052 list

# Describe the full service contract
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```

### `AskAgent` — complete response (non-streaming)

Returns the full answer as a single message with `is_final: true`. Suitable for clients that do not support streaming.

```bash
grpcurl -plaintext \
  -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```

Expected response:

```json
{
  "text": "addVar is an AVAP command used to declare a variable...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```

### `AskAgentStream` — real token streaming

Emits one `AgentResponse` per token from Ollama. The final message has `is_final: true` and empty `text` — it is a termination signal, not part of the answer.

```bash
grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream
```

Expected response stream:

```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
...
{"text": "", "is_final": true}
```

**Multi-turn conversation:** send subsequent requests with the same `session_id` to maintain context.

```bash
# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — engine has Turn 1 history
grpcurl -plaintext \
  -d '{"query": "Show me a code example", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream
```

### `EvaluateRAG` — quality evaluation

Runs the RAGAS evaluation pipeline against the golden dataset using Claude as the judge. Requires `ANTHROPIC_API_KEY` to be set.

```bash
# Full evaluation
grpcurl -plaintext -d '{}' localhost:50052 brunix.AssistanceEngine/EvaluateRAG

# Filtered: first 10 questions of category "core_syntax"
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10, "index": "avap-docs-test"}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG
```

Expected response:

```json
{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE"
}
```

Verdict thresholds: `EXCELLENT` ≥ 0.80 · `ACCEPTABLE` ≥ 0.60 · `INSUFFICIENT` < 0.60

---

## HTTP Proxy (OpenAI & Ollama Compatible)

The container also runs an **OpenAI-compatible HTTP proxy** on port `8000` (`openai_proxy.py`). It wraps the gRPC engine transparently — `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`. This enables integration with any tool that supports the OpenAI or Ollama API (continue.dev, LiteLLM, Open WebUI, etc.) without code changes.
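Because the proxy speaks the standard OpenAI protocol, the official SDKs should work unchanged. A minimal sketch with the `openai` Python client, assuming the proxy is reachable on `localhost:8000` (the `session_id` pass-through via `extra_body` uses the Brunix extension described below):

```python
from openai import OpenAI

# The proxy does not validate the key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="brunix",
    messages=[{"role": "user", "content": "What is addVar in AVAP?"}],
    stream=True,
    extra_body={"session_id": "user-xyz"},  # Brunix extension: multi-turn context
)
for chunk in stream:
    # Each SSE chunk carries a token delta; print the answer as it arrives.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```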
### OpenAI endpoints

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v1/models` | List available models |
| `POST` | `/v1/chat/completions` | Chat completion — streaming and non-streaming |
| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| `GET` | `/health` | Health check — returns gRPC target and status |

**Non-streaming chat:**

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "What is AVAP?"}],
    "stream": false
  }'
```

**Streaming chat (SSE):**

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Write an AVAP hello world API"}],
    "stream": true,
    "session_id": "user-xyz"
  }'
```

> **Brunix extension:** `session_id` is a non-standard field added to the OpenAI schema. Use it to maintain multi-turn conversation context across HTTP requests. If omitted, all requests share the `"default"` session.

### Ollama endpoints

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/tags` | List models (Ollama format) |
| `POST` | `/api/chat` | Chat — NDJSON stream, `stream: true` by default |
| `POST` | `/api/generate` | Text generation — NDJSON stream, `stream: true` by default |

```bash
curl http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Explain AVAP loops"}],
    "stream": true
  }'
```

### Proxy environment variables

| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC engine address the proxy connects to |
| `PROXY_MODEL_ID` | `brunix` | Model name returned in API responses |
| `PROXY_THREAD_WORKERS` | `20` | Thread pool size for concurrent gRPC calls |

---

## API Contract (Protobuf)

The source of truth for the gRPC interface is `Docker/protos/brunix.proto`. After modifying it, regenerate the stubs:

```bash
python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto
```

For the full API reference — message types, field descriptions, error handling, and all client examples — see [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md).

---

## Dataset Generation & Evaluation

The engine includes a specialized benchmarking suite to evaluate the model's proficiency in **AVAP syntax**. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored to the AVAP Language Reference Manual (LRM).

### 1. Synthetic Data Generator

The script `scripts/pipelines/flows/generate_mbap.py` leverages Claude to produce high-quality, executable code examples and validation tests.

**Key Features:**

* **LRM Grounding:** Uses the provided `avap.md` as the source of truth for syntax and logic.
* **Validation Logic:** Generates `test_list` with Python regex assertions to verify the state of the AVAP stack after execution (a hypothetical record is sketched below).
* **Balanced Categories:** Covers 14 domains including ORM, Concurrency (`go/gather`), HTTP handling, and Cryptography.
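To make the shape concrete ahead of the schema reference in §3 below, here is a hypothetical record. The AVAP snippet, the regex, and the `stack_dump` name are illustrative placeholders, not generator output:

```json
{
  "task_id": 1,
  "text": "Escribe una función AVAP que sume dos números y devuelva el resultado.",
  "code": "addVar(result, a + b)",
  "test_list": ["re.match(r'result\\s*=\\s*5', stack_dump)"]
}
```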
### 2. Usage

Ensure you have the `anthropic` library installed and your API key configured:

```bash
pip install anthropic
export ANTHROPIC_API_KEY="your-sk-ant-key"
```

Run the generator, specifying the path to your LRM and the desired output:

```bash
python scripts/pipelines/flows/generate_mbap.py \
  --lrm docs/LRM/avap.md \
  --output evaluation/mbpp_avap.json \
  --problems 300
```

### 3. Dataset Schema

The generated JSON follows this structure:

| Field | Type | Description |
| :--- | :--- | :--- |
| `task_id` | Integer | Unique identifier for the benchmark. |
| `text` | String | Natural-language description of the problem (Spanish). |
| `code` | String | The reference AVAP implementation. |
| `test_list` | Array | Python `re.match` expressions to validate execution results. |

### 4. Integration in RAG

These generated examples are used to:

1. **Fine-tune** the local models (`qwen2.5:1.5b`) or others via the MrHouston pipeline.
2. **Evaluate** the zero-shot performance of the engine before deployment.
3. **Provide few-shot examples** in the RAG prompt orchestration (`src/prompts.py`).

---

## Repository Standards & Architecture

### Docker & Build Context

To maintain production-grade security and image efficiency, this project enforces a strict separation between development files and the production runtime:

* **Production Root:** All executable code must reside in the `/app` directory within the container.
* **Exclusions:** The root `/workspace` directory is deprecated. No development artifacts, local logs, or non-essential source files (e.g., `.git`, `tests/`, `docs/`) should be bundled into the final image.
* **Compliance:** All pull requests must verify that the `Dockerfile` build context is optimized using the provided `.dockerignore`.

*Failure to comply with these architectural standards will result in PR rejection.*

For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING.md).

---

## Documentation Index

| Document | Purpose |
|---|---|
| [README.md](./README.md) | Setup guide, env vars reference, quick start (this file) |
| [CONTRIBUTING.md](./CONTRIBUTING.md) | Contribution standards, GitFlow, PR process |
| [SECURITY.md](./SECURITY.md) | Security policy, vulnerability reporting, known limitations |
| [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) | Deep technical architecture, component inventory, data flows |
| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
| [docs/adr/](./docs/adr/) | Architecture Decision Records |

---

## Security & Intellectual Property

* **Data Privacy:** All LLM processing and vector searches are conducted within a private Kubernetes environment.
* **Proprietary Technology:** This repository contains the **AVAP Technology** stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.

---