# Brunix Assistance Engine

The **Brunix Assistance Engine** is a high-performance, gRPC-powered AI orchestration service. It serves as the core intelligence layer for the Brunix ecosystem, integrating advanced RAG (Retrieval-Augmented Generation) capabilities with real-time observability.

This project is a strategic joint development:

* **[101OBEX Corp](https://101obex.com):** Infrastructure, System Architecture, and the proprietary **AVAP Technology** stack.
* **[MrHouston](https://mrhouston.net):** Advanced LLM Fine-tuning, Model Training, and Prompt Engineering.

---

## System Architecture (Hybrid Dev Mode)

The engine runs locally for development but connects to the production-grade infrastructure in the **Vultr Cloud (Devaron Cluster)** via secure `kubectl` tunnels.

```mermaid
graph TD
    subgraph Local_Workstation [Developer]
        BE[Brunix Assistance Engine - Docker]
        KT[Kubectl Port-Forward Tunnels]
    end

    subgraph Vultr_K8s_Cluster [Production - Devaron Cluster]
        OL[Ollama Light Service - LLM]
        EDB[(Elasticsearch Vector DB)]
        PG[(Postgres - Langfuse Data)]
        LF[Langfuse UI - Web]
    end

    BE -- localhost:11434 --> KT
    BE -- localhost:9200 --> KT
    BE -- localhost:5432 --> KT
    KT -- Secure Link --> OL
    KT -- Secure Link --> EDB
    KT -- Secure Link --> PG
    Local_Workstation -- Browser --> LF
```

---

## Project Structure

```text
├── README.md                     # Setup guide & dev reference (this file)
├── CONTRIBUTING.md               # Contribution standards, GitFlow, PR process
├── SECURITY.md                   # Security policy and vulnerability reporting
├── changelog                     # Version tracking and release history
├── pyproject.toml                # Python project configuration (uv)
├── uv.lock                       # Locked dependency graph
│
├── Docker/                       # Production container
│   ├── protos/
│   │   └── brunix.proto          # gRPC API contract (source of truth)
│   ├── src/
│   │   ├── server.py             # gRPC server — AskAgent, AskAgentStream, EvaluateRAG
│   │   ├── openai_proxy.py       # OpenAI & Ollama-compatible HTTP proxy (port 8000)
│   │   ├── graph.py              # LangGraph orchestration — build_graph, build_prepare_graph
│   │   ├── prompts.py            # Centralized prompt definitions (CLASSIFY, GENERATE, etc.)
│   │   ├── state.py              # AgentState TypedDict (shared across graph nodes)
│   │   ├── evaluate.py           # RAGAS evaluation pipeline (Claude as judge)
│   │   ├── golden_dataset.json   # Ground-truth Q&A dataset for EvaluateRAG
│   │   └── utils/
│   │       ├── emb_factory.py    # Provider-agnostic embedding model factory
│   │       └── llm_factory.py    # Provider-agnostic LLM factory
│   ├── Dockerfile                # Multi-stage container build
│   ├── docker-compose.yaml       # Local dev orchestration
│   ├── entrypoint.sh             # Starts gRPC server + HTTP proxy in parallel
│   ├── requirements.txt          # Pinned production dependencies (exported by uv)
│   ├── .env                      # Local secrets (never commit — see .gitignore)
│   └── .dockerignore             # Excludes dev artifacts from image build context
│
├── docs/                         # Knowledge base & project documentation
│   ├── ARCHITECTURE.md           # Deep technical architecture reference
│   ├── API_REFERENCE.md          # Complete gRPC & HTTP API contract with examples
│   ├── RUNBOOK.md                # Operational playbooks and incident response
│   ├── AVAP_CHUNKER_CONFIG.md    # avap_config.json reference — blocks, statements, semantic tags
│   ├── adr/                      # Architecture Decision Records
│   │   ├── ADR-0001-grpc-primary-interface.md
│   │   ├── ADR-0002-two-phase-streaming.md
│   │   ├── ADR-0003-hybrid-retrieval-rrf.md
│   │   └── ADR-0004-claude-eval-judge.md
│   ├── avap_language_github_docs/   # AVAP language reference docs (GitHub source)
│   ├── developer.avapframework.com/ # AVAP developer portal docs
│   ├── LRM/
│   │   └── avap.md               # AVAP Language Reference Manual (LRM)
│   └── samples/                  # AVAP code samples (.avap) used for ingestion
│
├── ingestion/
│   └── chunks.json               # Last export of ingested chunks (ES bulk output)
│
├── scripts/
│   └── pipelines/
│       ├── flows/                # Executable pipeline entry points (Typer CLI)
│       │   ├── elasticsearch_ingestion.py # [PIPELINE A] Chonkie-based ingestion flow
│       │   ├── generate_mbap.py  # Synthetic MBPP-AVAP dataset generator (Claude)
│       │   └── translate_mbpp.py # MBPP→AVAP dataset translation pipeline
│       ├── tasks/                # Reusable task modules for Pipeline A
│       │   ├── chunk.py          # Document fetching, Chonkie chunking & ES bulk write
│       │   ├── embeddings.py     # OllamaEmbeddings adapter (Chonkie-compatible)
│       │   └── prompts.py        # Prompt templates for pipeline LLM calls
│       └── ingestion/            # [PIPELINE B] AVAP-native classic ingestion
│           ├── avap_chunker.py   # Custom AVAP lexer + chunker (MinHash dedup, overlaps)
│           ├── avap_ingestor.py  # Async ES ingestor with DLQ (producer/consumer pattern)
│           ├── avap_config.json  # AVAP language config (blocks, statements, semantic tags)
│           └── ingestion/
│               └── chunks.jsonl  # JSONL output from avap_chunker.py
│
└── src/                          # Shared library (used by both Docker and scripts)
    ├── config.py                 # Pydantic settings — reads all environment variables
    └── utils/
        ├── emb_factory.py        # Embedding model factory
        └── llm_factory.py        # LLM model factory
```
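Two entries in the tree above carry the orchestration logic: `graph.py` builds the LangGraph, and `state.py` defines the `AgentState` TypedDict its nodes share. As rough orientation only, here is a minimal sketch of such a state — the field names are hypothetical, inferred from the request/response fields documented below; `Docker/src/state.py` is the source of truth:

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical minimal shape — see Docker/src/state.py for the real fields."""

    query: str          # user question from the AskAgent request
    session_id: str     # conversation key used for multi-turn history
    context: list[str]  # chunks retrieved from Elasticsearch
    answer: str         # accumulated model output
    avap_code: str      # AVAP version tag returned alongside the answer
```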
---

## Data Flow & RAG Orchestration

The following diagram illustrates the sequence of a single `AskAgent` request, detailing the retrieval and generation phases through the secure tunnel.

```mermaid
sequenceDiagram
    participant U as External Client (gRPCurl/App)
    participant E as Brunix Engine (Local Docker)
    participant T as Kubectl Tunnel
    participant V as Vector DB (Vultr)
    participant O as Ollama Light (Vultr)

    U->>E: AskAgent(query, session_id)
    Note over E: Start Langfuse Trace
    E->>T: Search Context (Embeddings)
    T->>V: Query Index [avap_manuals]
    V-->>T: Return Relevant Chunks
    T-->>E: Contextual Data
    E->>T: Generate Completion (Prompt + Context)
    T->>O: Stream Tokens (qwen2.5:1.5b)
    loop Token Streaming
        O-->>T: Token
        T-->>E: Token
        E-->>U: gRPC Stream Response {text, avap_code}
    end
    Note over E: Close Langfuse Trace
```

---

## Knowledge Base Ingestion

The Elasticsearch vector index is populated via one of two independent pipelines. Both pipelines require the Elasticsearch tunnel to be active (`localhost:9200`) and the Ollama embedding model (`OLLAMA_EMB_MODEL_NAME`) to be available.

### Pipeline A — Chonkie (recommended for markdown + .avap)

Uses the [Chonkie](https://github.com/chonkie-ai/chonkie) library for semantic chunking. Supports `.md` (via `MarkdownChef`) and `.avap` (via `TextChef` + `TokenChunker`). Chunks are embedded with Ollama and bulk-indexed into Elasticsearch via `ElasticHandshakeWithMetadata`.

**Entry point:** `scripts/pipelines/flows/elasticsearch_ingestion.py`

```bash
# Index all markdown and AVAP files from docs/LRM
python -m scripts.pipelines.flows.elasticsearch_ingestion \
  --docs-folder-path docs/LRM \
  --output ingestion/chunks.json \
  --docs-extension .md .avap \
  --es-index avap-docs-test \
  --delete-es-index

# Index the AVAP code samples
python -m scripts.pipelines.flows.elasticsearch_ingestion \
  --docs-folder-path docs/samples \
  --output ingestion/chunks.json \
  --docs-extension .avap \
  --es-index avap-docs-test
```

**How it works:**

```
docs/**/*.md + docs/**/*.avap
        │
        ▼
FileFetcher (Chonkie)
        │
        ├─ .md  → MarkdownChef → merge code blocks + tables into chunks
        │            ↓
        │          TokenChunker (HuggingFace tokenizer: HF_EMB_MODEL_NAME)
        │
        └─ .avap → TextChef → TokenChunker
        │
        ▼
OllamaEmbeddings.embed_batch() (OLLAMA_EMB_MODEL_NAME)
        │
        ▼
ElasticHandshakeWithMetadata.write()
  bulk index → {text, embedding, file, start_index, end_index, token_count}
        │
        ▼
export_documents() → ingestion/chunks.json
```

| Chunk field | Source |
|---|---|
| `text` | Raw chunk text |
| `embedding` | Ollama dense vector |
| `start_index` / `end_index` | Character offsets in source file |
| `token_count` | HuggingFace tokenizer count |
| `file` | Source filename |

---

### Pipeline B — AVAP Native (classic, for .avap files with full semantic analysis)

A custom lexer-based chunker purpose-built for the AVAP language, using `avap_config.json` as its grammar definition. Produces richer metadata (block type, section, semantic tags, complexity score) and includes **MinHash LSH deduplication** and **semantic overlap** between chunks.

**Entry point:** `scripts/pipelines/ingestion/avap_chunker.py`

**Grammar config:** `scripts/pipelines/ingestion/avap_config.json` — see [`docs/AVAP_CHUNKER_CONFIG.md`](./docs/AVAP_CHUNKER_CONFIG.md) for the full reference on blocks, statements, semantic tags, and how to extend the grammar.
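The exact config schema is documented in `docs/AVAP_CHUNKER_CONFIG.md`; purely as orientation, here is a hypothetical fragment showing the three families of definitions the docs name (key names and values are illustrative, not copied from the real file):

```json
{
  "blocks": {
    "function":  {"start": "function",  "end": "end function"},
    "startLoop": {"start": "startLoop", "end": "endLoop"}
  },
  "statements": {
    "registerEndpoint": "endpoint_registration",
    "addVar": "variable_declaration"
  },
  "semantic_tags": {
    "uses_http": ["http", "request"],
    "uses_loop": ["startLoop"]
  }
}
```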
**Step 1 — Chunk:**

```bash
python scripts/pipelines/ingestion/avap_chunker.py \
  --lang-config scripts/pipelines/ingestion/avap_config.json \
  --docs-path docs/samples \
  --output scripts/pipelines/ingestion/ingestion/chunks.jsonl \
  --workers 4
```

**Step 2 — Ingest:** `scripts/pipelines/ingestion/avap_ingestor.py`

```bash
# Check model embedding dimensions first
python scripts/pipelines/ingestion/avap_ingestor.py --probe-dim

# Ingest from existing JSONL
python scripts/pipelines/ingestion/avap_ingestor.py \
  --chunks scripts/pipelines/ingestion/ingestion/chunks.jsonl \
  --index avap-knowledge-v1 \
  --delete
```

**How it works:**

```
docs/**/*.avap + docs/**/*.md
        │
        ▼
avap_chunker.py (GenericLexer + LanguageConfig)
        │
        ├─ .avap: block detection (function/if/startLoop/try), statement classification,
        │         semantic tags enrichment, function signature extraction,
        │         semantic overlap injection (OVERLAP_LINES=3)
        │
        ├─ .md:   H1/H2/H3 sectioning, fenced code extraction, table isolation,
        │         narrative split by token budget (MAX_NARRATIVE_TOKENS=400)
        │
        ├─ MinHash LSH deduplication (threshold=0.85, 128 permutations)
        └─ parallel workers (ProcessPoolExecutor)
        │
        ▼
chunks.jsonl (one JSON per line)
        │
        ▼
avap_ingestor.py (async producer/consumer)
        │
        ├─ OllamaAsyncEmbedder — batch embed (BATCH_SIZE_EMBED=8)
        ├─ asyncio.Queue (backpressure, QUEUE_MAXSIZE=5)
        ├─ ES async_bulk (BATCH_SIZE_ES=50)
        └─ DeadLetterQueue — failed chunks saved to failed_chunks_.jsonl
        │
        ▼
Elasticsearch index
  {chunk_id, content, embedding, doc_type, block_type, section, source_file,
   start_line, end_line, token_estimate, metadata{...}}
```

**Chunk types produced:**

| `doc_type` | `block_type` | Description |
|---|---|---|
| `code` | `function` | Complete AVAP function block |
| `code` | `if` / `startLoop` / `try` | Control flow blocks |
| `function_signature` | `function_signature` | Extracted function signature only (for fast lookup) |
| `code` | `registerEndpoint` / `addVar` / … | Statement-level chunks by AVAP command category |
| `spec` | `narrative` | Markdown prose sections |
| `code_example` | language tag | Fenced code blocks from markdown |
| `bnf` | `bnf` | BNF grammar blocks from markdown |
| `spec` | `table` | Markdown tables |

**Semantic tags** (automatically detected, stored in `metadata`): `uses_orm` · `uses_http` · `uses_connector` · `uses_async` · `uses_crypto` · `uses_auth` · `uses_error_handling` · `uses_loop` · `uses_json` · `uses_list` · `uses_regex` · `uses_datetime` · `returns_result` · `registers_endpoint`

**Ingestor environment variables:**

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama base URL for embeddings |
| `OLLAMA_MODEL` | `qwen3-0.6B-emb:latest` | Embedding model name |
| `OLLAMA_EMBEDDING_DIM` | `1024` | Expected embedding dimension (must match model) |
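For orientation, a single line of `chunks.jsonl` carries the fields listed in the index schema above. A hypothetical record (all values illustrative):

```json
{"chunk_id": "samples/orders.avap#12", "content": "function getOrders() ...", "doc_type": "code", "block_type": "function", "section": "orders", "source_file": "docs/samples/orders.avap", "start_line": 12, "end_line": 34, "token_estimate": 180, "metadata": {"semantic_tags": ["uses_http", "returns_result"]}}
```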
---

## Development Setup

### 1. Prerequisites

* **Docker & Docker Compose**
* **gRPCurl** (`brew install grpcurl`)
* **Access Credentials:** Ensure the file `./ivar.yaml` (Kubeconfig) is present in the root directory.

### 2. Observability Setup (Langfuse)

The engine uses Langfuse for end-to-end tracing and performance monitoring.

1. Access the dashboard: **http://45.77.119.180**
2. Create a project and generate API keys in **Settings**.
3. Configure your local `.env` file using the reference table below.

### 3. Environment Variables Reference

> **Policy:** Every environment variable used by the engine must be documented in this table. Any PR that introduces a new variable without a corresponding entry here will be rejected. See [CONTRIBUTING.md](./CONTRIBUTING.md#5-environment-variables-policy) for full details.

Create a `.env` file in the project root with the following variables:

```env
PYTHONPATH=${PYTHONPATH}:/home/...
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_LOCAL_URL=http://localhost:9200
ELASTICSEARCH_INDEX=avap-docs-test
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_API_KEY=
POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
LANGFUSE_HOST=http://45.77.119.180
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_LOCAL_URL=http://localhost:11434
OLLAMA_MODEL_NAME=qwen2.5:1.5b
OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
HF_TOKEN=hf_...
HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```

| Variable | Required | Description | Example |
|---|---|---|---|
| `PYTHONPATH` | No | Path pointing to the project root | `${PYTHONPATH}:/home/...` |
| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint for vector/context retrieval when running in Docker | `http://host.docker.internal:9200` |
| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint for vector/context retrieval when running locally | `http://localhost:9200` |
| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
| `ELASTICSEARCH_USER` | No | Elasticsearch username (used when API key is not set) | `elastic` |
| `ELASTICSEARCH_PASSWORD` | No | Elasticsearch password (used when API key is not set) | `changeme` |
| `ELASTICSEARCH_API_KEY` | No | Elasticsearch API key (takes precedence over user/password auth) | `abc123...` |
| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
| `OLLAMA_URL` | Yes | Ollama endpoint for text generation/embeddings when running in Docker | `http://host.docker.internal:11434` |
| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint for text generation/embeddings when running locally | `http://localhost:11434` |
| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
| `HF_TOKEN` | Yes | HuggingFace secret token | `hf_...` |
| `HF_EMB_MODEL_NAME` | Yes | HuggingFace embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key — required only for the `EvaluateRAG` endpoint | `sk-ant-...` |
| `ANTHROPIC_MODEL` | No | Claude model used by the RAG evaluation suite | `claude-sonnet-4-20250514` |

> Never commit real secret values. Use placeholder values when sharing configuration examples.
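These variables are loaded by the shared settings module (`src/config.py`, described in the project structure as Pydantic settings). The authoritative field names and defaults live in that file; as a minimal sketch of the pattern, assuming `pydantic-settings` and a subset of the variables above:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Sketch only — src/config.py is the source of truth for names and defaults."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Env var names map case-insensitively: ELASTICSEARCH_URL → elasticsearch_url
    elasticsearch_url: str = "http://localhost:9200"
    elasticsearch_index: str = "avap-docs-test"
    ollama_url: str = "http://localhost:11434"
    ollama_model_name: str = "qwen2.5:1.5b"
    langfuse_host: str = "http://45.77.119.180"
    langfuse_public_key: str = ""
    langfuse_secret_key: str = ""


settings = Settings()  # values resolve from the environment first, then .env
```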
### 4. Infrastructure Tunnels

Open a terminal and establish the connection to the Devaron Cluster:

```bash
# 1. AI Model Tunnel (Ollama)
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 2. Knowledge Base Tunnel (Elasticsearch)
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 3. Observability DB Tunnel (PostgreSQL)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
```

### 5. Launch the Engine

```bash
docker-compose up -d --build
```

---

## Testing & Debugging

The gRPC service is exposed on port `50052` with **gRPC Reflection** enabled — introspect it at any time without needing the `.proto` file.

```bash
# List available services
grpcurl -plaintext localhost:50052 list

# Describe the full service contract
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```

### `AskAgent` — complete response (non-streaming)

Returns the full answer as a single message with `is_final: true`. Suitable for clients that do not support streaming.

```bash
grpcurl -plaintext \
  -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```

Expected response:

```json
{
  "text": "addVar is an AVAP command used to declare a variable...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```

### `AskAgentStream` — real token streaming

Emits one `AgentResponse` per token from Ollama. The final message has `is_final: true` and empty `text` — it is a termination signal, not part of the answer.

```bash
grpcurl -plaintext \
  -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgentStream
```

Expected response stream:

```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
...
{"text": "", "is_final": true}
```

**Multi-turn conversation:** send subsequent requests with the same `session_id` to maintain context.

```bash
# Turn 1
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — engine has Turn 1 history
grpcurl -plaintext \
  -d '{"query": "Show me a code example", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream
```

### `EvaluateRAG` — quality evaluation

Runs the RAGAS evaluation pipeline against the golden dataset using Claude as the judge. Requires `ANTHROPIC_API_KEY` to be set.

```bash
# Full evaluation
grpcurl -plaintext -d '{}' localhost:50052 brunix.AssistanceEngine/EvaluateRAG

# Filtered: first 10 questions of category "core_syntax"
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10, "index": "avap-docs-test"}' \
  localhost:50052 \
  brunix.AssistanceEngine/EvaluateRAG
```

Expected response:

```json
{
  "status": "ok",
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
  "context_precision": 0.6891,
  "global_score": 0.7615,
  "verdict": "ACCEPTABLE"
}
```

Verdict thresholds: `EXCELLENT` ≥ 0.80 · `ACCEPTABLE` ≥ 0.60 · `INSUFFICIENT` < 0.60

---

## HTTP Proxy (OpenAI & Ollama Compatible)

The container also runs an **OpenAI-compatible HTTP proxy** on port `8000` (`openai_proxy.py`). It wraps the gRPC engine transparently — `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`. This enables integration with any tool that supports the OpenAI or Ollama API (continue.dev, LiteLLM, Open WebUI, etc.) without code changes.
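Because the proxy speaks the standard OpenAI protocol, the official SDKs should work unchanged. A minimal sketch with the `openai` Python client, assuming the proxy is reachable on `localhost:8000` (the `session_id` pass-through via `extra_body` uses the Brunix extension described below):

```python
from openai import OpenAI

# The proxy does not validate the key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="brunix",
    messages=[{"role": "user", "content": "What is addVar in AVAP?"}],
    stream=True,
    extra_body={"session_id": "user-xyz"},  # Brunix extension: multi-turn context
)
for chunk in stream:
    # Each SSE chunk carries a token delta; print the answer as it arrives.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```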
### OpenAI endpoints

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v1/models` | List available models |
| `POST` | `/v1/chat/completions` | Chat completion — streaming and non-streaming |
| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| `GET` | `/health` | Health check — returns gRPC target and status |

**Non-streaming chat:**

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "What is AVAP?"}],
    "stream": false
  }'
```

**Streaming chat (SSE):**

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Write an AVAP hello world API"}],
    "stream": true,
    "session_id": "user-xyz"
  }'
```

> **Brunix extension:** `session_id` is a non-standard field added to the OpenAI schema. Use it to maintain multi-turn conversation context across HTTP requests. If omitted, all requests share the `"default"` session.

### Ollama endpoints

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/tags` | List models (Ollama format) |
| `POST` | `/api/chat` | Chat — NDJSON stream, `stream: true` by default |
| `POST` | `/api/generate` | Text generation — NDJSON stream, `stream: true` by default |

```bash
curl http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Explain AVAP loops"}],
    "stream": true
  }'
```

### Proxy environment variables

| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC engine address the proxy connects to |
| `PROXY_MODEL_ID` | `brunix` | Model name returned in API responses |
| `PROXY_THREAD_WORKERS` | `20` | Thread pool size for concurrent gRPC calls |

---

## API Contract (Protobuf)

The source of truth for the gRPC interface is `Docker/protos/brunix.proto`. After modifying it, regenerate the stubs:

```bash
python -m grpc_tools.protoc \
  -I./Docker/protos \
  --python_out=./Docker/src \
  --grpc_python_out=./Docker/src \
  ./Docker/protos/brunix.proto
```

For the full API reference — message types, field descriptions, error handling, and all client examples — see [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md).

---

## Dataset Generation & Evaluation

The engine includes a specialized benchmarking suite to evaluate the model's proficiency in **AVAP syntax**. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored to the AVAP Language Reference Manual (LRM).

### 1. Synthetic Data Generator

The script `scripts/pipelines/flows/generate_mbap.py` leverages Claude to produce high-quality, executable code examples and validation tests.

**Key Features:**

* **LRM Grounding:** Uses the provided `avap.md` as the source of truth for syntax and logic.
* **Validation Logic:** Generates `test_list` with Python regex assertions to verify the state of the AVAP stack after execution (a hypothetical record is sketched below).
* **Balanced Categories:** Covers 14 domains including ORM, Concurrency (`go/gather`), HTTP handling, and Cryptography.
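To make the shape concrete ahead of the schema reference in §3 below, here is a hypothetical record. The AVAP snippet, the regex, and the `stack_dump` name are illustrative placeholders, not generator output:

```json
{
  "task_id": 1,
  "text": "Escribe una función AVAP que sume dos números y devuelva el resultado.",
  "code": "addVar(result, a + b)",
  "test_list": ["re.match(r'result\\s*=\\s*5', stack_dump)"]
}
```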
### 2. Usage

Ensure you have the `anthropic` library installed and your API key configured:

```bash
pip install anthropic
export ANTHROPIC_API_KEY="your-sk-ant-key"
```

Run the generator, specifying the path to your LRM and the desired output:

```bash
python scripts/pipelines/flows/generate_mbap.py \
  --lrm docs/LRM/avap.md \
  --output evaluation/mbpp_avap.json \
  --problems 300
```

### 3. Dataset Schema

The generated JSON follows this structure:

| Field | Type | Description |
| :--- | :--- | :--- |
| `task_id` | Integer | Unique identifier for the benchmark. |
| `text` | String | Natural-language description of the problem (Spanish). |
| `code` | String | The reference AVAP implementation. |
| `test_list` | Array | Python `re.match` expressions to validate execution results. |

### 4. Integration in RAG

These generated examples are used to:

1. **Fine-tune** the local models (`qwen2.5:1.5b`) or others via the MrHouston pipeline.
2. **Evaluate** the zero-shot performance of the engine before deployment.
3. **Provide few-shot examples** in the RAG prompt orchestration (`src/prompts.py`).

---

## Repository Standards & Architecture

### Docker & Build Context

To maintain production-grade security and image efficiency, this project enforces a strict separation between development files and the production runtime:

* **Production Root:** All executable code must reside in the `/app` directory within the container.
* **Exclusions:** The root `/workspace` directory is deprecated. No development artifacts, local logs, or non-essential source files (e.g., `.git`, `tests/`, `docs/`) should be bundled into the final image.
* **Compliance:** All pull requests must verify that the `Dockerfile` build context is optimized using the provided `.dockerignore`.

*Failure to comply with these architectural standards will result in PR rejection.*

For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING.md).

---

## Documentation Index

| Document | Purpose |
|---|---|
| [README.md](./README.md) | Setup guide, env vars reference, quick start (this file) |
| [CONTRIBUTING.md](./CONTRIBUTING.md) | Contribution standards, GitFlow, PR process |
| [SECURITY.md](./SECURITY.md) | Security policy, vulnerability reporting, known limitations |
| [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) | Deep technical architecture, component inventory, data flows |
| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
| [docs/adr/](./docs/adr/) | Architecture Decision Records |

---

## Security & Intellectual Property

* **Data Privacy:** All LLM processing and vector searches are conducted within a private Kubernetes environment.
* **Proprietary Technology:** This repository contains the **AVAP Technology** stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.

---