# Brunix Assistance Engine

The **Brunix Assistance Engine** is a high-performance, gRPC-powered AI orchestration service. It serves as the core intelligence layer for the Brunix ecosystem, integrating advanced RAG (Retrieval-Augmented Generation) capabilities with real-time observability.

This project is a strategic joint development:

* **[101OBEX Corp](https://101obex.com):** Infrastructure, System Architecture, and the proprietary **AVAP Technology** stack.
* **[MrHouston](https://mrhouston.net):** Advanced LLM Fine-tuning, Model Training, and Prompt Engineering.

---

## System Architecture (Hybrid Dev Mode)

The engine runs locally for development but connects to the production-grade infrastructure in the **Vultr Cloud (Devaron Cluster)** via secure `kubectl` tunnels.

```mermaid
graph TD
    subgraph Local_Workstation [Developer]
        BE[Brunix Assistance Engine - Docker]
        KT[Kubectl Port-Forward Tunnels]
    end

    subgraph Vultr_K8s_Cluster [Production - Devaron Cluster]
        OL[Ollama Light Service - LLM]
        EDB[(Elasticsearch Vector DB)]
        PG[(Postgres - Langfuse Data)]
        LF[Langfuse UI - Web]
    end

    BE -- localhost:11434 --> KT
    BE -- localhost:9200 --> KT
    BE -- localhost:5432 --> KT

    KT -- Secure Link --> OL
    KT -- Secure Link --> EDB
    KT -- Secure Link --> PG

    Developer -- Browser --> LF
```

---

## Project Structure

```text
├── README.md               # System documentation & dev guide
├── changelog               # Version tracking and release history
├── pyproject.toml          # Python project configuration
├── docs/
│   ├── AVAP Language: ...  # AVAP DSL documentation
│   │   └── AVAP.md
│   ├── developer.avapfr... # Documents on the developer web page
│   ├── LRM/                # AVAP LRM documentation
│   │   └── avap.md
│   └── samples/            # AVAP code samples
├── Docker/
│   ├── protos/
│   │   └── brunix.proto    # Protocol Buffers: the source of truth for the API
│   ├── src/
│   │   ├── graph.py        # Workflow graph orchestration
│   │   ├── prompts.py      # Centralized prompt definitions
│   │   ├── server.py       # gRPC server & RAG orchestration
│   │   ├── state.py        # Shared state management
│   │   └── utils/          # Utility modules
│   ├── Dockerfile          # Container definition for the engine
│   ├── docker-compose.yaml # Local orchestration for the dev environment
│   ├── requirements.txt    # Python dependencies for Docker
│   └── .dockerignore       # Docker ignore rules
├── scripts/
│   └── pipelines/
│       ├── flows/          # Processing pipelines
│       └── tasks/          # Modules used by the flows
└── src/
    ├── config.py           # Environment variable configuration
    └── utils/
        ├── emb_factory.py  # Embedding model factory
        └── llm_factory.py  # LLM model factory
```

---

## Data Flow & RAG Orchestration

The following diagram illustrates the sequence of a single `AskAgent` request, detailing the retrieval and generation phases through the secure tunnel.

```mermaid
sequenceDiagram
    participant U as External Client (gRPCurl/App)
    participant E as Brunix Engine (Local Docker)
    participant T as Kubectl Tunnel
    participant V as Vector DB (Vultr)
    participant O as Ollama Light (Vultr)

    U->>E: AskAgent(query, session_id)
    Note over E: Start Langfuse Trace

    E->>T: Search Context (Embeddings)
    T->>V: Query Index [avap_manuals]
    V-->>T: Return Relevant Chunks
    T-->>E: Contextual Data

    E->>T: Generate Completion (Prompt + Context)
    T->>O: Stream Tokens (qwen2.5:1.5b)

    loop Token Streaming
        O-->>T: Token
        T-->>E: Token
        E-->>U: gRPC Stream Response {text, avap_code}
    end

    Note over E: Close Langfuse Trace
```

---

## Development Setup

### 1. Prerequisites

* **Docker & Docker Compose**
* **gRPCurl** (`brew install grpcurl`)
* **Access Credentials:** Ensure the file `./ivar.yaml` (Kubeconfig) is present in the root directory.

### 2. Observability Setup (Langfuse)

The engine uses Langfuse for end-to-end tracing and performance monitoring.

1. Access the dashboard: **http://45.77.119.180**
2. Create a project and generate API keys in **Settings**.
3. Configure your local `.env` file using the reference table below.

### 3. Environment Variables Reference

> **Policy:** Every environment variable used by the engine must be documented in this table. Any PR that introduces a new variable without a corresponding entry here will be rejected. See [CONTRIBUTING.md](./CONTRIBUTING.md#5-environment-variables-policy) for full details.

Create a `.env` file in the project root with the following variables:

```env
PYTHONPATH=${PYTHONPATH}:/home/...
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_LOCAL_URL=http://localhost:9200
ELASTICSEARCH_INDEX=avap-docs-test
POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
LANGFUSE_HOST=http://45.77.119.180
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_LOCAL_URL=http://localhost:11434
OLLAMA_MODEL_NAME=qwen2.5:1.5b
OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
HF_TOKEN=hf_...
HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
```

| Variable | Required | Description | Example |
|---|---|---|---|
| `PYTHONPATH` | No | Path pointing to the project root | `${PYTHONPATH}:/home/...` |
| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint for vector/context retrieval from inside Docker | `http://host.docker.internal:9200` |
| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint for vector/context retrieval when running locally | `http://localhost:9200` |
| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
| `OLLAMA_URL` | Yes | Ollama endpoint for text generation/embeddings from inside Docker | `http://host.docker.internal:11434` |
| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint for text generation/embeddings when running locally | `http://localhost:11434` |
| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
| `HF_TOKEN` | Yes | Hugging Face secret token | `hf_...` |
| `HF_EMB_MODEL_NAME` | Yes | Hugging Face embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |

> Never commit real secret values. Use placeholder values when sharing configuration examples.
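
As a minimal sketch, this is how the engine's configuration module (`src/config.py`) might consume these variables. The variable names come from the table above; the helper itself is illustrative, not the actual implementation:

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable or fail fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Optional variables can fall back to sensible local defaults.
ELASTICSEARCH_LOCAL_URL = os.environ.get("ELASTICSEARCH_LOCAL_URL", "http://localhost:9200")
OLLAMA_MODEL_NAME = os.environ.get("OLLAMA_MODEL_NAME", "qwen2.5:1.5b")
```

Failing fast on missing required variables surfaces a misconfigured `.env` at startup instead of mid-request.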

### 4. Infrastructure Tunnels

Open a terminal and establish the connection to the Devaron Cluster:

```bash
# 1. AI Model Tunnel (Ollama)
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 2. Knowledge Base Tunnel (Elasticsearch)
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &

# 3. Observability DB Tunnel (PostgreSQL)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
```

### 5. Launch the Engine

```bash
docker-compose up -d --build
```

---

## Testing & Debugging

The service is exposed on port `50052` with **gRPC Reflection** enabled.

### Streaming Query Example

```bash
grpcurl -plaintext \
  -d '{"query": "Hola Brunix, ¿qué es AVAP?", "session_id": "dev-test-123"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```
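
The same call can be sketched in Python once the stubs are generated from `protos/brunix.proto`. The service and method names come from this README; the stub module and request message names below are assumptions, so check the generated code for the real identifiers:

```python
def stream_answer(stub, request) -> str:
    """Consume the server-streaming AskAgent response and join the text chunks."""
    return "".join(chunk.text for chunk in stub.AskAgent(request))

def main() -> None:
    # Requires `pip install grpcio` and stubs generated with grpc_tools.protoc.
    # Module and message names are assumptions based on the proto file name.
    import grpc
    import brunix_pb2
    import brunix_pb2_grpc

    channel = grpc.insecure_channel("localhost:50052")
    stub = brunix_pb2_grpc.AssistanceEngineStub(channel)
    request = brunix_pb2.AskRequest(query="Hola Brunix, ¿qué es AVAP?",
                                    session_id="dev-test-123")
    print(stream_answer(stub, request))
```

Keeping the stream-consuming logic in a separate function makes it easy to unit-test against a fake stub without a live server.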

---

## API Contract (Protobuf)

To update the communication interface, modify `protos/brunix.proto` and re-generate the stubs:

```bash
python -m grpc_tools.protoc -I./protos --python_out=./src --grpc_python_out=./src ./protos/brunix.proto
```
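
For orientation, a minimal contract consistent with the calls shown in this README might look like the sketch below. The actual `Docker/protos/brunix.proto` is the source of truth; the message names here are assumptions (the service, method, and field names come from the examples and diagrams above):

```protobuf
syntax = "proto3";

package brunix;

service AssistanceEngine {
  // Server-streaming call, matching the AskAgent flow in the diagrams above.
  rpc AskAgent (AskRequest) returns (stream AskReply);
}

message AskRequest {
  string query = 1;
  string session_id = 2;
}

message AskReply {
  string text = 1;
  string avap_code = 2;
}
```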

---

## Dataset Generation & Evaluation

The engine includes a specialized benchmarking suite to evaluate the model's proficiency in **AVAP syntax**. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored for the AVAP Language Reference Manual (LRM).

### 1. Synthetic Data Generator

The script `scripts/generate_mbpp_avap.py` leverages Claude 3.5 Sonnet to produce high-quality, executable code examples and validation tests.

**Key Features:**

* **LRM Grounding:** Uses the provided `avap.md` as the source of truth for syntax and logic.
* **Validation Logic:** Generates a `test_list` with Python regex assertions to verify the state of the AVAP stack after execution.
* **Balanced Categories:** Covers 14 domains including ORM, Concurrency (`go/gather`), HTTP handling, and Cryptography.

### 2. Usage

Ensure you have the `anthropic` library installed and your API key configured:

```bash
pip install anthropic
export ANTHROPIC_API_KEY="your-sk-ant-key"
```

Run the generator, specifying the path to your LRM and the desired output:

```bash
python scripts/generate_mbpp_avap.py \
  --lrm ingestion/docs/avap.md \
  --output evaluation/mbpp_avap.json \
  --problems 300
```

### 3. Dataset Schema

The generated JSON follows this structure:

| Field | Type | Description |
| :--- | :--- | :--- |
| `task_id` | Integer | Unique identifier for the benchmark. |
| `text` | String | Natural language description of the problem (Spanish). |
| `code` | String | The reference AVAP implementation. |
| `test_list` | Array | Python `re.match` expressions to validate execution results. |
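
As a sketch, checking an execution result against a task's `test_list` could look like this. The entry below is invented purely for illustration (real entries come from the generated JSON), and the field names follow the schema above:

```python
import re

# Invented example entry following the schema table above.
entry = {
    "task_id": 1,
    "text": "Suma dos números y deja el resultado en la pila.",
    "code": "...",  # reference AVAP implementation (elided)
    "test_list": [r"^5$"],  # regex applied to the observed stack state
}

def validate(result: str, patterns: list[str]) -> bool:
    """True when the execution result matches every regex in test_list."""
    return all(re.match(p, result) for p in patterns)
```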

### 4. Integration in RAG

These generated examples are used to:

1. **Fine-tune** the local models (`qwen2.5:1.5b`) or others via the MrHouston pipeline.
2. **Evaluate** the "Zero-Shot" performance of the engine before deployment.
3. **Provide Few-Shot examples** in the RAG prompt orchestration (`src/prompts.py`).
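
For illustration, wiring dataset entries into a few-shot block might look like the hypothetical helper below; the real prompt assembly lives in `src/prompts.py`:

```python
def build_few_shot_block(entries: list[dict]) -> str:
    """Format (problem, solution) pairs for injection into the RAG prompt."""
    sections = [
        f"### Problem\n{e['text']}\n### AVAP Solution\n{e['code']}"
        for e in entries
    ]
    return "\n\n".join(sections)
```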

---

## Repository Standards & Architecture

### Docker & Build Context

To maintain production-grade security and image efficiency, this project enforces a strict separation between development files and the production runtime:

* **Production Root:** All executable code must reside in the `/app` directory within the container.
* **Exclusions:** The root `/workspace` directory is deprecated. No development artifacts, local logs, or non-essential source files (e.g., `.git`, `tests/`, `docs/`) should be bundled into the final image.
* **Compliance:** All Pull Requests must verify that the `Dockerfile` context is optimized using the provided `.dockerignore`.

*Failure to comply with these architectural standards will result in PR rejection.*
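
A `.dockerignore` consistent with these exclusions might look like the following. This is an illustrative fragment, not the canonical file (which lives in `Docker/.dockerignore`); the `.env` entry follows the secrets policy above:

```text
.git
docs/
tests/
*.log
.env
```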

For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING.md).

---

## Security & Intellectual Property

* **Data Privacy:** All LLM processing and vector searches are conducted within a private Kubernetes environment.
* **Proprietary Technology:** This repository contains the **AVAP Technology** stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.

---