diff --git a/.env.example b/.env.example
deleted file mode 100644
index 8266b5b..0000000
--- a/.env.example
+++ /dev/null
@@ -1,6 +0,0 @@
-MODEL_NAME=gpt-4-turbo-preview
-
-OPENAI_API_KEY=sk-xxxx..
-LANGFUSE_PUBLIC_KEY=pk-lf-...
-LANGFUSE_SECRET_KEY=sk-lf-...
-LANGFUSE_HOST=http://brunix-observability:3000
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 0000000..f03cd48
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,76 @@
+## Summary
+
+
+
+## Type of change
+
+- [ ] New feature (`Added`)
+- [ ] Change to existing behavior (`Changed`)
+- [ ] Bug fix (`Fixed`)
+- [ ] Security / infrastructure (`Security`)
+- [ ] Internal refactor (no behavioral change)
+- [ ] Docs / changelog only
+
+---
+
+## PR Checklist
+
+> All applicable items must be checked before requesting review.
+> Reviewers are authorized to close PRs that do not meet these standards and request resubmission.
+
+### Code & Environment
+- [ ] Tested locally against the **authorized Devaron Cluster** (no external or unauthorized infrastructure used)
+- [ ] No personal IDE/environment files committed (`.vscode`, `.devcontainer`, etc.)
+- [ ] No `root` user configurations introduced
+- [ ] `Dockerfile` and `.dockerignore` comply with build context standards (`/app` only, no `/workspace`)
+
+### Ingestion Files
+- [ ] **No ingestion files were added or modified in this PR**
+- [ ] **Ingestion files were added or modified** and are committed to the repository under `ingestion/` or `data/`
+
+### Environment Variables
+- [ ] **No new environment variables were introduced in this PR**
+- [ ] **New variables were introduced** and are fully documented in the `.env` table in `README.md`
+
+If new variables were added, list them here:
+
+| Variable | Required | Description | Example value |
+|---|---|---|---|
+| `VARIABLE_NAME` | Yes / No | What it does | `example` |
+
+### Changelog
+- [ ] **Not required** — internal refactor, typo/comment fix, or zero behavioral impact
+- [ ] **Updated** — entry added to `changelog` with correct version bump and today's date
+
+### Documentation
+- [ ] **Not required** — internal change with no impact on setup, API, or usage
+- [ ] **Updated** — `README.md` or relevant docs reflect this change
+
+---
+
+## Changelog entry
+
+
+
+```
+## [X.Y.Z] - YYYY-MM-DD
+
+### Added / Changed / Fixed / Security
+- LABEL: Description.
+```
+
+---
+
+## Infrastructure status during testing
+
+| Tunnel | Status |
+|---|---|
+| Ollama (port 11434) | `active` / `N/A` |
+| Elasticsearch (port 9200) | `active` / `N/A` |
+| PostgreSQL (port 5432) | `active` / `N/A` |
+
+---
+
+## Notes for reviewer
+
+
diff --git a/.gitignore b/.gitignore
index b7faf40..3b1d3b3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,6 @@
# Byte-compiled / optimized / DLL files
__pycache__/
-*.py[codz]
+*.py[cod]
*$py.class
# C extensions
@@ -46,7 +46,7 @@ htmlcov/
nosetests.xml
coverage.xml
*.cover
-*.py.cover
+*.py,cover
.hypothesis/
.pytest_cache/
cover/
@@ -94,35 +94,20 @@ ipython_config.py
# install all needed dependencies.
#Pipfile.lock
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
-#poetry.toml
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
-#pdm.toml
-.pdm-python
-.pdm-build/
-
-# pixi
-# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-#pixi.lock
-# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-# in the .venv directory. It is recommended not to include this directory in version control.
-.pixi
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
@@ -136,7 +121,6 @@ celerybeat.pid
# Environments
.env
-.envrc
.venv
env/
venv/
@@ -173,35 +157,99 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+.idea
+# User-specific stuff
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/**/usage.statistics.xml
+.idea/**/dictionaries
+.idea/**/shelf
-# Abstra
-# Abstra is an AI-powered process automation framework.
-# Ignore directories containing user credentials, local state, and settings.
-# Learn more at https://abstra.io/docs
-.abstra/
+# AWS User-specific
+.idea/**/aws.xml
-# Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
-# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
+# Generated files
+.idea/**/contentModel.xml
-# Ruff stuff:
-.ruff_cache/
+# Sensitive or high-churn files
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+.idea/**/dbnavigator.xml
-# PyPI configuration file
-.pypirc
+# Gradle
+.idea/**/gradle.xml
+.idea/**/libraries
-# Cursor
-# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
-# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
-# refer to https://docs.cursor.com/context/ignore-files
-.cursorignore
-.cursorindexingignore
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn. Uncomment if using
+# auto-import.
+# .idea/artifacts
+# .idea/compiler.xml
+# .idea/jarRepositories.xml
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+# *.iml
+# *.ipr
-# Marimo
-marimo/_static/
-marimo/_lsp/
-__marimo__/
+# CMake
+cmake-build-*/
+
+# Mongo Explorer plugin
+.idea/**/mongoSettings.xml
+
+# File-based project format
+*.iws
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# SonarLint plugin
+.idea/sonarlint/
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+# Editor-based Rest Client
+.idea/httpRequests
+
+# Kubernetes
+/kubernetes
+
+
+# documentation
+/documentation
+
+# Android studio 3.1+ serialized cache file
+.idea/caches/build_file_checksums.ser
+/data
+/data_tmp
+#logging.yml
+
+.python-version
+src/mrh_saltoki_common/py.typed
+
+*.history
+
+.devcontainer
+.vscode
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..03f60c8
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,308 @@
+# Contributing to Brunix Assistance Engine
+
+> This document is the single source of truth for all contribution standards in the Brunix Assistance Engine repository. All contributors — regardless of seniority or role — are expected to read, understand, and comply with these guidelines before opening any Pull Request.
+
+---
+
+## Table of Contents
+
+1. [Development Workflow (GitFlow)](#1-development-workflow-gitflow)
+2. [Infrastructure Standards](#2-infrastructure-standards)
+3. [Repository Standards](#3-repository-standards)
+4. [Pull Request Requirements](#4-pull-request-requirements)
+5. [Ingestion Files Policy](#5-ingestion-files-policy)
+6. [Environment Variables Policy](#6-environment-variables-policy)
+7. [Changelog Policy](#7-changelog-policy)
+8. [Documentation Policy](#8-documentation-policy)
+9. [Architecture Decision Records (ADRs)](#9-architecture-decision-records-adrs)
+10. [Incident & Blockage Reporting](#10-incident--blockage-reporting)
+
+---
+
+## 1. Development Workflow (GitFlow)
+
+### Branch Strategy
+
+| Branch type | Naming convention | Purpose |
+|---|---|---|
+| Feature | `*-dev` | Active development — volatile, no CI validation |
+| Main | `online` | Production-ready, fully validated |
+
+- **Feature branches** (`*-dev`) are volatile environments. No validation tests or infrastructure deployments are performed on these branches.
+- **Official validation** only occurs after a documented Pull Request is merged into `online`.
+- **Developer responsibility:** Code must be stable and functional against the authorized environment before a PR is opened. Do not use the PR review process as a debugging step.
+
+---
+
+## 2. Infrastructure Standards
+
+The project provides a validated, shared environment (Devaron Cluster, Vultr) including Ollama, Elasticsearch, and PostgreSQL.
+
+- **Authorized environment only.** The use of parallel, unauthorized infrastructures — external EC2 instances, ad-hoc local setups, non-replicable environments — is strictly prohibited for official development.
+- **No siloed environments.** Isolated development creates technical debt and incompatibility risks that directly impact delivery timelines.
+- All infrastructure access must be established via the documented `kubectl` port-forward tunnels defined in the [README](./README.md#3-infrastructure-tunnels).
+
+---
+
+## 3. Repository Standards
+
+### IDE Agnosticism
+
+The `online` branch must remain neutral to any individual's development environment. The following **must not** be committed under any circumstance:
+
+- `.devcontainer/`
+- `.vscode/`
+- Any local IDE or editor configuration files
+
+The `.gitignore` automates exclusion of these artifacts. Ensure your local environment is fully decoupled from the production-ready source code.
+
+### Security & Least Privilege
+
+- Never use `root` as `remoteUser` in any shared dev environment configuration.
+- All configurations must comply with the **Principle of Least Privilege**.
+- Using root in shared environments introduces unacceptable supply chain risk.
+
+### Docker & Build Context
+
+- All executable code must reside in `/app` within the container.
+- The `/workspace` root directory is **deprecated** — do not reference it.
+- Every PR must verify the `Dockerfile` context is optimized via `.dockerignore`.
+
+> **PRs that violate these architectural standards will be rejected without review.**
+
+---
+
+## 4. Pull Request Requirements
+
+A PR is not ready for review unless **all applicable items** in the following checklist are satisfied. Reviewers are authorized to close PRs that do not meet these standards and request resubmission.
+
+### PR Checklist
+
+**Code & Environment**
+- [ ] Tested locally against the authorized Devaron Cluster (no unauthorized infrastructure used)
+- [ ] No IDE or environment configuration files committed (`.vscode`, `.devcontainer`, etc.)
+- [ ] No `root` user configurations introduced
+- [ ] `Dockerfile` and `.dockerignore` comply with build context standards
+
+**Ingestion Files** *(see [Section 5](#5-ingestion-files-policy))*
+- [ ] No ingestion files were added or modified
+- [ ] New or modified ingestion files are committed to the repository under `ingestion/` or `data/`
+
+**Environment Variables** *(see [Section 6](#6-environment-variables-policy))*
+- [ ] No new environment variables were introduced
+- [ ] New environment variables are documented in the `.env` reference table in `README.md`
+
+**Changelog** *(see [Section 7](#7-changelog-policy))*
+- [ ] No changelog entry required (internal refactor, comment/typo fix, zero behavioral change)
+- [ ] Changelog updated with correct version bump and date
+
+**Documentation** *(see [Section 8](#8-documentation-policy))*
+- [ ] No documentation update required (internal change, no impact on setup or API)
+- [ ] `README.md` or relevant docs updated to reflect this change
+- [ ] If a significant architectural decision was made, an ADR was created in `docs/adr/`
+
+---
+
+## 5. Ingestion Files Policy
+
+All files used to populate the vector knowledge base — source documents, AVAP manuals, structured data, or ingestion scripts — **must be committed to the repository.**
+
+### Rules
+
+- Ingestion files must reside in a dedicated directory (e.g., `ingestion/` or `data/`) within the repository.
+- Any PR that introduces new knowledge base content or modifies existing ingestion pipelines must include the corresponding source files.
+- Files containing sensitive content that cannot be committed in plain form must be flagged for discussion before proceeding. Encryption, redaction, or a separate private submodule are all valid solutions — committing to an external or local-only location is not.
+
+### Why this matters
+
+The Elasticsearch vector index is only as reliable as the source material that feeds it. Ingestion files that exist only on a local machine or external location cannot be audited, rebuilt, or validated by the team. A knowledge base populated from untracked files is a non-reproducible dependency — and a risk to the entire RAG pipeline.
+
+---
+
+## 6. Environment Variables Policy
+
+This is a critical requirement. **Every environment variable introduced in a PR must be documented before the PR can be merged.**
+
+### Rules
+
+- Any new variable added to the codebase (`.env`, `docker-compose.yaml`, `server.py`, or any config file) must be declared in the `.env` reference table in `README.md`.
+- The documentation must include: variable name, purpose, whether it is required or optional, and an example value.
+- Variables that contain secrets must use placeholder values (e.g., `your-secret-key-here`) — never commit real values.
+
+### Required format in README.md
+
+```markdown
+| Variable | Required | Description | Example |
+|---|---|---|---|
+| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing | `pk-lf-...` |
+| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
+| `LANGFUSE_HOST` | Yes | Langfuse server endpoint | `http://45.77.119.180` |
+| `NEW_VARIABLE` | Yes | Description of what it does | `example-value` |
+```
+
+### Why this matters
+
+An undocumented environment variable silently breaks the setup for every other developer on the team. It also makes the service non-reproducible, which is a direct violation of the infrastructure standards in Section 2. There are no exceptions to this policy.
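A startup guard along these lines fails fast instead of breaking silently for the next developer. A minimal sketch, assuming the Langfuse variables from the README table (adjust the list to the project's actual requirements):

```python
import os

# Assumed required variables; the authoritative list is the .env table in README.md.
REQUIRED_VARS = ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST"]

def missing_env_vars(required=REQUIRED_VARS, env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]
```

Calling this at service startup and exiting with the list of missing names makes an undocumented or forgotten variable an immediate, explicit failure rather than a runtime surprise.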
+
+---
+
+## 7. Changelog Policy
+
+The `changelog` file tracks all notable changes and follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+### When a changelog entry IS required
+
+| Change type | Label to use |
+|---|---|
+| New feature or capability | `Added` |
+| Change to existing behavior, API, or interface | `Changed` |
+| Bug fix | `Fixed` |
+| Security patch or security-related change | `Security` |
+| Breaking change or deprecation | `Deprecated` / `Removed` |
+
+### When a changelog entry is NOT required
+
+- Typo or comment fixes only
+- Internal refactors with zero behavioral or interface change
+- Tooling/CI updates with no user-visible impact
+
+**If in doubt, add an entry.**
+
+### Format
+
+New entries go at the top of the file, above the previous version:
+
+```
+## [X.Y.Z] - YYYY-MM-DD
+
+### Added
+- LABEL: Description of the new feature or capability.
+
+### Changed
+- LABEL: Description of what changed and the rationale.
+
+### Fixed
+- LABEL: Description of the bug resolved.
+```
+
+Use uppercase short labels for scanability: `API:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`.
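The header and label conventions above are mechanical enough to lint. A hypothetical pre-review check (not part of the repository's CI) might look like:

```python
import re

# "## [X.Y.Z] - YYYY-MM-DD" version headers
HEADER_RE = re.compile(r"^## \[(\d+)\.(\d+)\.(\d+)\] - \d{4}-\d{2}-\d{2}$")
# "- LABEL: Description." bullets with an uppercase label
BULLET_RE = re.compile(r"^- [A-Z]+: .+")

def is_valid_header(line):
    return bool(HEADER_RE.match(line))

def is_valid_bullet(line):
    return bool(BULLET_RE.match(line))
```

Running such a check over the top entry of `changelog` in a pre-commit hook would catch malformed versions or lowercase labels before review.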
+
+---
+
+## 8. Documentation Policy
+
+### When documentation MUST be updated
+
+Update `README.md` (or the relevant doc file) if the PR includes any of the following:
+
+- Changes to project structure (new files, directories, removed components)
+- Changes to setup, installation, or environment configuration
+- New or modified API endpoints or Protobuf definitions (`brunix.proto`)
+- New, modified, or removed environment variables
+- Changes to infrastructure tunnels or Kubernetes service names
+- New dependencies or updated dependency versions
+- Changes to security, access, or repository standards
+
+### When documentation is NOT required
+
+- Internal implementation changes with no impact on setup, usage, or API
+- Fixes that do not alter any documented behavior
+
+### Documentation files in this repository
+
+| File | Purpose |
+|---|---|
+| `README.md` | Setup guide, env vars reference, quick start |
+| `CONTRIBUTING.md` | Contribution standards (this file) |
+| `SECURITY.md` | Security policy and vulnerability reporting |
+| `docs/ARCHITECTURE.md` | Deep technical architecture reference |
+| `docs/API_REFERENCE.md` | Complete gRPC API contract and examples |
+| `docs/RUNBOOK.md` | Operational playbooks and incident response |
+| `docs/AVAP_CHUNKER_CONFIG.md` | `avap_config.json` reference — blocks, statements, semantic tags |
+| `docs/adr/` | Architecture Decision Records |
+
+> **PRs that change user-facing behavior or setup without updating documentation will be rejected.**
+
+---
+
+## 9. Architecture Decision Records (ADRs)
+
+Architecture Decision Records document **significant technical decisions** — choices that have lasting consequences on the codebase, infrastructure, or development process.
+
+### When to write an ADR
+
+Write an ADR when a PR introduces or changes:
+
+- A fundamental technology choice (communication protocol, storage backend, framework)
+- A design pattern that other components will follow
+- A deliberate trade-off with known consequences
+- A decision that future engineers might otherwise reverse without understanding the rationale
+
+### When NOT to write an ADR
+
+- Implementation details within a single module
+- Bug fixes
+- Dependency version bumps
+- Configuration changes
+
+### ADR format
+
+ADRs live in `docs/adr/` and follow this naming convention:
+
+```
+ADR-XXXX-short-title.md
+```
+
+Where `XXXX` is a zero-padded sequential number (e.g., `ADR-0005-new-decision.md`).
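The zero-padded sequence can be derived from the files already in `docs/adr/`. A small illustrative helper (not part of the repository):

```python
import re

ADR_RE = re.compile(r"^ADR-(\d{4})-[a-z0-9-]+\.md$")

def next_adr_filename(existing, short_title):
    """Compute the next zero-padded ADR filename from existing ADR file names."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_RE.match(name))]
    return f"ADR-{max(numbers, default=0) + 1:04d}-{short_title}.md"
```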
+
+Each ADR must contain:
+
+```markdown
+# ADR-XXXX: Title
+
+**Date:** YYYY-MM-DD
+**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-YYYY
+**Deciders:** Names or roles
+
+## Context
+What problem are we solving? What forces are at play?
+
+## Decision
+What did we decide?
+
+## Rationale
+Why this option over alternatives? Include a trade-off analysis.
+
+## Consequences
+What are the positive and negative results of this decision?
+```
+
+### Existing ADRs
+
+| ADR | Title | Status |
+|---|---|---|
+| [ADR-0001](docs/adr/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
+| [ADR-0002](docs/adr/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
+| [ADR-0003](docs/adr/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
+| [ADR-0004](docs/adr/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
+
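For context on ADR-0003, Reciprocal Rank Fusion (the general technique, not this repository's implementation) can be sketched in a few lines, with the conventional `k = 60` damping constant:

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked result lists (best first) with Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every list
    in which it appears; documents ranked well by both BM25 and kNN rise.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```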
+---
+
+## 10. Incident & Blockage Reporting
+
+If you encounter a technical blockage (connection timeouts, service downtime, tunnel failures):
+
+1. **Immediate notification** — Report via the designated Slack channel at the moment of detection. Do not wait until end of day.
+2. **GitHub Issue must include:**
+ - The exact command executed
+ - Full terminal output (complete error logs)
+ - Current status of all `kubectl` tunnels
+3. **Resolution** — If the error is not reproducible by the CTO/DevOps team, a 5-minute live debugging session will be scheduled to identify local network or configuration issues.
+
+See [`docs/RUNBOOK.md`](docs/RUNBOOK.md) for full incident playbooks and escalation paths.
+
+---
+
+*These standards exist to protect the integrity of the Brunix Assistance Engine and to ensure every member of the team can work confidently and efficiently. They are not bureaucratic overhead — they are the foundation of a reliable, scalable engineering practice.*
+
+*— Rafael Ruiz, CTO, AVAP Technology*
diff --git a/Docker/.dockerignore b/Docker/.dockerignore
new file mode 100644
index 0000000..b7acc73
--- /dev/null
+++ b/Docker/.dockerignore
@@ -0,0 +1,53 @@
+# Documentation
+*.md
+documentation/
+
+# Build and dependency files
+Makefile
+*.pyc
+__pycache__/
+*.egg-info/
+dist/
+build/
+
+# Development and testing
+.venv/
+venv/
+env/
+.pytest_cache/
+.coverage
+
+# Git and version control
+.git/
+.gitignore
+.gitattributes
+
+# IDE and editor files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+
+# Environment files
+.env
+.env.local
+.env.*.local
+
+# Docker files (do not copy Docker files into the image)
+Dockerfile
+docker-compose.yaml
+
+# CI/CD
+.github/
+.gitlab-ci.yml
+
+# Temporary files
+*.tmp
+*.log
+scratches/
+
+# Node modules (if any)
+node_modules/
+npm-debug.log
diff --git a/Dockerfile b/Docker/Dockerfile
similarity index 57%
rename from Dockerfile
rename to Docker/Dockerfile
index fd014bb..4166505 100644
--- a/Dockerfile
+++ b/Docker/Dockerfile
@@ -5,27 +5,19 @@ ENV PYTHONUNBUFFERED=1
WORKDIR /app
+COPY ./requirements.txt .
+
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
curl \
- libpq-dev \
- protobuf-compiler \
+ protobuf-compiler \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir --upgrade pip
-RUN pip install --no-cache-dir \
- langchain==0.1.0 \
- langfuse>=2.0.0 \
- langgraph \
- langchain-openai \
- langchain-elasticsearch \
- grpcio \
- grpcio-tools \
- psycopg2-binary \
- pydantic
+RUN pip install --no-cache-dir -r requirements.txt
COPY ./protos ./protos
-COPY . .
+COPY ./src ./src
RUN python -m grpc_tools.protoc \
--proto_path=./protos \
@@ -33,6 +25,10 @@ RUN python -m grpc_tools.protoc \
--grpc_python_out=./src \
./protos/brunix.proto
+COPY entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+
EXPOSE 50051
-#CMD ["tail", "-f", "/dev/null"]
-CMD ["python", "src/server.py"]
+EXPOSE 8000
+
+ENTRYPOINT ["/entrypoint.sh"]
\ No newline at end of file
diff --git a/Docker/docker-compose.yaml b/Docker/docker-compose.yaml
new file mode 100644
index 0000000..7716065
--- /dev/null
+++ b/Docker/docker-compose.yaml
@@ -0,0 +1,24 @@
+version: '3.8'
+
+services:
+ brunix-engine:
+ build: .
+ container_name: brunix-assistance-engine
+ ports:
+ - "50052:50051"
+ - "8000:8000"
+ environment:
+ ELASTICSEARCH_URL: ${ELASTICSEARCH_URL}
+ ELASTICSEARCH_INDEX: ${ELASTICSEARCH_INDEX}
+ POSTGRES_URL: ${POSTGRES_URL}
+ LANGFUSE_HOST: ${LANGFUSE_HOST}
+ LANGFUSE_PUBLIC_KEY: ${LANGFUSE_PUBLIC_KEY}
+ LANGFUSE_SECRET_KEY: ${LANGFUSE_SECRET_KEY}
+ OLLAMA_URL: ${OLLAMA_URL}
+ OLLAMA_MODEL_NAME: ${OLLAMA_MODEL_NAME}
+ OLLAMA_EMB_MODEL_NAME: ${OLLAMA_EMB_MODEL_NAME}
+ PROXY_THREAD_WORKERS: 10
+
+ extra_hosts:
+ - "host.docker.internal:host-gateway"
+
diff --git a/Docker/entrypoint.sh b/Docker/entrypoint.sh
new file mode 100644
index 0000000..4e27203
--- /dev/null
+++ b/Docker/entrypoint.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+set -e
+
+echo "[entrypoint] Starting Brunix Engine (gRPC :50051)..."
+python src/server.py &
+ENGINE_PID=$!
+
+echo "[entrypoint] Starting OpenAI Proxy (HTTP :8000)..."
+uvicorn openai_proxy:app --host 0.0.0.0 --port 8000 --workers 4 --app-dir src &
+PROXY_PID=$!
+
+wait_any() {
+ while kill -0 $ENGINE_PID 2>/dev/null && kill -0 $PROXY_PID 2>/dev/null; do
+ sleep 2
+ done
+
+ if ! kill -0 $ENGINE_PID 2>/dev/null; then
+ echo "[entrypoint] Engine died — stopping proxy"
+ kill $PROXY_PID 2>/dev/null
+ exit 1
+ fi
+
+ if ! kill -0 $PROXY_PID 2>/dev/null; then
+ echo "[entrypoint] Proxy died — stopping engine"
+ kill $ENGINE_PID 2>/dev/null
+ exit 1
+ fi
+}
+
+wait_any
\ No newline at end of file
diff --git a/Docker/protos/brunix.proto b/Docker/protos/brunix.proto
new file mode 100644
index 0000000..adde716
--- /dev/null
+++ b/Docker/protos/brunix.proto
@@ -0,0 +1,62 @@
+syntax = "proto3";
+
+package brunix;
+
+service AssistanceEngine {
+ // Full response, compatible with existing clients
+ rpc AskAgent (AgentRequest) returns (stream AgentResponse);
+
+ // True token-by-token streaming from Ollama
+ rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
+
+ // RAGAS evaluation with Claude as judge
+ rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
+}
+
+// ---------------------------------------------------------------------------
+// AskAgent / AskAgentStream: same messages, two behaviors
+// ---------------------------------------------------------------------------
+
+message AgentRequest {
+ string query = 1;
+ string session_id = 2;
+}
+
+message AgentResponse {
+ string text = 1;
+ string avap_code = 2;
+ bool is_final = 3;
+}
+
+// ---------------------------------------------------------------------------
+// EvaluateRAG
+// ---------------------------------------------------------------------------
+
+message EvalRequest {
+ string category = 1;
+ int32 limit = 2;
+ string index = 3;
+}
+
+message EvalResponse {
+ string status = 1;
+ int32 questions_evaluated = 2;
+ float elapsed_seconds = 3;
+ string judge_model = 4;
+ string index = 5;
+ float faithfulness = 6;
+ float answer_relevancy = 7;
+ float context_recall = 8;
+ float context_precision = 9;
+ float global_score = 10;
+ string verdict = 11;
+ repeated QuestionDetail details = 12;
+}
+
+message QuestionDetail {
+ string id = 1;
+ string category = 2;
+ string question = 3;
+ string answer_preview = 4;
+ int32 n_chunks = 5;
+}
diff --git a/Docker/requirements.txt b/Docker/requirements.txt
new file mode 100644
index 0000000..5ff3ce9
--- /dev/null
+++ b/Docker/requirements.txt
@@ -0,0 +1,325 @@
+# This file was autogenerated by uv via the following command:
+# uv export --format requirements-txt --no-hashes --no-dev -o Docker/requirements.txt
+aiohappyeyeballs==2.6.1
+ # via aiohttp
+aiohttp==3.13.3
+ # via langchain-community
+aiosignal==1.4.0
+ # via aiohttp
+annotated-types==0.7.0
+ # via pydantic
+anyio==4.12.1
+ # via httpx
+attrs==25.4.0
+ # via aiohttp
+boto3==1.42.58
+ # via langchain-aws
+botocore==1.42.58
+ # via
+ # boto3
+ # s3transfer
+certifi==2026.1.4
+ # via
+ # elastic-transport
+ # httpcore
+ # httpx
+ # requests
+charset-normalizer==3.4.4
+ # via requests
+chonkie==1.5.6
+ # via assistance-engine
+chonkie-core==0.9.2
+ # via chonkie
+click==8.3.1
+ # via nltk
+colorama==0.4.6 ; sys_platform == 'win32'
+ # via
+ # click
+ # loguru
+ # tqdm
+dataclasses-json==0.6.7
+ # via langchain-community
+elastic-transport==8.17.1
+ # via elasticsearch
+elasticsearch==8.19.3
+ # via langchain-elasticsearch
+filelock==3.24.3
+ # via huggingface-hub
+frozenlist==1.8.0
+ # via
+ # aiohttp
+ # aiosignal
+fsspec==2025.10.0
+ # via huggingface-hub
+greenlet==3.3.2 ; platform_machine == 'AMD64' or platform_machine == 'WIN32' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'ppc64le' or platform_machine == 'win32' or platform_machine == 'x86_64'
+ # via sqlalchemy
+grpcio==1.78.1
+ # via
+ # assistance-engine
+ # grpcio-reflection
+ # grpcio-tools
+grpcio-reflection==1.78.1
+ # via assistance-engine
+grpcio-tools==1.78.1
+ # via assistance-engine
+h11==0.16.0
+ # via httpcore
+hf-xet==1.3.0 ; platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'
+ # via huggingface-hub
+httpcore==1.0.9
+ # via httpx
+httpx==0.28.1
+ # via
+ # langgraph-sdk
+ # langsmith
+ # ollama
+httpx-sse==0.4.3
+ # via langchain-community
+huggingface-hub==0.36.2
+ # via
+ # langchain-huggingface
+ # tokenizers
+idna==3.11
+ # via
+ # anyio
+ # httpx
+ # requests
+ # yarl
+jinja2==3.1.6
+ # via model2vec
+jmespath==1.1.0
+ # via
+ # boto3
+ # botocore
+joblib==1.5.3
+ # via
+ # model2vec
+ # nltk
+jsonpatch==1.33
+ # via langchain-core
+jsonpointer==3.0.0
+ # via jsonpatch
+langchain==1.2.10
+ # via assistance-engine
+langchain-aws==1.3.1
+ # via assistance-engine
+langchain-classic==1.0.1
+ # via langchain-community
+langchain-community==0.4.1
+ # via assistance-engine
+langchain-core==1.2.15
+ # via
+ # langchain
+ # langchain-aws
+ # langchain-classic
+ # langchain-community
+ # langchain-elasticsearch
+ # langchain-huggingface
+ # langchain-ollama
+ # langchain-text-splitters
+ # langgraph
+ # langgraph-checkpoint
+ # langgraph-prebuilt
+langchain-elasticsearch==1.0.0
+ # via assistance-engine
+langchain-huggingface==1.2.0
+ # via assistance-engine
+langchain-ollama==1.0.1
+ # via assistance-engine
+langchain-text-splitters==1.1.1
+ # via langchain-classic
+langgraph==1.0.9
+ # via langchain
+langgraph-checkpoint==4.0.0
+ # via
+ # langgraph
+ # langgraph-prebuilt
+langgraph-prebuilt==1.0.8
+ # via langgraph
+langgraph-sdk==0.3.8
+ # via langgraph
+langsmith==0.7.6
+ # via
+ # langchain-classic
+ # langchain-community
+ # langchain-core
+loguru==0.7.3
+ # via assistance-engine
+markdown-it-py==4.0.0
+ # via rich
+markupsafe==3.0.3
+ # via jinja2
+marshmallow==3.26.2
+ # via dataclasses-json
+mdurl==0.1.2
+ # via markdown-it-py
+model2vec==0.7.0
+ # via chonkie
+multidict==6.7.1
+ # via
+ # aiohttp
+ # yarl
+mypy-extensions==1.1.0
+ # via typing-inspect
+nltk==3.9.3
+ # via assistance-engine
+numpy==2.4.2
+ # via
+ # assistance-engine
+ # chonkie
+ # chonkie-core
+ # elasticsearch
+ # langchain-aws
+ # langchain-community
+ # model2vec
+ # pandas
+ollama==0.6.1
+ # via langchain-ollama
+orjson==3.11.7
+ # via
+ # langgraph-sdk
+ # langsmith
+ormsgpack==1.12.2
+ # via langgraph-checkpoint
+packaging==24.2
+ # via
+ # huggingface-hub
+ # langchain-core
+ # langsmith
+ # marshmallow
+pandas==3.0.1
+ # via assistance-engine
+propcache==0.4.1
+ # via
+ # aiohttp
+ # yarl
+protobuf==6.33.5
+ # via
+ # grpcio-reflection
+ # grpcio-tools
+pydantic==2.12.5
+ # via
+ # langchain
+ # langchain-aws
+ # langchain-classic
+ # langchain-core
+ # langgraph
+ # langsmith
+ # ollama
+ # pydantic-settings
+pydantic-core==2.41.5
+ # via pydantic
+pydantic-settings==2.13.1
+ # via langchain-community
+pygments==2.19.2
+ # via rich
+python-dateutil==2.9.0.post0
+ # via
+ # botocore
+ # elasticsearch
+ # pandas
+python-dotenv==1.2.1
+ # via
+ # assistance-engine
+ # pydantic-settings
+pyyaml==6.0.3
+ # via
+ # huggingface-hub
+ # langchain-classic
+ # langchain-community
+ # langchain-core
+rapidfuzz==3.14.3
+ # via assistance-engine
+regex==2026.2.19
+ # via nltk
+requests==2.32.5
+ # via
+ # huggingface-hub
+ # langchain-classic
+ # langchain-community
+ # langsmith
+ # requests-toolbelt
+requests-toolbelt==1.0.0
+ # via langsmith
+rich==14.3.3
+ # via model2vec
+s3transfer==0.16.0
+ # via boto3
+safetensors==0.7.0
+ # via model2vec
+setuptools==82.0.0
+ # via
+ # grpcio-tools
+ # model2vec
+simsimd==6.5.13
+ # via elasticsearch
+six==1.17.0
+ # via python-dateutil
+sqlalchemy==2.0.46
+ # via
+ # langchain-classic
+ # langchain-community
+tenacity==9.1.4
+ # via
+ # chonkie
+ # langchain-community
+ # langchain-core
+tokenizers==0.22.2
+ # via
+ # chonkie
+ # langchain-huggingface
+ # model2vec
+tqdm==4.67.3
+ # via
+ # assistance-engine
+ # chonkie
+ # huggingface-hub
+ # model2vec
+ # nltk
+typing-extensions==4.15.0
+ # via
+ # aiosignal
+ # anyio
+ # elasticsearch
+ # grpcio
+ # huggingface-hub
+ # langchain-core
+ # pydantic
+ # pydantic-core
+ # sqlalchemy
+ # typing-inspect
+ # typing-inspection
+typing-inspect==0.9.0
+ # via dataclasses-json
+typing-inspection==0.4.2
+ # via
+ # pydantic
+ # pydantic-settings
+tzdata==2025.3 ; sys_platform == 'emscripten' or sys_platform == 'win32'
+ # via pandas
+urllib3==2.6.3
+ # via
+ # botocore
+ # elastic-transport
+ # requests
+uuid-utils==0.14.1
+ # via
+ # langchain-core
+ # langsmith
+win32-setctime==1.2.0 ; sys_platform == 'win32'
+ # via loguru
+xxhash==3.6.0
+ # via
+ # langgraph
+ # langsmith
+yarl==1.22.0
+ # via aiohttp
+zstandard==0.25.0
+ # via langsmith
+
+# Direct additions below are not pinned by the resolver output above.
+ragas
+datasets
+langchain-anthropic
+
+fastapi>=0.111.0
+uvicorn[standard]>=0.29.0
\ No newline at end of file
diff --git a/Docker/src/evaluate.py b/Docker/src/evaluate.py
new file mode 100644
index 0000000..791f9fb
--- /dev/null
+++ b/Docker/src/evaluate.py
@@ -0,0 +1,230 @@
+import os
+import time
+import json
+import logging
+from collections import defaultdict
+from pathlib import Path
+from typing import Optional
+from ragas import evaluate as ragas_evaluate
+from ragas.metrics import (
+    faithfulness,
+    answer_relevancy,
+    context_recall,
+    context_precision,
+)
+from ragas.llms import LangchainLLMWrapper
+from ragas.embeddings import LangchainEmbeddingsWrapper
+from datasets import Dataset
+try:
+    from langchain_anthropic import ChatAnthropic
+    ANTHROPIC_AVAILABLE = True
+except ImportError:
+    # Optional dependency: run_evaluation reports a clear error if it is missing.
+    ChatAnthropic = None
+    ANTHROPIC_AVAILABLE = False
+
+logger = logging.getLogger(__name__)
+
+GOLDEN_DATASET_PATH = Path(__file__).parent / "golden_dataset.json"
+CLAUDE_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
+ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "")
+K_RETRIEVE = 5
+
+
+from elasticsearch import Elasticsearch
+from langchain_core.messages import SystemMessage, HumanMessage
+
+def retrieve_context(es_client, embeddings, question, index, k=K_RETRIEVE):
+
+ query_vector = None
+ try:
+ query_vector = embeddings.embed_query(question)
+ except Exception as e:
+        logger.warning(f"[eval] embed_query failed: {e}")
+
+ bm25_hits = []
+ try:
+ resp = es_client.search(
+ index=index,
+ body={
+ "size": k,
+ "query": {
+ "multi_match": {
+ "query": question,
+ "fields": ["content^2", "text^2"],
+ "type": "best_fields",
+ "fuzziness": "AUTO",
+ }
+ },
+ "_source": {"excludes": ["embedding"]},
+ }
+ )
+ bm25_hits = resp["hits"]["hits"]
+ except Exception as e:
+        logger.warning(f"[eval] BM25 query failed: {e}")
+
+ knn_hits = []
+ if query_vector:
+ try:
+ resp = es_client.search(
+ index=index,
+ body={
+ "size": k,
+ "knn": {
+ "field": "embedding",
+ "query_vector": query_vector,
+ "k": k,
+ "num_candidates": k * 5,
+ },
+ "_source": {"excludes": ["embedding"]},
+ }
+ )
+ knn_hits = resp["hits"]["hits"]
+ except Exception as e:
+            logger.warning(f"[eval] kNN query failed: {e}")
+
+ rrf_scores: dict[str, float] = defaultdict(float)
+ hit_by_id: dict[str, dict] = {}
+
+ for rank, hit in enumerate(bm25_hits):
+ doc_id = hit["_id"]
+ rrf_scores[doc_id] += 1.0 / (rank + 60)
+ hit_by_id[doc_id] = hit
+
+ for rank, hit in enumerate(knn_hits):
+ doc_id = hit["_id"]
+ rrf_scores[doc_id] += 1.0 / (rank + 60)
+ if doc_id not in hit_by_id:
+ hit_by_id[doc_id] = hit
+
+ ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:k]
+
+ return [
+ hit_by_id[doc_id]["_source"].get("content")
+ or hit_by_id[doc_id]["_source"].get("text", "")
+ for doc_id, _ in ranked
+ if (
+ hit_by_id[doc_id]["_source"].get("content")
+ or hit_by_id[doc_id]["_source"].get("text", "")
+ ).strip()
+ ]
+
+
+def generate_answer(llm, question: str, contexts: list[str]) -> str:
+ try:
+ from prompts import GENERATE_PROMPT
+ context_text = "\n\n".join(
+ f"[{i+1}] {ctx}" for i, ctx in enumerate(contexts)
+ )
+ prompt = SystemMessage(
+ content=GENERATE_PROMPT.content.format(context=context_text)
+ )
+ resp = llm.invoke([prompt, HumanMessage(content=question)])
+ return resp.content.strip()
+ except Exception as e:
+        logger.warning(f"[eval] generate_answer failed: {e}")
+ return ""
+
+def run_evaluation(es_client, llm, embeddings, index_name, category=None, limit=None):
+
+    if not ANTHROPIC_AVAILABLE:
+        return {"error": "langchain-anthropic is not installed. Run: pip install langchain-anthropic"}
+    if not ANTHROPIC_API_KEY:
+        return {"error": "ANTHROPIC_API_KEY is not set in .env"}
+    if not GOLDEN_DATASET_PATH.exists():
+        return {"error": f"Golden dataset not found at {GOLDEN_DATASET_PATH}"}
+
+
+ questions = json.loads(GOLDEN_DATASET_PATH.read_text(encoding="utf-8"))
+ if category:
+ questions = [q for q in questions if q.get("category") == category]
+ if limit:
+ questions = questions[:limit]
+ if not questions:
+        return {"error": "No questions match the given filters"}
+
+    logger.info(f"[eval] evaluating {len(questions)} questions, index={index_name}")
+
+ claude_judge = ChatAnthropic(
+ model=CLAUDE_MODEL,
+ api_key=ANTHROPIC_API_KEY,
+ temperature=0,
+ max_tokens=2048,
+ )
+
+ rows = {"question": [], "answer": [], "contexts": [], "ground_truth": []}
+ details = []
+ t_start = time.time()
+
+ for item in questions:
+ q_id = item["id"]
+ question = item["question"]
+ gt = item["ground_truth"]
+
+ logger.info(f"[eval] {q_id}: {question[:60]}")
+
+ contexts = retrieve_context(es_client, embeddings, question, index_name)
+ if not contexts:
+ logger.warning(f"[eval] No context for {q_id} — skipping")
+ continue
+
+ answer = generate_answer(llm, question, contexts)
+ if not answer:
+            logger.warning(f"[eval] No answer for {q_id} — skipping")
+ continue
+
+ rows["question"].append(question)
+ rows["answer"].append(answer)
+ rows["contexts"].append(contexts)
+ rows["ground_truth"].append(gt)
+
+ details.append({
+ "id": q_id,
+ "category": item.get("category", ""),
+ "question": question,
+ "answer_preview": answer[:300],
+ "n_chunks": len(contexts),
+ })
+
+ if not rows["question"]:
+        return {"error": "No samples were generated"}
+
+ dataset = Dataset.from_dict(rows)
+ ragas_llm = LangchainLLMWrapper(claude_judge)
+ ragas_emb = LangchainEmbeddingsWrapper(embeddings)
+
+ metrics = [faithfulness, answer_relevancy, context_recall, context_precision]
+ for metric in metrics:
+ metric.llm = ragas_llm
+ if hasattr(metric, "embeddings"):
+ metric.embeddings = ragas_emb
+
+    logger.info("[eval] running Ragas metrics with Claude as judge...")
+ result = ragas_evaluate(dataset, metrics=metrics)
+
+ elapsed = time.time() - t_start
+
+ scores = {
+ "faithfulness": round(float(result.get("faithfulness", 0)), 4),
+ "answer_relevancy": round(float(result.get("answer_relevancy", 0)), 4),
+ "context_recall": round(float(result.get("context_recall", 0)), 4),
+ "context_precision": round(float(result.get("context_precision", 0)), 4),
+ }
+
+ valid_scores = [v for v in scores.values() if v > 0]
+ global_score = round(sum(valid_scores) / len(valid_scores), 4) if valid_scores else 0.0
+
+ verdict = (
+ "EXCELLENT" if global_score >= 0.8 else
+ "ACCEPTABLE" if global_score >= 0.6 else
+ "INSUFFICIENT"
+ )
+
+ logger.info(f"[eval] FINISHED — global={global_score} verdict={verdict} "
+ f"elapsed={elapsed:.0f}s")
+
+ return {
+ "status": "ok",
+ "questions_evaluated": len(rows["question"]),
+ "elapsed_seconds": round(elapsed, 1),
+ "judge_model": CLAUDE_MODEL,
+ "index": index_name,
+ "category_filter": category or "all",
+ "scores": scores,
+ "global_score": global_score,
+ "verdict": verdict,
+ "details": details,
+ }
\ No newline at end of file
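Both `retrieve_context` above and `hybrid_search_native` in `graph.py` fuse BM25 and kNN rankings with the same reciprocal-rank-fusion constant (60). A minimal dependency-free sketch of that fusion step, where the id lists are hypothetical stand-ins for Elasticsearch hit lists:

```python
from collections import defaultdict

def rrf_fuse(bm25_ids, knn_ids, k=5, c=60):
    # Reciprocal rank fusion: each list contributes 1 / (rank + c) per document.
    scores = defaultdict(float)
    for ranked in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] += 1.0 / (rank + c)
    # Highest fused score first, truncated to the top k.
    return [doc_id for doc_id, _ in
            sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]]

# "a" leads both lists, so it wins; "b" beats "d" on combined rank.
print(rrf_fuse(["a", "b", "c"], ["a", "d", "b"], k=3))  # ['a', 'b', 'd']
```

A document that appears in only one list can still rank highly, but two mid-list appearances usually beat a single top placement, which is why the production code keeps `hit_by_id` from both result sets.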
diff --git a/Docker/src/graph.py b/Docker/src/graph.py
new file mode 100644
index 0000000..b092e74
--- /dev/null
+++ b/Docker/src/graph.py
@@ -0,0 +1,391 @@
+import logging
+from collections import defaultdict
+from elasticsearch import Elasticsearch
+from langchain_core.documents import Document
+from langchain_core.messages import AIMessage, SystemMessage, HumanMessage, BaseMessage
+from langgraph.graph import END, StateGraph
+from langgraph.graph.state import CompiledStateGraph
+
+from prompts import (
+ CLASSIFY_PROMPT_TEMPLATE,
+ CODE_GENERATION_PROMPT,
+ CONVERSATIONAL_PROMPT,
+ GENERATE_PROMPT,
+ REFORMULATE_PROMPT,
+)
+
+from state import AgentState
+
+logger = logging.getLogger(__name__)
+
+session_store: dict[str, list] = defaultdict(list)
+
+def format_context(docs):
+ chunks = []
+ for i, doc in enumerate(docs, 1):
+ meta = doc.metadata or {}
+ chunk_id = meta.get("chunk_id", meta.get("id", f"chunk-{i}"))
+ source = meta.get("source_file", meta.get("source", "unknown"))
+ doc_type = meta.get("doc_type", "")
+ block_type = meta.get("block_type", "")
+ section = meta.get("section", "")
+
+ text = (doc.page_content or "").strip()
+ if not text:
+ text = meta.get("content") or meta.get("text") or ""
+
+ header_parts = [f"[{i}]", f"id={chunk_id}"]
+        if doc_type:
+            header_parts.append(f"type={doc_type}")
+        if block_type:
+            header_parts.append(f"block={block_type}")
+        if section:
+            header_parts.append(f"section={section}")
+ header_parts.append(f"source={source}")
+
+ if doc_type in ("code", "code_example", "bnf") or \
+ block_type in ("function", "if", "startLoop", "try"):
+ header_parts.append("[AVAP CODE]")
+
+ chunks.append(" ".join(header_parts) + "\n" + text)
+
+ return "\n\n".join(chunks)
+
+
+def format_history_for_classify(messages):
+ lines = []
+ for msg in messages[-6:]:
+ if isinstance(msg, HumanMessage):
+ lines.append(f"User: {msg.content}")
+ elif isinstance(msg, AIMessage):
+ lines.append(f"Assistant: {msg.content[:300]}")
+ elif isinstance(msg, dict):
+ role = msg.get("role", "user")
+ content = msg.get("content", "")[:300]
+ lines.append(f"{role.capitalize()}: {content}")
+ return "\n".join(lines) if lines else "(no history)"
+
+
+def hybrid_search_native(es_client, embeddings, query, index_name, k=8):
+ query_vector = None
+ try:
+ query_vector = embeddings.embed_query(query)
+ except Exception as e:
+        logger.warning(f"[hybrid] embed_query failed: {e}")
+
+ bm25_hits = []
+ try:
+ resp = es_client.search(
+ index=index_name,
+ body={
+ "size": k,
+ "query": {
+ "multi_match": {
+ "query": query,
+ "fields": ["content^2", "text^2"],
+ "type": "best_fields",
+ "fuzziness": "AUTO",
+ }
+ },
+ "_source": {"excludes": ["embedding"]},
+ }
+ )
+ bm25_hits = resp["hits"]["hits"]
+ logger.info(f"[hybrid] BM25 -> {len(bm25_hits)} hits")
+ except Exception as e:
+        logger.warning(f"[hybrid] BM25 query failed: {e}")
+
+ knn_hits = []
+ if query_vector:
+ try:
+ resp = es_client.search(
+ index=index_name,
+ body={
+ "size": k,
+ "knn": {
+ "field": "embedding",
+ "query_vector": query_vector,
+ "k": k,
+ "num_candidates": k * 5,
+ },
+ "_source": {"excludes": ["embedding"]},
+ }
+ )
+ knn_hits = resp["hits"]["hits"]
+ logger.info(f"[hybrid] kNN -> {len(knn_hits)} hits")
+ except Exception as e:
+            logger.warning(f"[hybrid] kNN query failed: {e}")
+
+ rrf_scores: dict[str, float] = defaultdict(float)
+ hit_by_id: dict[str, dict] = {}
+
+ for rank, hit in enumerate(bm25_hits):
+ doc_id = hit["_id"]
+ rrf_scores[doc_id] += 1.0 / (rank + 60)
+ hit_by_id[doc_id] = hit
+
+ for rank, hit in enumerate(knn_hits):
+ doc_id = hit["_id"]
+ rrf_scores[doc_id] += 1.0 / (rank + 60)
+ if doc_id not in hit_by_id:
+ hit_by_id[doc_id] = hit
+
+ ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:k]
+
+ docs = []
+ for doc_id, score in ranked:
+ src = hit_by_id[doc_id]["_source"]
+ text = src.get("content") or src.get("text") or ""
+ meta = {k: v for k, v in src.items()
+ if k not in ("content", "text", "embedding")}
+        meta["id"] = doc_id
+ meta["rrf_score"] = score
+ docs.append(Document(page_content=text, metadata=meta))
+
+ logger.info(f"[hybrid] RRF -> {len(docs)} final docs")
+ return docs
+
+def build_graph(llm, embeddings, es_client, index_name):
+
+ def _persist(state: AgentState, response: BaseMessage):
+ session_id = state.get("session_id", "")
+ if session_id:
+ session_store[session_id] = list(state["messages"]) + [response]
+
+    def classify(state):
+        messages = state["messages"]
+        user_msg = messages[-1]
+        question = getattr(user_msg, "content",
+                           user_msg.get("content", "")
+                           if isinstance(user_msg, dict) else "")
+        history_msgs = messages[:-1]
+        history_text = (format_history_for_classify(history_msgs)
+                        if history_msgs else "(no history)")
+        prompt_content = (
+            CLASSIFY_PROMPT_TEMPLATE
+            .replace("{history}", history_text)
+            .replace("{message}", question)
+        )
+        resp = llm.invoke([SystemMessage(content=prompt_content)])
+        raw = resp.content.strip().upper()
+        query_type = _parse_query_type(raw)
+        logger.info(f"[classify] raw='{raw}' -> {query_type}")
+        return {"query_type": query_type}
+
+ def _parse_query_type(raw: str) -> str:
+ if raw.startswith("CODE_GENERATION") or "CODE" in raw:
+ return "CODE_GENERATION"
+ if raw.startswith("CONVERSATIONAL"):
+ return "CONVERSATIONAL"
+ return "RETRIEVAL"
+
+ def reformulate(state: AgentState) -> AgentState:
+ user_msg = state["messages"][-1]
+ resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
+ reformulated = resp.content.strip()
+ logger.info(f"[reformulate] -> '{reformulated}'")
+ return {"reformulated_query": reformulated}
+
+ def retrieve(state: AgentState) -> AgentState:
+ query = state["reformulated_query"]
+ docs = hybrid_search_native(
+ es_client=es_client,
+ embeddings=embeddings,
+ query=query,
+ index_name=index_name,
+ k=8,
+ )
+ context = format_context(docs)
+ logger.info(f"[retrieve] {len(docs)} docs, context len={len(context)}")
+ return {"context": context}
+
+ def generate(state):
+ prompt = SystemMessage(
+ content=GENERATE_PROMPT.content.format(context=state["context"])
+ )
+ resp = llm.invoke([prompt] + state["messages"])
+ logger.info(f"[generate] {len(resp.content)} chars")
+ _persist(state, resp)
+ return {"messages": [resp]}
+
+ def generate_code(state):
+ prompt = SystemMessage(
+ content=CODE_GENERATION_PROMPT.content.format(context=state["context"])
+ )
+ resp = llm.invoke([prompt] + state["messages"])
+ logger.info(f"[generate_code] {len(resp.content)} chars")
+ _persist(state, resp)
+ return {"messages": [resp]}
+
+ def respond_conversational(state):
+ resp = llm.invoke([CONVERSATIONAL_PROMPT] + state["messages"])
+        logger.info("[conversational] responding from conversation history")
+ _persist(state, resp)
+ return {"messages": [resp]}
+
+ def route_by_type(state):
+ return state.get("query_type", "RETRIEVAL")
+
+ def route_after_retrieve(state):
+ qt = state.get("query_type", "RETRIEVAL")
+ return "generate_code" if qt == "CODE_GENERATION" else "generate"
+
+ graph_builder = StateGraph(AgentState)
+
+ graph_builder.add_node("classify", classify)
+ graph_builder.add_node("reformulate", reformulate)
+ graph_builder.add_node("retrieve", retrieve)
+ graph_builder.add_node("generate", generate)
+ graph_builder.add_node("generate_code", generate_code)
+ graph_builder.add_node("respond_conversational", respond_conversational)
+
+ graph_builder.set_entry_point("classify")
+
+ graph_builder.add_conditional_edges(
+ "classify",
+ route_by_type,
+ {
+ "RETRIEVAL": "reformulate",
+ "CODE_GENERATION": "reformulate",
+ "CONVERSATIONAL": "respond_conversational",
+ }
+ )
+
+ graph_builder.add_edge("reformulate", "retrieve")
+
+ graph_builder.add_conditional_edges(
+ "retrieve",
+ route_after_retrieve,
+ {
+ "generate": "generate",
+ "generate_code": "generate_code",
+ }
+ )
+
+ graph_builder.add_edge("generate", END)
+ graph_builder.add_edge("generate_code", END)
+ graph_builder.add_edge("respond_conversational", END)
+
+ return graph_builder.compile()
+
+
+def build_prepare_graph(llm, embeddings, es_client, index_name):
+
+    def classify(state):
+        messages = state["messages"]
+        user_msg = messages[-1]
+        question = getattr(user_msg, "content",
+                           user_msg.get("content", "")
+                           if isinstance(user_msg, dict) else "")
+        history_msgs = messages[:-1]
+        history_text = (format_history_for_classify(history_msgs)
+                        if history_msgs else "(no history)")
+        prompt_content = (
+            CLASSIFY_PROMPT_TEMPLATE
+            .replace("{history}", history_text)
+            .replace("{message}", question)
+        )
+        resp = llm.invoke([SystemMessage(content=prompt_content)])
+        raw = resp.content.strip().upper()
+        query_type = _parse_query_type(raw)
+        logger.info(f"[prepare/classify] raw='{raw}' -> {query_type}")
+        return {"query_type": query_type}
+
+ def _parse_query_type(raw: str) -> str:
+ if raw.startswith("CODE_GENERATION") or "CODE" in raw:
+ return "CODE_GENERATION"
+ if raw.startswith("CONVERSATIONAL"):
+ return "CONVERSATIONAL"
+ return "RETRIEVAL"
+
+ def reformulate(state: AgentState) -> AgentState:
+ user_msg = state["messages"][-1]
+ resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
+ reformulated = resp.content.strip()
+ logger.info(f"[prepare/reformulate] -> '{reformulated}'")
+ return {"reformulated_query": reformulated}
+
+ def retrieve(state: AgentState) -> AgentState:
+ query = state["reformulated_query"]
+ docs = hybrid_search_native(
+ es_client=es_client,
+ embeddings=embeddings,
+ query=query,
+ index_name=index_name,
+ k=8,
+ )
+ context = format_context(docs)
+ logger.info(f"[prepare/retrieve] {len(docs)} docs, context len={len(context)}")
+ return {"context": context}
+
+ def skip_retrieve(state: AgentState) -> AgentState:
+ return {"context": ""}
+
+ def route_by_type(state):
+ return state.get("query_type", "RETRIEVAL")
+
+ graph_builder = StateGraph(AgentState)
+
+ graph_builder.add_node("classify", classify)
+ graph_builder.add_node("reformulate", reformulate)
+ graph_builder.add_node("retrieve", retrieve)
+ graph_builder.add_node("skip_retrieve", skip_retrieve)
+
+ graph_builder.set_entry_point("classify")
+
+ graph_builder.add_conditional_edges(
+ "classify",
+ route_by_type,
+ {
+ "RETRIEVAL": "reformulate",
+ "CODE_GENERATION": "reformulate",
+ "CONVERSATIONAL": "skip_retrieve",
+ }
+ )
+
+ graph_builder.add_edge("reformulate", "retrieve")
+ graph_builder.add_edge("retrieve", END)
+    graph_builder.add_edge("skip_retrieve", END)
+
+ return graph_builder.compile()
+
+
+def build_final_messages(state: AgentState) -> list:
+ query_type = state.get("query_type", "RETRIEVAL")
+ context = state.get("context", "")
+ messages = state.get("messages", [])
+
+ if query_type == "CONVERSATIONAL":
+ return [CONVERSATIONAL_PROMPT] + messages
+
+ if query_type == "CODE_GENERATION":
+ prompt = SystemMessage(
+ content=CODE_GENERATION_PROMPT.content.format(context=context)
+ )
+ else:
+ prompt = SystemMessage(
+ content=GENERATE_PROMPT.content.format(context=context)
+ )
+
+ return [prompt] + messages
\ No newline at end of file
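The control flow in `build_graph` reduces to a small decision table: the classifier's one-word reply selects one of three node sequences. A dependency-free sketch of that routing, where the `ROUTES` table is illustrative rather than part of the module above:

```python
def parse_query_type(raw: str) -> str:
    # Mirrors _parse_query_type: CODE_GENERATION wins, then CONVERSATIONAL,
    # and anything unrecognized falls back to RETRIEVAL.
    raw = raw.strip().upper()
    if raw.startswith("CODE_GENERATION") or "CODE" in raw:
        return "CODE_GENERATION"
    if raw.startswith("CONVERSATIONAL"):
        return "CONVERSATIONAL"
    return "RETRIEVAL"

# Node sequence each classification takes through build_graph.
ROUTES = {
    "RETRIEVAL": ["classify", "reformulate", "retrieve", "generate"],
    "CODE_GENERATION": ["classify", "reformulate", "retrieve", "generate_code"],
    "CONVERSATIONAL": ["classify", "respond_conversational"],
}

print(ROUTES[parse_query_type("conversational")])  # ['classify', 'respond_conversational']
```

Note the fallback: an LLM reply that matches no category still routes to retrieval, so a malformed classification degrades to the safest path instead of failing.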
diff --git a/Docker/src/openai_proxy.py b/Docker/src/openai_proxy.py
new file mode 100644
index 0000000..157f609
--- /dev/null
+++ b/Docker/src/openai_proxy.py
@@ -0,0 +1,420 @@
+import json
+import os
+import time
+import uuid
+import logging
+import asyncio
+import concurrent.futures
+from typing import AsyncIterator, Optional, Any, Literal, Union
+
+import grpc
+import brunix_pb2
+import brunix_pb2_grpc
+
+from fastapi import FastAPI, HTTPException
+from fastapi.responses import JSONResponse, StreamingResponse
+from pydantic import BaseModel
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("openai-proxy")
+
+_thread_pool = concurrent.futures.ThreadPoolExecutor(
+ max_workers=int(os.getenv("PROXY_THREAD_WORKERS", "20"))
+)
+
+GRPC_TARGET = os.getenv("BRUNIX_GRPC_TARGET", "localhost:50051")
+PROXY_MODEL = os.getenv("PROXY_MODEL_ID", "brunix")
+
+_channel: Optional[grpc.Channel] = None
+_stub: Optional[brunix_pb2_grpc.AssistanceEngineStub] = None
+
+
+def get_stub() -> brunix_pb2_grpc.AssistanceEngineStub:
+ global _channel, _stub
+ if _stub is None:
+ _channel = grpc.insecure_channel(GRPC_TARGET)
+ _stub = brunix_pb2_grpc.AssistanceEngineStub(_channel)
+ logger.info(f"[gRPC] connected to {GRPC_TARGET}")
+ return _stub
+
+
+app = FastAPI(
+ title="Brunix OpenAI-Compatible Proxy",
+ version="2.0.0",
+ description="stream:false → AskAgent | stream:true → AskAgentStream",
+)
+
+class ChatMessage(BaseModel):
+ role: Literal["system", "user", "assistant", "function"] = "user"
+ content: str = ""
+ name: Optional[str] = None
+
+
+class ChatCompletionRequest(BaseModel):
+ model: str = PROXY_MODEL
+ messages: list[ChatMessage]
+ stream: bool = False
+ temperature: Optional[float] = None
+ max_tokens: Optional[int] = None
+    session_id: Optional[str] = None  # Brunix extension
+ top_p: Optional[float] = None
+ n: Optional[int] = 1
+ stop: Optional[Any] = None
+ presence_penalty: Optional[float] = None
+ frequency_penalty: Optional[float] = None
+ user: Optional[str] = None
+
+
+class CompletionRequest(BaseModel):
+ model: str = PROXY_MODEL
+ prompt: Union[str, list[str]] = ""
+ stream: bool = False
+ temperature: Optional[float] = None
+ max_tokens: Optional[int] = None
+ session_id: Optional[str] = None
+ suffix: Optional[str] = None
+ top_p: Optional[float] = None
+ n: Optional[int] = 1
+ stop: Optional[Any] = None
+ user: Optional[str] = None
+
+
+# Ollama schemas
+class OllamaChatMessage(BaseModel):
+ role: str = "user"
+ content: str = ""
+
+
+class OllamaChatRequest(BaseModel):
+ model: str = PROXY_MODEL
+ messages: list[OllamaChatMessage]
+    stream: bool = True  # Ollama streams by default
+ session_id: Optional[str] = None
+
+
+class OllamaGenerateRequest(BaseModel):
+ model: str = PROXY_MODEL
+ prompt: str = ""
+ stream: bool = True
+ session_id: Optional[str] = None
+
+
+def _ts() -> int:
+ return int(time.time())
+
+
+def _chat_response(content: str, req_id: str) -> dict:
+ return {
+ "id": req_id, "object": "chat.completion", "created": _ts(),
+ "model": PROXY_MODEL,
+ "choices": [{"index": 0, "message": {"role": "assistant", "content": content}, "finish_reason": "stop"}],
+ "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
+ }
+
+
+def _completion_response(text: str, req_id: str) -> dict:
+ return {
+ "id": req_id, "object": "text_completion", "created": _ts(),
+ "model": PROXY_MODEL,
+ "choices": [{"text": text, "index": 0, "logprobs": None, "finish_reason": "stop"}],
+ "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
+ }
+
+
+def _chat_chunk(delta: str, req_id: str, finish: Optional[str] = None) -> dict:
+ return {
+ "id": req_id, "object": "chat.completion.chunk", "created": _ts(),
+ "model": PROXY_MODEL,
+ "choices": [{"index": 0,
+ "delta": {"role": "assistant", "content": delta} if delta else {},
+ "finish_reason": finish}],
+ }
+
+
+def _completion_chunk(text: str, req_id: str, finish: Optional[str] = None) -> dict:
+ return {
+ "id": req_id, "object": "text_completion", "created": _ts(),
+ "model": PROXY_MODEL,
+ "choices": [{"text": text, "index": 0, "logprobs": None, "finish_reason": finish}],
+ }
+
+
+def _sse(data: dict) -> str:
+ return f"data: {json.dumps(data)}\n\n"
+
+
+def _sse_done() -> str:
+ return "data: [DONE]\n\n"
+
+
+def _query_from_messages(messages: list[ChatMessage]) -> str:
+ for m in reversed(messages):
+ if m.role == "user":
+ return m.content
+ return ""
+
+
+async def _invoke_blocking(query: str, session_id: str) -> str:
+
+    loop = asyncio.get_running_loop()
+
+ def _call():
+ stub = get_stub()
+ req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
+ parts = []
+ for resp in stub.AskAgent(req):
+ if resp.text:
+ parts.append(resp.text)
+ return "".join(parts)
+
+ return await loop.run_in_executor(_thread_pool, _call)
+
+
+async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.AgentResponse]:
+
+    loop = asyncio.get_running_loop()
+ queue: asyncio.Queue = asyncio.Queue()
+
+ def _producer():
+ try:
+ stub = get_stub()
+ req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
+ for resp in stub.AskAgentStream(req): # ← AskAgentStream
+ asyncio.run_coroutine_threadsafe(queue.put(resp), loop).result()
+ except Exception as e:
+ asyncio.run_coroutine_threadsafe(queue.put(e), loop).result()
+ finally:
+ asyncio.run_coroutine_threadsafe(queue.put(None), loop).result() # sentinel
+
+ _thread_pool.submit(_producer)
+
+ while True:
+ item = await queue.get()
+ if item is None:
+ break
+ if isinstance(item, Exception):
+ raise item
+ yield item
+
+
+async def _stream_chat(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
+ try:
+ async for resp in _iter_stream(query, session_id):
+ if resp.is_final:
+ yield _sse(_chat_chunk("", req_id, finish="stop"))
+ break
+ if resp.text:
+ yield _sse(_chat_chunk(resp.text, req_id))
+ except Exception as e:
+ logger.error(f"[stream_chat] error: {e}")
+ yield _sse(_chat_chunk(f"[Error: {e}]", req_id, finish="stop"))
+
+ yield _sse_done()
+
+
+async def _stream_completion(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
+ try:
+ async for resp in _iter_stream(query, session_id):
+ if resp.is_final:
+ yield _sse(_completion_chunk("", req_id, finish="stop"))
+ break
+ if resp.text:
+ yield _sse(_completion_chunk(resp.text, req_id))
+ except Exception as e:
+ logger.error(f"[stream_completion] error: {e}")
+ yield _sse(_completion_chunk(f"[Error: {e}]", req_id, finish="stop"))
+
+ yield _sse_done()
+
+
+def _ollama_chat_chunk(token: str, done: bool) -> str:
+ return json.dumps({
+ "model": PROXY_MODEL,
+ "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+ "message": {"role": "assistant", "content": token},
+ "done": done,
+ }) + "\n"
+
+
+def _ollama_generate_chunk(token: str, done: bool) -> str:
+ return json.dumps({
+ "model": PROXY_MODEL,
+ "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+ "response": token,
+ "done": done,
+ }) + "\n"
+
+
+async def _stream_ollama_chat(query: str, session_id: str) -> AsyncIterator[str]:
+ try:
+ async for resp in _iter_stream(query, session_id):
+ if resp.is_final:
+ yield _ollama_chat_chunk("", done=True)
+ break
+ if resp.text:
+ yield _ollama_chat_chunk(resp.text, done=False)
+ except Exception as e:
+ logger.error(f"[ollama_chat] error: {e}")
+ yield _ollama_chat_chunk(f"[Error: {e}]", done=True)
+
+
+async def _stream_ollama_generate(query: str, session_id: str) -> AsyncIterator[str]:
+ try:
+ async for resp in _iter_stream(query, session_id):
+ if resp.is_final:
+ yield _ollama_generate_chunk("", done=True)
+ break
+ if resp.text:
+ yield _ollama_generate_chunk(resp.text, done=False)
+ except Exception as e:
+ logger.error(f"[ollama_generate] error: {e}")
+ yield _ollama_generate_chunk(f"[Error: {e}]", done=True)
+
+
+@app.get("/v1/models")
+async def list_models():
+ return {
+ "object": "list",
+ "data": [{
+ "id": PROXY_MODEL, "object": "model", "created": 1700000000,
+ "owned_by": "brunix", "permission": [], "root": PROXY_MODEL, "parent": None,
+ }],
+ }
+
+
+@app.post("/v1/chat/completions")
+async def chat_completions(req: ChatCompletionRequest):
+ query = _query_from_messages(req.messages)
+ session_id = req.session_id or req.user or "default"
+ req_id = f"chatcmpl-{uuid.uuid4().hex}"
+
+ logger.info(f"[chat] session={session_id} stream={req.stream} query='{query[:80]}'")
+
+ if not query:
+ raise HTTPException(status_code=400, detail="No user message found in messages.")
+
+ if req.stream:
+
+ return StreamingResponse(
+ _stream_chat(query, session_id, req_id),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+ try:
+ text = await _invoke_blocking(query, session_id)
+ except grpc.RpcError as e:
+ raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
+
+ return JSONResponse(_chat_response(text, req_id))
+
+
+@app.post("/v1/completions")
+async def completions(req: CompletionRequest):
+ query = req.prompt if isinstance(req.prompt, str) else " ".join(req.prompt)
+ session_id = req.session_id or req.user or "default"
+ req_id = f"cmpl-{uuid.uuid4().hex}"
+
+ logger.info(f"[completion] session={session_id} stream={req.stream} prompt='{query[:80]}'")
+
+ if not query:
+ raise HTTPException(status_code=400, detail="prompt is required.")
+
+ if req.stream:
+ return StreamingResponse(
+ _stream_completion(query, session_id, req_id),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+ try:
+ text = await _invoke_blocking(query, session_id)
+ except grpc.RpcError as e:
+ raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
+
+ return JSONResponse(_completion_response(text, req_id))
+
+
+@app.get("/health")
+async def health():
+ return {"status": "ok", "grpc_target": GRPC_TARGET}
+
+
+@app.get("/api/tags")
+async def ollama_tags():
+ return {
+ "models": [{
+ "name": PROXY_MODEL,
+            "model": PROXY_MODEL,
+ "modified_at": "2024-01-01T00:00:00Z",
+ "size": 0,
+            "digest": "brunix",
+ "details": {
+ "format": "gguf",
+ "family": "brunix",
+ "parameter_size": "unknown",
+ "quantization_level": "unknown",
+ },
+ }]
+ }
+
+
+@app.post("/api/chat")
+async def ollama_chat(req: OllamaChatRequest):
+
+ query = next((m.content for m in reversed(req.messages) if m.role == "user"), "")
+ session_id = req.session_id or "default"
+
+ logger.info(f"[ollama/chat] session={session_id} stream={req.stream} query='{query[:80]}'")
+
+ if not query:
+ raise HTTPException(status_code=400, detail="No user message found.")
+
+ if req.stream:
+ return StreamingResponse(
+ _stream_ollama_chat(query, session_id),
+ media_type="application/x-ndjson",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+ try:
+ text = await _invoke_blocking(query, session_id)
+ except grpc.RpcError as e:
+ raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
+
+ return JSONResponse({
+ "model": PROXY_MODEL,
+ "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+ "message": {"role": "assistant", "content": text},
+ "done": True,
+ })
+
+
+@app.post("/api/generate")
+async def ollama_generate(req: OllamaGenerateRequest):
+
+ session_id = req.session_id or "default"
+
+ logger.info(f"[ollama/generate] session={session_id} stream={req.stream} prompt='{req.prompt[:80]}'")
+
+ if not req.prompt:
+ raise HTTPException(status_code=400, detail="prompt is required.")
+
+ if req.stream:
+ return StreamingResponse(
+ _stream_ollama_generate(req.prompt, session_id),
+ media_type="application/x-ndjson",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+ try:
+ text = await _invoke_blocking(req.prompt, session_id)
+ except grpc.RpcError as e:
+ raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
+
+ return JSONResponse({
+ "model": PROXY_MODEL,
+ "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+ "response": text,
+ "done": True,
+ })
\ No newline at end of file
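On the wire, `_stream_chat` produces plain SSE records: one `data: <json>\n\n` frame per chunk, then `data: [DONE]\n\n` from `_sse_done()`. A minimal client-side sketch of reassembling the streamed deltas; the frames below are hand-written examples of the proxy's output shape, not captured traffic:

```python
import json

def collect_sse_deltas(frames):
    # Reassemble assistant text from "data: <json>" chat.completion.chunk frames,
    # stopping at the "data: [DONE]" terminator.
    parts = []
    for frame in frames:
        payload = frame[len("data: "):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

frames = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hel"}, "finish_reason": null}]}\n\n',
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "lo"}, "finish_reason": null}]}\n\n',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}\n\n',
    "data: [DONE]\n\n",
]
print(collect_sse_deltas(frames))  # Hello
```

The empty `delta` on the final chunk matches `_chat_chunk("", req_id, finish="stop")`, so clients must tolerate chunks that carry a `finish_reason` but no content.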
diff --git a/Docker/src/prompts.py b/Docker/src/prompts.py
new file mode 100644
index 0000000..adb370d
--- /dev/null
+++ b/Docker/src/prompts.py
@@ -0,0 +1,250 @@
+
+from langchain_core.messages import SystemMessage
+
+CLASSIFY_PROMPT_TEMPLATE = (
+ "\n"
+ "You are a query classifier for an AVAP language assistant. "
+ "Your only job is to classify the user message into one of three categories.\n"
+ "\n\n"
+
+ "\n"
+ "RETRIEVAL — the user is asking about AVAP concepts, documentation, syntax rules, "
+ "or how something works. They want an explanation, not code.\n"
+ "Examples: 'What is addVar?', 'How does registerEndpoint work?', "
+ "'What is the difference between if() modes?'\n\n"
+
+ "CODE_GENERATION — the user is asking to generate, write, create, build, or show "
+ "an example of an AVAP script, function, API, or code snippet. "
+ "They want working code as output.\n"
+ "Examples: 'Write an API that returns hello world', "
+ "'Generate a function that queries the DB', "
+ "'Show me how to create an endpoint', "
+ "'dame un ejemplo de codigo', 'escribeme un script', "
+ "'dime como seria un API', 'genera un API', 'como haria'\n\n"
+
+ "CONVERSATIONAL — the user is following up on the previous answer. "
+ "They want a reformulation, summary, or elaboration of what was already said.\n"
+ "Examples: 'can you explain that?', 'en menos palabras', "
+ "'describe it in your own words', 'what did you mean?'\n"
+ "\n\n"
+
+ "\n"
+ "Your entire response must be exactly one word: "
+ "RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL. Nothing else.\n"
+ "\n\n"
+
+ "\n"
+ "{history}\n"
+ "\n\n"
+
+ "{message}"
+)
+
+REFORMULATE_PROMPT = SystemMessage(
+ content=(
+ "\n"
+ "You are a deterministic query rewriter whose sole purpose is to prepare "
+ "user questions for vector similarity retrieval against an AVAP language "
+ "knowledge base. You do not answer questions. You only transform phrasing "
+ "into keyword queries that will find the right AVAP documentation chunks.\n"
+ "\n\n"
+
+ "\n"
+ "Rewrite the user message into a compact keyword query for semantic search.\n\n"
+
+ "SPECIAL RULE for code generation requests:\n"
+ "When the user asks to generate/create/build/show AVAP code, expand the query "
+ "with the AVAP commands typically needed. Use this mapping:\n\n"
+
+ "- API / endpoint / route / HTTP response\n"
+ " expand to: AVAP registerEndpoint addResult _status\n\n"
+
+ "- Read input / parameter\n"
+ " expand to: AVAP addParam getQueryParamList\n\n"
+
+ "- Database / ORM / query\n"
+ " expand to: AVAP ormAccessSelect ormAccessInsert avapConnector\n\n"
+
+ "- Error handling\n"
+ " expand to: AVAP try exception end\n\n"
+
+ "- Loop / iterate\n"
+ " expand to: AVAP startLoop endLoop itemFromList getListLen\n\n"
+
+ "- HTTP request / call external\n"
+ " expand to: AVAP RequestPost RequestGet\n"
+ "\n\n"
+
+ "\n"
+ "- Preserve all AVAP identifiers verbatim.\n"
+ "- Remove filler words.\n"
+ "- Output a single line.\n"
+ "- Never answer the question.\n"
+ "\n\n"
+
+ "\n"
+ "\n"
+ "What does AVAP stand for?\n"
+ "AVAP stand for\n"
+ "\n\n"
+
+ "\n"
+ "dime como seria un API que devuelva hello world con AVAP\n"
+ "AVAP registerEndpoint addResult _status hello world example\n"
+ "\n\n"
+
+ "\n"
+ "generate an AVAP script that reads a parameter and queries the DB\n"
+ "AVAP addParam ormAccessSelect avapConnector registerEndpoint addResult\n"
+ "\n"
+ "\n\n"
+
+ "Return only the rewritten query. No labels, no prefixes, no explanation."
+ )
+)
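The expansion table in `REFORMULATE_PROMPT` can be mirrored deterministically, as in the sketch below. This is purely illustrative documentation code; in the engine the expansion is performed by the LLM following the prompt, not by a lookup, and the naive substring matching here would over-trigger in production.

```python
# Illustrative mirror of the expansion table in REFORMULATE_PROMPT.
EXPANSIONS = {
    "api": "AVAP registerEndpoint addResult _status",
    "parameter": "AVAP addParam getQueryParamList",
    "database": "AVAP ormAccessSelect ormAccessInsert avapConnector",
    "error": "AVAP try exception end",
    "loop": "AVAP startLoop endLoop itemFromList getListLen",
    "http request": "AVAP RequestPost RequestGet",
}

def expand(query: str) -> str:
    """Append mapped AVAP commands for every topic found in the query."""
    extras = [cmds for topic, cmds in EXPANSIONS.items() if topic in query.lower()]
    return " ".join([query, *extras]) if extras else query

print(expand("generate an API that queries the database"))
```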
+
+CONFIDENCE_PROMPT_TEMPLATE = (
+ "\n"
+ "You are a relevance evaluator. Decide whether the context contains "
+ "useful information to address the user question.\n"
+ "\n\n"
+
+ "\n"
+ "Answer YES if the context contains at least one relevant passage. "
+ "Answer NO only if the context is empty or completely unrelated.\n"
+ "\n\n"
+
+ "\n"
+ "Exactly one word: YES or NO.\n"
+ "\n\n"
+
+ "{question}\n\n"
+ "{context}"
+)
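The plain-string templates (`CLASSIFY_PROMPT_TEMPLATE`, `CONFIDENCE_PROMPT_TEMPLATE`) are meant to be filled with `str.format`. A trimmed stand-in template shows the intended fill-in; the real templates live in this file and use the same `{question}` and `{context}` placeholders.

```python
# Stand-in with the same placeholders as CONFIDENCE_PROMPT_TEMPLATE.
TEMPLATE = "Decide relevance.\n{question}\n\n{context}"

rendered = TEMPLATE.format(
    question="What is addVar?",
    context="addVar assigns a value to a variable.",
)
assert "{question}" not in rendered   # all placeholders resolved
print(rendered)
```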
+
+
+CODE_GENERATION_PROMPT = SystemMessage(
+ content=(
+ "\n"
+ "You are an expert AVAP programmer. AVAP (Advanced Virtual API Programming) "
+ "is a domain-specific language for orchestrating microservices and HTTP I/O. "
+ "Write correct, minimal, working AVAP code.\n"
+ "\n\n"
+
+ "\n"
+ "1. AVAP is line-oriented: every statement on a single line.\n"
+ "2. Use ONLY commands shown or explicitly described in the provided context.\n"
+ "3. Do NOT copy code examples from the context that solve a DIFFERENT problem. "
+ "Context examples are syntax references only — ignore them if unrelated.\n"
+ "4. Write the MINIMUM code needed. No extra connectors, no unrelated variables.\n"
+ "5. Add brief inline comments explaining each part.\n"
+ "6. Answer in the same language the user used.\n"
+ "\n\n"
+
+ "\n"
+ "// Register an HTTP endpoint\n"
+ "registerEndpoint(\"GET\", \"/path\", [], \"scope\", handlerFn, \"\")\n\n"
+ "// Declare a function — uses curly braces, NOT end()\n"
+ "function handlerFn() {{\n"
+ " msg = \"Hello World\"\n"
+ " addResult(msg)\n"
+ "}}\n\n"
+ "// Assign a value to a variable\n"
+ "addVar(varName, \"value\") // or: varName = \"value\"\n\n"
+ "// Add variable to HTTP JSON response body\n"
+ "addResult(varName)\n\n"
+ "// Set HTTP response status code\n"
+ "_status = 200 // or: addVar(_status, 200)\n\n"
+ "// Read a request parameter (URL, body, or form)\n"
+ "addParam(\"paramName\", targetVar)\n\n"
+ "// Conditional\n"
+ "if(var, value, \"==\")\n"
+ " // ...\n"
+ "end()\n\n"
+ "// Loop\n"
+ "startLoop(i, 0, length)\n"
+ " // ...\n"
+ "endLoop()\n\n"
+ "// Error handling\n"
+ "try()\n"
+ " // ...\n"
+ "exception(errVar)\n"
+ " // handle\n"
+ "end()\n"
+ "\n\n"
+
+ "\n"
+ "Generate a minimal, complete AVAP example for the user's request.\n\n"
+ "Structure:\n"
+ "1. One sentence describing what the code does.\n"
+ "2. The AVAP code block — clean, minimal, with inline comments.\n"
+ "3. Two or three lines explaining the key commands used.\n"
+ "\n\n"
+
+ "\n"
+ "{context}\n"
+ ""
+ )
+)
+
+CONVERSATIONAL_PROMPT = SystemMessage(
+ content=(
+ "\n"
+ "You are a helpful AVAP assistant continuing an ongoing conversation.\n"
+ "\n\n"
+
+ "\n"
+ "The user is following up on something already discussed. "
+ "Rephrase, summarize, or elaborate using the conversation history.\n"
+ "\n\n"
+
+ "\n"
+ "- Base your answer on the conversation history.\n"
+ "- Do not introduce new AVAP facts not in the history.\n"
+ "- Keep the same language the user is using.\n"
+ "- No Answer/Evidence format. Just answer naturally.\n"
+ ""
+ )
+)
+
+
+GENERATE_PROMPT = SystemMessage(
+ content=(
+ "\n"
+ "You are a precise, retrieval-grounded assistant specialized in AVAP. "
+ "Answers are honest, calibrated to evidence, and clearly structured.\n"
+ "\n\n"
+
+ "\n"
+ "AVAP is a new proprietary language. Use ONLY content inside the provided context. "
+ "Treat any AVAP knowledge from outside that context as unreliable.\n"
+ "\n\n"
+
+ "\n"
+ "Answer using exclusively the information in the provided context.\n"
+ "\n\n"
+
+ "\n"
+ "Step 1 — Find relevant passages in the context.\n"
+ "Step 2 — Assess whether the question can be fully or partially answered.\n"
+ "Step 3 — Write a clear answer backed by those passages.\n"
+ "Step 4 — If context contains relevant AVAP code, include it exactly.\n"
+ "\n\n"
+
+ "\n"
+ "Answer:\n"
+ "\n\n"
+
+ "Evidence:\n"
+ "- \"\"\n"
+ "(only quotes you actually used)\n\n"
+
+ "If the context has no relevant information, reply with exactly:\n"
+ "\"I don't have enough information in the provided context to answer that.\"\n"
+ "\n\n"
+
+ "\n"
+ "{context}\n"
+ ""
+ )
+)
\ No newline at end of file
diff --git a/Docker/src/server.py b/Docker/src/server.py
new file mode 100644
index 0000000..b233527
--- /dev/null
+++ b/Docker/src/server.py
@@ -0,0 +1,243 @@
+import logging
+import os
+from concurrent import futures
+from dotenv import load_dotenv
+load_dotenv()
+
+import brunix_pb2
+import brunix_pb2_grpc
+import grpc
+from grpc_reflection.v1alpha import reflection
+from elasticsearch import Elasticsearch
+from langchain_core.messages import AIMessage
+
+from utils.llm_factory import create_chat_model
+from utils.emb_factory import create_embedding_model
+from graph import build_graph, build_prepare_graph, build_final_messages, session_store
+
+from evaluate import run_evaluation
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("brunix-engine")
+
+
+class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
+
+ def __init__(self):
+ es_url = os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")
+ es_user = os.getenv("ELASTICSEARCH_USER")
+ es_pass = os.getenv("ELASTICSEARCH_PASSWORD")
+ es_apikey = os.getenv("ELASTICSEARCH_API_KEY")
+ index = os.getenv("ELASTICSEARCH_INDEX", "avap-knowledge-v1")
+
+ self.llm = create_chat_model(
+ provider="ollama",
+ model=os.getenv("OLLAMA_MODEL_NAME"),
+ base_url=os.getenv("OLLAMA_URL"),
+ temperature=0,
+ validate_model_on_init=True,
+ )
+
+ self.embeddings = create_embedding_model(
+ provider="ollama",
+ model=os.getenv("OLLAMA_EMB_MODEL_NAME"),
+ base_url=os.getenv("OLLAMA_URL"),
+ )
+
+ es_kwargs: dict = {"hosts": [es_url], "request_timeout": 60}
+ if es_apikey:
+ es_kwargs["api_key"] = es_apikey
+ elif es_user and es_pass:
+ es_kwargs["basic_auth"] = (es_user, es_pass)
+
+ self.es_client = Elasticsearch(**es_kwargs)
+ self.index_name = index
+
+ if self.es_client.ping():
+ info = self.es_client.info()
+ logger.info(f"[ESEARCH] Connected: {info['version']['number']} — index: {index}")
+ else:
+ logger.error("[ESEARCH] Can't connect to Elasticsearch")
+
+ self.graph = build_graph(
+ llm = self.llm,
+ embeddings = self.embeddings,
+ es_client = self.es_client,
+ index_name = self.index_name,
+ )
+
+ self.prepare_graph = build_prepare_graph(
+ llm = self.llm,
+ embeddings = self.embeddings,
+ es_client = self.es_client,
+ index_name = self.index_name,
+ )
+
+ logger.info("Brunix Engine initialized.")
+
+
+ def AskAgent(self, request, context):
+ session_id = request.session_id or "default"
+ query = request.query
+ logger.info(f"[AskAgent] session={session_id} query='{query[:80]}'")
+
+ try:
+ history = list(session_store.get(session_id, []))
+ logger.info(f"[AskAgent] conversation: {len(history)} previous messages.")
+
+ initial_state = {
+ "messages": history + [{"role": "user", "content": query}],
+ "session_id": session_id,
+ "reformulated_query": "",
+ "context": "",
+ "query_type": "",
+ }
+
+ final_state = self.graph.invoke(initial_state)
+ messages = final_state.get("messages", [])
+ last_msg = messages[-1] if messages else None
+ result_text = getattr(last_msg, "content", str(last_msg)) \
+ if last_msg else ""
+
+ logger.info(f"[AskAgent] query_type={final_state.get('query_type')} "
+ f"answer='{result_text[:100]}'")
+
+ yield brunix_pb2.AgentResponse(
+ text = result_text,
+ avap_code = "AVAP-2026",
+ is_final = True,
+ )
+
+ except Exception as e:
+ logger.error(f"[AskAgent] Error: {e}", exc_info=True)
+ yield brunix_pb2.AgentResponse(
+ text = f"[ENG] Error: {str(e)}",
+ is_final = True,
+ )
+
+
+ def AskAgentStream(self, request, context):
+ session_id = request.session_id or "default"
+ query = request.query
+ logger.info(f"[AskAgentStream] session={session_id} query='{query[:80]}'")
+
+ try:
+ history = list(session_store.get(session_id, []))
+ logger.info(f"[AskAgentStream] conversation: {len(history)} previous messages.")
+
+ initial_state = {
+ "messages": history + [{"role": "user", "content": query}],
+ "session_id": session_id,
+ "reformulated_query": "",
+ "context": "",
+ "query_type": "",
+ }
+
+ prepared = self.prepare_graph.invoke(initial_state)
+ logger.info(
+ f"[AskAgentStream] query_type={prepared.get('query_type')} "
+ f"context_len={len(prepared.get('context', ''))}"
+ )
+
+ final_messages = build_final_messages(prepared)
+ full_response = []
+
+ for chunk in self.llm.stream(final_messages):
+ token = chunk.content
+ if token:
+ full_response.append(token)
+ yield brunix_pb2.AgentResponse(
+ text = token,
+ is_final = False,
+ )
+
+ complete_text = "".join(full_response)
+ if session_id:
+ session_store[session_id] = (
+ list(prepared["messages"]) + [AIMessage(content=complete_text)]
+ )
+
+ logger.info(
+ f"[AskAgentStream] done — "
+ f"chunks={len(full_response)} total_chars={len(complete_text)}"
+ )
+
+ yield brunix_pb2.AgentResponse(text="", is_final=True)
+
+ except Exception as e:
+ logger.error(f"[AskAgentStream] Error: {e}", exc_info=True)
+ yield brunix_pb2.AgentResponse(
+ text = f"[ENG] Error: {str(e)}",
+ is_final = True,
+ )
+
+
+ def EvaluateRAG(self, request, context):
+ category = request.category or None
+ limit = request.limit or None
+ index = request.index or self.index_name
+
+ logger.info(f"[EvaluateRAG] category={category} limit={limit} index={index}")
+
+ try:
+ result = run_evaluation(
+ es_client = self.es_client,
+ llm = self.llm,
+ embeddings = self.embeddings,
+ index_name = index,
+ category = category,
+ limit = limit,
+ )
+ except Exception as e:
+ logger.error(f"[EvaluateRAG] Error: {e}", exc_info=True)
+ return brunix_pb2.EvalResponse(status=f"error: {e}")
+
+ if result.get("status") != "ok":
+ return brunix_pb2.EvalResponse(status=result.get("error", "unknown error"))
+
+ details = [
+ brunix_pb2.QuestionDetail(
+ id = d["id"],
+ category = d["category"],
+ question = d["question"],
+ answer_preview = d["answer_preview"],
+ n_chunks = d["n_chunks"],
+ )
+ for d in result.get("details", [])
+ ]
+
+ scores = result["scores"]
+ return brunix_pb2.EvalResponse(
+ status = "ok",
+ questions_evaluated = result["questions_evaluated"],
+ elapsed_seconds = result["elapsed_seconds"],
+ judge_model = result["judge_model"],
+ index = result["index"],
+ faithfulness = scores["faithfulness"],
+ answer_relevancy = scores["answer_relevancy"],
+ context_recall = scores["context_recall"],
+ context_precision = scores["context_precision"],
+ global_score = result["global_score"],
+ verdict = result["verdict"],
+ details = details,
+ )
+
+
+def serve():
+ server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
+ brunix_pb2_grpc.add_AssistanceEngineServicer_to_server(BrunixEngine(), server)
+
+ SERVICE_NAMES = (
+ brunix_pb2.DESCRIPTOR.services_by_name["AssistanceEngine"].full_name,
+ reflection.SERVICE_NAME,
+ )
+ reflection.enable_server_reflection(SERVICE_NAMES, server)
+
+ server.add_insecure_port("[::]:50051")
+ logger.info("[ENGINE] listening on 50051 (gRPC)")
+ server.start()
+ server.wait_for_termination()
+
+
+if __name__ == "__main__":
+ serve()
diff --git a/Docker/src/state.py b/Docker/src/state.py
new file mode 100644
index 0000000..1d4d0ce
--- /dev/null
+++ b/Docker/src/state.py
@@ -0,0 +1,11 @@
+# state.py
+from typing import TypedDict, Annotated
+from langgraph.graph.message import add_messages
+
+
+class AgentState(TypedDict):
+ messages: Annotated[list, add_messages]
+ reformulated_query: str
+ context: str
+ query_type: str
+ session_id: str
\ No newline at end of file
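As a quick illustration of the shape `AgentState` describes, `server.py` seeds the graph with a plain dict carrying exactly these keys. The values below are placeholders.

```python
# A plain dict satisfying AgentState's shape, as server.py builds it
# before invoking the graph (values here are placeholders).
initial_state = {
    "messages": [{"role": "user", "content": "What is addVar?"}],
    "session_id": "default",
    "reformulated_query": "",   # filled in by the rewriter node
    "context": "",              # filled in by the retrieval node
    "query_type": "",           # filled in by the classifier node
}
assert set(initial_state) == {
    "messages", "session_id", "reformulated_query", "context", "query_type"
}
```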
diff --git a/Docker/src/utils/emb_factory.py b/Docker/src/utils/emb_factory.py
new file mode 100644
index 0000000..d9fb9de
--- /dev/null
+++ b/Docker/src/utils/emb_factory.py
@@ -0,0 +1,67 @@
+from abc import ABC, abstractmethod
+from typing import Any, Dict
+
+
+class BaseEmbeddingFactory(ABC):
+ @abstractmethod
+ def create(self, model: str, **kwargs: Any):
+ raise NotImplementedError
+
+
+class OpenAIEmbeddingFactory(BaseEmbeddingFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_openai import OpenAIEmbeddings
+
+ return OpenAIEmbeddings(model=model, **kwargs)
+
+
+class OllamaEmbeddingFactory(BaseEmbeddingFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_ollama import OllamaEmbeddings
+
+ return OllamaEmbeddings(model=model, **kwargs)
+
+
+class BedrockEmbeddingFactory(BaseEmbeddingFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_aws import BedrockEmbeddings
+
+ return BedrockEmbeddings(model_id=model, **kwargs)
+
+
+class HuggingFaceEmbeddingFactory(BaseEmbeddingFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_huggingface import HuggingFaceEmbeddings
+
+ return HuggingFaceEmbeddings(model_name=model, **kwargs)
+
+
+EMBEDDING_FACTORIES: Dict[str, BaseEmbeddingFactory] = {
+ "openai": OpenAIEmbeddingFactory(),
+ "ollama": OllamaEmbeddingFactory(),
+ "bedrock": BedrockEmbeddingFactory(),
+ "huggingface": HuggingFaceEmbeddingFactory(),
+}
+
+
+def create_embedding_model(provider: str, model: str, **kwargs: Any):
+ """
+ Create an embedding model instance for the given provider.
+
+ Args:
+ provider: The provider name (openai, ollama, bedrock, huggingface).
+ model: The model identifier.
+ **kwargs: Additional keyword arguments passed to the model constructor.
+
+ Returns:
+ An embedding model instance.
+ """
+ key = provider.strip().lower()
+
+ if key not in EMBEDDING_FACTORIES:
+ raise ValueError(
+ f"Unsupported embedding provider: {provider}. "
+ f"Available providers: {list(EMBEDDING_FACTORIES.keys())}"
+ )
+
+ return EMBEDDING_FACTORIES[key].create(model=model, **kwargs)
diff --git a/Docker/src/utils/llm_factory.py b/Docker/src/utils/llm_factory.py
new file mode 100644
index 0000000..8b1c13c
--- /dev/null
+++ b/Docker/src/utils/llm_factory.py
@@ -0,0 +1,72 @@
+from abc import ABC, abstractmethod
+from typing import Any, Dict
+
+
+class BaseProviderFactory(ABC):
+ @abstractmethod
+ def create(self, model: str, **kwargs: Any):
+ raise NotImplementedError
+
+
+class OpenAIChatFactory(BaseProviderFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_openai import ChatOpenAI
+
+ return ChatOpenAI(model=model, **kwargs)
+
+
+class OllamaChatFactory(BaseProviderFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_ollama import ChatOllama
+
+ return ChatOllama(model=model, **kwargs)
+
+
+class BedrockChatFactory(BaseProviderFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_aws import ChatBedrockConverse
+
+ return ChatBedrockConverse(model=model, **kwargs)
+
+
+class HuggingFaceChatFactory(BaseProviderFactory):
+ def create(self, model: str, **kwargs: Any):
+ from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
+
+ llm = HuggingFacePipeline.from_model_id(
+ model_id=model,
+ task="text-generation",
+ pipeline_kwargs=kwargs,
+ )
+ return ChatHuggingFace(llm=llm)
+
+
+CHAT_FACTORIES: Dict[str, BaseProviderFactory] = {
+ "openai": OpenAIChatFactory(),
+ "ollama": OllamaChatFactory(),
+ "bedrock": BedrockChatFactory(),
+ "huggingface": HuggingFaceChatFactory(),
+}
+
+
+def create_chat_model(provider: str, model: str, **kwargs: Any):
+ """
+ Create a chat model instance for the given provider.
+
+ Args:
+ provider: The provider name (openai, ollama, bedrock, huggingface).
+ model: The model identifier.
+ **kwargs: Additional keyword arguments passed to the model constructor.
+
+ Returns:
+ A chat model instance.
+ """
+ key = provider.strip().lower()
+
+ if key not in CHAT_FACTORIES:
+ raise ValueError(
+ f"Unsupported chat provider: {provider}. "
+ f"Available providers: {list(CHAT_FACTORIES.keys())}"
+ )
+
+ return CHAT_FACTORIES[key].create(model=model, **kwargs)
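The registry-dispatch pattern shared by both factories can be exercised in isolation. Calling `create_chat_model("ollama", ...)` for real requires the corresponding `langchain_*` package, so the sketch below substitutes a stub factory to show only the lookup, normalization, and error behavior.

```python
# Minimal sketch of the registry dispatch used by both factories.
# FakeFactory stands in for the provider classes, which need their
# langchain_* packages installed.
class FakeFactory:
    def create(self, model, **kwargs):
        return f"chat-model:{model}"

REGISTRY = {"ollama": FakeFactory(), "openai": FakeFactory()}

def create_chat_model_sketch(provider, model, **kwargs):
    key = provider.strip().lower()          # normalize like the real factory
    if key not in REGISTRY:
        raise ValueError(f"Unsupported chat provider: {provider}")
    return REGISTRY[key].create(model=model, **kwargs)

assert create_chat_model_sketch(" Ollama ", "qwen2.5:1.5b") == "chat-model:qwen2.5:1.5b"
try:
    create_chat_model_sketch("azure", "gpt")
except ValueError as exc:
    print(exc)   # prints: Unsupported chat provider: azure
```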
diff --git a/README.md b/README.md
index fb50632..8931418 100644
--- a/README.md
+++ b/README.md
@@ -8,141 +8,639 @@ This project is a strategic joint development:
---
-## System Architecture
+## System Architecture (Hybrid Dev Mode)
-The following diagram illustrates the interaction between the AVAP technology, the trained intelligence, and the infrastructure components:
+The engine runs locally for development but connects to the production-grade infrastructure in the **Vultr Cloud (Devaron Cluster)** via secure `kubectl` tunnels.
```mermaid
graph TD
- subgraph Client_Layer [External Interface]
- Client[External Services / UI]
+ subgraph Local_Workstation [Developer]
+ BE[Brunix Assistance Engine - Docker]
+ KT[Kubectl Port-Forward Tunnels]
end
- subgraph Engine_Layer
- BE[Brunix Assistance Engine]
- LG[LangGraph Logic]
- LC[LangChain Framework]
- end
-
- subgraph Intelligence_Layer
- LLM[Fine-tuned Model / OpenAI or other]
- Prompt[Prompt Engineering]
- end
-
- subgraph Data_Observability_Layer [System Support]
+ subgraph Vultr_K8s_Cluster [Production - Devaron Cluster]
+ OL[Ollama Light Service - LLM]
EDB[(Elasticsearch Vector DB)]
- LF[Langfuse Observability]
- PG[(Postgres - System Data)]
+ PG[(Postgres - Langfuse Data)]
+ LF[Langfuse UI - Web]
end
- Client -- gRPC:50052 --> BE
- BE --> LG
- LG --> LC
- LC --> LLM
- LLM --> Prompt
- LC -- Semantic Search --> EDB
- LC -- Tracing/Metrics --> LF
- LF -- Persistence --> PG
+ BE -- localhost:11434 --> KT
+ BE -- localhost:9200 --> KT
+ BE -- localhost:5432 --> KT
+
+ KT -- Secure Link --> OL
+ KT -- Secure Link --> EDB
+ KT -- Secure Link --> PG
+
+ Developer -- Browser --> LF
```
---
-## Technology Stack
+## Project Structure
-* **Logic Layer:** [LangChain](https://www.langchain.com/) & [LangGraph](https://langchain-ai.github.io/langgraph/) (Python 3.11).
-* **Communication:** [gRPC](https://grpc.io/) (High-performance, low-latency RPC framework).
-* **Vector Database:** [Elasticsearch 8.12](https://www.elastic.co/) (For semantic search and AVAP data retrieval).
-* **Observability:** [Langfuse](https://langfuse.com/) (End-to-end tracing, latency monitoring, and cost management).
-* **Infrastructure:** Dockerized environment with PostgreSQL 15 persistence.
+```text
+├── README.md # Setup guide & dev reference (this file)
+├── CONTRIBUTING.md # Contribution standards, GitFlow, PR process
+├── SECURITY.md # Security policy and vulnerability reporting
+├── changelog # Version tracking and release history
+├── pyproject.toml # Python project configuration (uv)
+├── uv.lock # Locked dependency graph
+│
+├── Docker/ # Production container
+│ ├── protos/
+│ │ └── brunix.proto # gRPC API contract (source of truth)
+│ ├── src/
+│ │ ├── server.py # gRPC server — AskAgent, AskAgentStream, EvaluateRAG
+│ │ ├── openai_proxy.py # OpenAI & Ollama-compatible HTTP proxy (port 8000)
+│ │ ├── graph.py # LangGraph orchestration — build_graph, build_prepare_graph
+│ │ ├── prompts.py # Centralized prompt definitions (CLASSIFY, GENERATE, etc.)
+│ │ ├── state.py # AgentState TypedDict (shared across graph nodes)
+│ │ ├── evaluate.py # RAGAS evaluation pipeline (Claude as judge)
+│ │ ├── golden_dataset.json # Ground-truth Q&A dataset for EvaluateRAG
+│ │ └── utils/
+│ │ ├── emb_factory.py # Provider-agnostic embedding model factory
+│ │ └── llm_factory.py # Provider-agnostic LLM factory
+│ ├── Dockerfile # Multi-stage container build
+│ ├── docker-compose.yaml # Local dev orchestration
+│ ├── entrypoint.sh # Starts gRPC server + HTTP proxy in parallel
+│ ├── requirements.txt # Pinned production dependencies (exported by uv)
+│ ├── .env # Local secrets (never commit — see .gitignore)
+│ └── .dockerignore # Excludes dev artifacts from image build context
+│
+├── docs/ # Knowledge base & project documentation
+│ ├── ARCHITECTURE.md # Deep technical architecture reference
+│ ├── API_REFERENCE.md # Complete gRPC & HTTP API contract with examples
+│ ├── RUNBOOK.md # Operational playbooks and incident response
+│ ├── AVAP_CHUNKER_CONFIG.md # avap_config.json reference — blocks, statements, semantic tags
+│ ├── adr/ # Architecture Decision Records
+│ │ ├── ADR-0001-grpc-primary-interface.md
+│ │ ├── ADR-0002-two-phase-streaming.md
+│ │ ├── ADR-0003-hybrid-retrieval-rrf.md
+│ │ └── ADR-0004-claude-eval-judge.md
+│ ├── avap_language_github_docs/ # AVAP language reference docs (GitHub source)
+│ ├── developer.avapframework.com/ # AVAP developer portal docs
+│ ├── LRM/
+│ │ └── avap.md # AVAP Language Reference Manual (LRM)
+│ └── samples/ # AVAP code samples (.avap) used for ingestion
+│
+├── ingestion/
+│ └── chunks.json # Last export of ingested chunks (ES bulk output)
+│
+├── scripts/
+│ └── pipelines/
+│ │
+│ ├── flows/ # Executable pipeline entry points (Typer CLI)
+│ │ ├── elasticsearch_ingestion.py # [PIPELINE A] Chonkie-based ingestion flow
+│ │ ├── generate_mbap.py # Synthetic MBPP-AVAP dataset generator (Claude)
+│ │ └── translate_mbpp.py # MBPP→AVAP dataset translation pipeline
+│ │
+│ ├── tasks/ # Reusable task modules for Pipeline A
+│ │ ├── chunk.py # Document fetching, Chonkie chunking & ES bulk write
+│ │ ├── embeddings.py # OllamaEmbeddings adapter (Chonkie-compatible)
+│ │ └── prompts.py # Prompt templates for pipeline LLM calls
+│ │
+│ └── ingestion/ # [PIPELINE B] AVAP-native classic ingestion
+│ ├── avap_chunker.py # Custom AVAP lexer + chunker (MinHash dedup, overlaps)
+│ ├── avap_ingestor.py # Async ES ingestor with DLQ (producer/consumer pattern)
+│ ├── avap_config.json # AVAP language config (blocks, statements, semantic tags)
+│ └── ingestion/
+│ └── chunks.jsonl # JSONL output from avap_chunker.py
+│
+└── src/ # Shared library (used by both Docker and scripts)
+ ├── config.py # Pydantic settings — reads all environment variables
+ └── utils/
+ ├── emb_factory.py # Embedding model factory
+ └── llm_factory.py # LLM model factory
+```
---
-## Getting Started
+## Data Flow & RAG Orchestration
-### Prerequisites
-* Docker & Docker Compose
-* OpenAI API Key (or configured local provider)
-
-### Installation & Deployment
-
-1. **Clone the repository:**
- ```bash
- git clone git@github.com:BRUNIX-AI/assistance-engine.git
- cd assistance-engine
- ```
-
-2. **Configure Environment Variables:**
- Create a `.env` file in the root directory:
- ```env
- OPENAI_API_KEY=your_key_here
- LANGFUSE_PUBLIC_KEY=pk-lf-...
- LANGFUSE_SECRET_KEY=sk-lf-...
- LANGFUSE_HOST=http://langfuse:3000
- ```
-
-3. **Launch the Stack:**
- ```bash
- docker-compose up -d --build
- ```
-
-The engine will be listening for gRPC requests on port `50052`.
-
----
-
-## Component Overview
-
-| Service | Container Name | Description | Role |
-| :--- | :--- | :--- | :--- |
-| **Engine** | `brunix-assistance-engine` | The AVAP-powered brain. | Engine |
-| **Vector DB** | `brunix-vector-db` | Elasticsearch instance (Knowledge Base). | Training Support |
-| **Observability** | `brunix-observability` | Langfuse UI (Tracing & Costs). | System Quality |
-| **System DB** | `brunix-postgres` | Internal storage for Langfuse. | Infrastructure |
-
----
-
-## Partnership & Contributions
-
-This repository is private and represents the intellectual property of **101OBEX Corp** and **MrHouston**.
-
-* **Architecture & AVAP:** Managed by 101OBEX Engineering.
-* **Model Training & Intelligence:** Managed by MrHouston Data Science Team.
-
----
-
-## Open Source & Intellectual Property
-
-The Brunix Assistance Engine is built on a hybrid architecture that balances the flexibility of open-source tools with the security of proprietary intelligence:
-
-* **Open Source Frameworks:** Utilizes **LangChain** and **LangGraph** (MIT License) for orchestration, and **gRPC** for high-performance communication.
-* **Infrastructure:** Deploys via **Docker** using **PostgreSQL** and **Elasticsearch** (Elastic License 2.0).
-* **Proprietary Logic:** The **AVAP Technology** (101OBEX Corp) and the specific **Model Training/Prompts** (MrHouston) are protected intellectual property.
-* **LLM Provider:** Currently configured for **OpenAI** (Proprietary SaaS). The modular design allows for future integration with locally-hosted Open Source models (e.g., Llama 3, Mistral) to ensure 100% data sovereignty if required.
-
-## Security & Privacy
-
-The system is designed with a "Security-First" approach to protect corporate intelligence:
-
-1. **Data in Transit:** Communication between the Engine and external clients is handled via **gRPC**, supporting **TLS/SSL encryption** to ensure that data remains private and tamper-proof.
-2. **Internal Networking:** All database interactions (Elasticsearch, PostgreSQL) occur within a **private Docker bridge network** (`avap-network`), isolated from the public internet.
-3. **Observability Governance:** **Langfuse** provides a full audit trail of every LLM interaction, allowing for real-time monitoring of data leakage or unexpected model behavior.
-4. **Enterprise Secret Management:** While local development uses `.env` files, the architecture is **Production-Ready for Kubernetes**. In production environments, sensitive credentials (API Keys, Database passwords) are managed via **Kubernetes Secrets** or **HashiCorp Vault**, ensuring that no sensitive data is stored within the container images or source control.
+The following diagram illustrates the sequence of a single `AskAgent` request, detailing the retrieval and generation phases through the secure tunnel.
```mermaid
-graph LR
- subgraph Public_Internet
- Client[External Client]
- end
- subgraph Encrypted_Tunnel [TLS/SSL]
- gRPC[gRPC Protocol]
- end
- subgraph K8s_Cluster [Production Environment]
- Engine[Brunix Engine]
- Sec{{"Kubernetes Secrets"}}
- DB[(Databases)]
+sequenceDiagram
+ participant U as External Client (gRPCurl/App)
+ participant E as Brunix Engine (Local Docker)
+ participant T as Kubectl Tunnel
+ participant V as Vector DB (Vultr)
+ participant O as Ollama Light (Vultr)
+
+ U->>E: AskAgent(query, session_id)
+ Note over E: Start Langfuse Trace
+
+ E->>T: Search Context (Embeddings)
+ T->>V: Query Index [avap_manuals]
+ V-->>T: Return Relevant Chunks
+ T-->>E: Contextual Data
+
+ E->>T: Generate Completion (Prompt + Context)
+ T->>O: Stream Tokens (qwen2.5:1.5b)
+
+ loop Token Streaming
+ O-->>T: Token
+ T-->>E: Token
+ E-->>U: gRPC Stream Response {text, avap_code}
end
- Client --> gRPC
- gRPC --> Engine
- Sec -.->|Injected as Env| Engine
- Engine <--> DB
+ Note over E: Close Langfuse Trace
```
+
+---
+
+## Knowledge Base Ingestion
+
+The Elasticsearch vector index is populated via one of two independent pipelines. Both pipelines require the Elasticsearch tunnel to be active (`localhost:9200`) and the Ollama embedding model (`OLLAMA_EMB_MODEL_NAME`) to be available.
+
+### Pipeline A — Chonkie (recommended for markdown + .avap)
+
+Uses the [Chonkie](https://github.com/chonkie-ai/chonkie) library for semantic chunking. Supports `.md` (via `MarkdownChef`) and `.avap` (via `TextChef` + `TokenChunker`). Chunks are embedded with Ollama and bulk-indexed into Elasticsearch via `ElasticHandshakeWithMetadata`.
+
+**Entry point:** `scripts/pipelines/flows/elasticsearch_ingestion.py`
+
+```bash
+# Index all markdown and AVAP files from docs/LRM
+python -m scripts.pipelines.flows.elasticsearch_ingestion \
+ --docs-folder-path docs/LRM \
+ --output ingestion/chunks.json \
+ --docs-extension .md .avap \
+ --es-index avap-docs-test \
+ --delete-es-index
+
+# Index the AVAP code samples
+python -m scripts.pipelines.flows.elasticsearch_ingestion \
+ --docs-folder-path docs/samples \
+ --output ingestion/chunks.json \
+ --docs-extension .avap \
+ --es-index avap-docs-test
+```
+
+**How it works:**
+
+```
+docs/**/*.md + docs/**/*.avap
+ │
+ ▼ FileFetcher (Chonkie)
+ │
+ ├─ .md → MarkdownChef → merge code blocks + tables into chunks
+ │ ↓
+ │ TokenChunker (HuggingFace tokenizer: HF_EMB_MODEL_NAME)
+ │
+ └─ .avap → TextChef → TokenChunker
+ │
+ ▼ OllamaEmbeddings.embed_batch() (OLLAMA_EMB_MODEL_NAME)
+ │
+ ▼ ElasticHandshakeWithMetadata.write()
+ bulk index → {text, embedding, file, start_index, end_index, token_count}
+ │
+ ▼ export_documents() → ingestion/chunks.json
+```
+
+| Chunk field | Source |
+|---|---|
+| `text` | Raw chunk text |
+| `embedding` | Ollama dense vector |
+| `start_index` / `end_index` | Character offsets in source file |
+| `token_count` | HuggingFace tokenizer count |
+| `file` | Source filename |
+
+---
+
+### Pipeline B — AVAP Native (classic, for .avap files with full semantic analysis)
+
+A custom lexer-based chunker purpose-built for the AVAP language using `avap_config.json` as its grammar definition. Produces richer metadata (block type, section, semantic tags, complexity score) and includes **MinHash LSH deduplication** and **semantic overlap** between chunks.
+
+**Step 1 — Chunk:** `scripts/pipelines/ingestion/avap_chunker.py`
+**Grammar config:** `scripts/pipelines/ingestion/avap_config.json` — see [`docs/AVAP_CHUNKER_CONFIG.md`](./docs/AVAP_CHUNKER_CONFIG.md) for the full reference on blocks, statements, semantic tags, and how to extend the grammar.
+
+```bash
+python scripts/pipelines/ingestion/avap_chunker.py \
+ --lang-config scripts/pipelines/ingestion/avap_config.json \
+ --docs-path docs/samples \
+ --output scripts/pipelines/ingestion/ingestion/chunks.jsonl \
+ --workers 4
+```
+
+**Step 2 — Ingest:** `scripts/pipelines/ingestion/avap_ingestor.py`
+
+```bash
+# Ingest from existing JSONL
+python scripts/pipelines/ingestion/avap_ingestor.py \
+ --chunks scripts/pipelines/ingestion/ingestion/chunks.jsonl \
+ --index avap-knowledge-v1 \
+ --delete
+
+# Check model embedding dimensions first
+python scripts/pipelines/ingestion/avap_ingestor.py --probe-dim
+```
+
+**How it works:**
+
+```
+docs/**/*.avap + docs/**/*.md
+ │
+ ▼ avap_chunker.py (GenericLexer + LanguageConfig)
+ │ ├─ .avap: block detection (function/if/startLoop/try), statement classification
+ │ │ semantic tags enrichment, function signature extraction
+ │ │ semantic overlap injection (OVERLAP_LINES=3)
+ │ └─ .md: H1/H2/H3 sectioning, fenced code extraction, table isolation,
+ │ narrative split by token budget (MAX_NARRATIVE_TOKENS=400)
+ │ ├─ MinHash LSH deduplication (threshold=0.85, 128 permutations)
+ │ └─ parallel workers (ProcessPoolExecutor)
+ │
+ ▼ chunks.jsonl (one JSON per line)
+ │
+ ▼ avap_ingestor.py (async producer/consumer)
+ │ ├─ OllamaAsyncEmbedder — batch embed (BATCH_SIZE_EMBED=8)
+ │ ├─ asyncio.Queue (backpressure, QUEUE_MAXSIZE=5)
+ │ ├─ ES async_bulk (BATCH_SIZE_ES=50)
+ │ └─ DeadLetterQueue — failed chunks saved to failed_chunks_.jsonl
+ │
+ ▼ Elasticsearch index
+ {chunk_id, content, embedding, doc_type, block_type, section,
+ source_file, start_line, end_line, token_estimate, metadata{...}}
+```
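+
+The ingestor's producer/consumer shape — a bounded `asyncio.Queue` so embedding can never outrun indexing — can be sketched as follows. This is a simplified stand-in: the real batching, embedder, and `async_bulk` writer live in `avap_ingestor.py`:
+
+```python
+import asyncio
+
+async def producer(chunks, queue):
+    """Embed chunks in batches and enqueue them; blocks when the queue is full."""
+    batch_size = 8                      # BATCH_SIZE_EMBED
+    for i in range(0, len(chunks), batch_size):
+        batch = chunks[i:i + batch_size]
+        embedded = [(c, f"vec({c})") for c in batch]  # stand-in for Ollama embedding
+        for item in embedded:
+            await queue.put(item)       # backpressure: awaits while the queue is full
+    await queue.put(None)               # sentinel: no more work
+
+async def consumer(queue, indexed):
+    """Drain the queue and 'bulk index' items (stand-in for ES async_bulk)."""
+    while (item := await queue.get()) is not None:
+        indexed.append(item)
+
+async def ingest(chunks):
+    queue: asyncio.Queue = asyncio.Queue(maxsize=5)  # QUEUE_MAXSIZE
+    indexed: list = []
+    await asyncio.gather(producer(chunks, queue), consumer(queue, indexed))
+    return indexed
+```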
+
+**Chunk types produced:**
+
+| `doc_type` | `block_type` | Description |
+|---|---|---|
+| `code` | `function` | Complete AVAP function block |
+| `code` | `if` / `startLoop` / `try` | Control flow blocks |
+| `function_signature` | `function_signature` | Extracted function signature only (for fast lookup) |
+| `code` | `registerEndpoint` / `addVar` / … | Statement-level chunks by AVAP command category |
+| `spec` | `narrative` | Markdown prose sections |
+| `code_example` | language tag | Fenced code blocks from markdown |
+| `bnf` | `bnf` | BNF grammar blocks from markdown |
+| `spec` | `table` | Markdown tables |
+
+**Semantic tags** (automatically detected, stored in `metadata`):
+
+`uses_orm` · `uses_http` · `uses_connector` · `uses_async` · `uses_crypto` · `uses_auth` · `uses_error_handling` · `uses_loop` · `uses_json` · `uses_list` · `uses_regex` · `uses_datetime` · `returns_result` · `registers_endpoint`
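+
+Tag detection is pattern-based: a tag fires when its characteristic construct appears in a chunk. A simplified sketch — the patterns below are illustrative guesses, not the definitions in `avap_config.json` (see `docs/AVAP_CHUNKER_CONFIG.md` for the authoritative ones):
+
+```python
+import re
+
+# Illustrative patterns only; the real ones live in avap_config.json.
+TAG_PATTERNS = {
+    "uses_loop": re.compile(r"\bstartLoop\b"),
+    "registers_endpoint": re.compile(r"\bregisterEndpoint\b"),
+    "uses_error_handling": re.compile(r"\btry\b"),
+}
+
+def detect_tags(chunk_text: str) -> list[str]:
+    """Return every semantic tag whose pattern matches the chunk."""
+    return [tag for tag, pat in TAG_PATTERNS.items() if pat.search(chunk_text)]
+```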
+
+**Ingestor environment variables:**
+
+| Variable | Default | Description |
+|---|---|---|
+| `OLLAMA_URL` | `http://localhost:11434` | Ollama base URL for embeddings |
+| `OLLAMA_MODEL` | `qwen3-0.6B-emb:latest` | Embedding model name |
+| `OLLAMA_EMBEDDING_DIM` | `1024` | Expected embedding dimension (must match model) |
+
+---
+
+## Development Setup
+
+### 1. Prerequisites
+* **Docker & Docker Compose**
+* **gRPCurl** (`brew install grpcurl`)
+* **Access Credentials:** Ensure the file `./ivar.yaml` (Kubeconfig) is present in the root directory.
+
+### 2. Observability Setup (Langfuse)
+The engine utilizes Langfuse for end-to-end tracing and performance monitoring.
+1. Access the Dashboard: **http://45.77.119.180**
+2. Create a project and generate API Keys in **Settings**.
+3. Configure your local `.env` file using the reference table below.
+
+### 3. Environment Variables Reference
+
+> **Policy:** Every environment variable used by the engine must be documented in this table. Any PR that introduces a new variable without a corresponding entry here will be rejected. See [CONTRIBUTING.md](./CONTRIBUTING.md#5-environment-variables-policy) for full details.
+
+Create a `.env` file in the project root with the following variables:
+
+```env
+PYTHONPATH=${PYTHONPATH}:/home/...
+ELASTICSEARCH_URL=http://host.docker.internal:9200
+ELASTICSEARCH_LOCAL_URL=http://localhost:9200
+ELASTICSEARCH_INDEX=avap-docs-test
+ELASTICSEARCH_USER=elastic
+ELASTICSEARCH_PASSWORD=changeme
+ELASTICSEARCH_API_KEY=
+POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
+LANGFUSE_HOST=http://45.77.119.180
+LANGFUSE_PUBLIC_KEY=pk-lf-...
+LANGFUSE_SECRET_KEY=sk-lf-...
+OLLAMA_URL=http://host.docker.internal:11434
+OLLAMA_LOCAL_URL=http://localhost:11434
+OLLAMA_MODEL_NAME=qwen2.5:1.5b
+OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
+HF_TOKEN=hf_...
+HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
+ANTHROPIC_API_KEY=sk-ant-...
+ANTHROPIC_MODEL=claude-sonnet-4-20250514
+```
+
+| Variable | Required | Description | Example |
+|---|---|---|---|
+| `PYTHONPATH` | No | Path pointing to the project root | `${PYTHONPATH}:/home/...` |
+| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval when running in Docker | `http://host.docker.internal:9200` |
+| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval when running locally | `http://localhost:9200` |
+| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
+| `ELASTICSEARCH_USER` | No | Elasticsearch username (used when API key is not set) | `elastic` |
+| `ELASTICSEARCH_PASSWORD` | No | Elasticsearch password (used when API key is not set) | `changeme` |
+| `ELASTICSEARCH_API_KEY` | No | Elasticsearch API key (takes precedence over user/password auth) | `abc123...` |
+| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
+| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
+| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
+| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
+| `OLLAMA_URL` | Yes | Ollama endpoint used for text generation/embeddings when running in Docker | `http://host.docker.internal:11434` |
+| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint used for text generation/embeddings when running locally | `http://localhost:11434` |
+| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
+| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
+| `HF_TOKEN` | Yes | HuggingFace access token | `hf_...` |
+| `HF_EMB_MODEL_NAME` | Yes | HuggingFace embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |
+| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key — required for the `EvaluateRAG` endpoint | `sk-ant-...` |
+| `ANTHROPIC_MODEL` | No | Claude model used by the RAG evaluation suite | `claude-sonnet-4-20250514` |
+
+> Never commit real secret values. Use placeholder values when sharing configuration examples.
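+
+The Elasticsearch auth precedence described in the table (API key over user/password) can be expressed as a small helper. This is a sketch, not the engine's actual client setup — the keyword names mirror the elasticsearch-py 8.x client, but the engine's wiring may differ:
+
+```python
+def es_auth_kwargs(env: dict) -> dict:
+    """API key wins; fall back to basic auth only when no key is set."""
+    if env.get("ELASTICSEARCH_API_KEY"):
+        return {"api_key": env["ELASTICSEARCH_API_KEY"]}
+    if env.get("ELASTICSEARCH_USER"):
+        return {"basic_auth": (env["ELASTICSEARCH_USER"],
+                               env.get("ELASTICSEARCH_PASSWORD", ""))}
+    return {}  # unauthenticated (e.g. local dev cluster with security disabled)
+```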
+
+### 4. Infrastructure Tunnels
+Open a terminal and establish the connection to the Devaron Cluster:
+
+```bash
+# 1. AI Model Tunnel (Ollama)
+kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
+
+# 2. Knowledge Base Tunnel (Elasticsearch)
+kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
+
+# 3. Observability DB Tunnel (PostgreSQL)
+kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml &
+```
+
+### 5. Launch the Engine
+```bash
+docker-compose up -d --build
+```
+
+---
+
+## Testing & Debugging
+
+The gRPC service is exposed on port `50052` with **gRPC Reflection** enabled — introspect it at any time without needing the `.proto` file.
+
+```bash
+# List available services
+grpcurl -plaintext localhost:50052 list
+
+# Describe the full service contract
+grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
+```
+
+### `AskAgent` — complete response (non-streaming)
+
+Returns the full answer as a single message with `is_final: true`. Suitable for clients that do not support streaming.
+
+```bash
+grpcurl -plaintext \
+ -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/AskAgent
+```
+
+Expected response:
+```json
+{
+ "text": "addVar is an AVAP command used to declare a variable...",
+ "avap_code": "AVAP-2026",
+ "is_final": true
+}
+```
+
+### `AskAgentStream` — real token streaming
+
+Emits one `AgentResponse` per token from Ollama. The final message has `is_final: true` and empty `text` — it is a termination signal, not part of the answer.
+
+```bash
+grpcurl -plaintext \
+ -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/AskAgentStream
+```
+
+Expected response stream:
+```json
+{"text": "Here", "is_final": false}
+{"text": " is", "is_final": false}
+...
+{"text": "", "is_final": true}
+```
+
+**Multi-turn conversation:** send subsequent requests with the same `session_id` to maintain context.
+
+```bash
+# Turn 1
+grpcurl -plaintext \
+ -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
+ localhost:50052 brunix.AssistanceEngine/AskAgentStream
+
+# Turn 2 — engine has Turn 1 history
+grpcurl -plaintext \
+ -d '{"query": "Show me a code example", "session_id": "user-abc"}' \
+ localhost:50052 brunix.AssistanceEngine/AskAgentStream
+```
+
+### `EvaluateRAG` — quality evaluation
+
+Runs the RAGAS evaluation pipeline against the golden dataset using Claude as the judge. Requires `ANTHROPIC_API_KEY` to be set.
+
+```bash
+# Full evaluation
+grpcurl -plaintext -d '{}' localhost:50052 brunix.AssistanceEngine/EvaluateRAG
+
+# Filtered: first 10 questions of category "core_syntax"
+grpcurl -plaintext \
+ -d '{"category": "core_syntax", "limit": 10, "index": "avap-docs-test"}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/EvaluateRAG
+```
+
+Expected response:
+```json
+{
+ "status": "ok",
+ "questions_evaluated": 10,
+ "elapsed_seconds": 142.3,
+ "judge_model": "claude-sonnet-4-20250514",
+ "faithfulness": 0.8421,
+ "answer_relevancy": 0.7913,
+ "context_recall": 0.7234,
+ "context_precision": 0.6891,
+ "global_score": 0.7615,
+ "verdict": "ACCEPTABLE"
+}
+```
+
+Verdict thresholds: `EXCELLENT` ≥ 0.80 · `ACCEPTABLE` ≥ 0.60 · `INSUFFICIENT` < 0.60
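+
+The verdict is a simple threshold check on `global_score` — a sketch of the rule stated above, not the evaluator's code:
+
+```python
+def verdict(global_score: float) -> str:
+    """Map a global RAGAS score to the verdict labels used by EvaluateRAG."""
+    if global_score >= 0.80:
+        return "EXCELLENT"
+    if global_score >= 0.60:
+        return "ACCEPTABLE"
+    return "INSUFFICIENT"
+```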
+
+---
+
+## HTTP Proxy (OpenAI & Ollama Compatible)
+
+The container also runs an **OpenAI-compatible HTTP proxy** on port `8000` (`openai_proxy.py`). It wraps the gRPC engine transparently — `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`.
+
+This enables integration with any tool that supports the OpenAI or Ollama API (continue.dev, LiteLLM, Open WebUI, etc.) without code changes.
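+
+The routing rule is mechanical. A sketch of the dispatch (not the proxy's actual code), including the `session_id` extension described below with its `"default"` fallback:
+
+```python
+def route(body: dict) -> tuple[str, str]:
+    """Map an OpenAI-style request body to the gRPC method and session."""
+    method = "AskAgentStream" if body.get("stream") else "AskAgent"
+    session = body.get("session_id", "default")   # Brunix extension to the schema
+    return method, session
+```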
+
+### OpenAI endpoints
+
+| Method | Endpoint | Description |
+|---|---|---|
+| `GET` | `/v1/models` | List available models |
+| `POST` | `/v1/chat/completions` | Chat completion — streaming and non-streaming |
+| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
+| `GET` | `/health` | Health check — returns gRPC target and status |
+
+**Non-streaming chat:**
+```bash
+curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "brunix",
+ "messages": [{"role": "user", "content": "What is AVAP?"}],
+ "stream": false
+ }'
+```
+
+**Streaming chat (SSE):**
+```bash
+curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "brunix",
+ "messages": [{"role": "user", "content": "Write an AVAP hello world API"}],
+ "stream": true,
+ "session_id": "user-xyz"
+ }'
+```
+
+> **Brunix extension:** `session_id` is a non-standard field added to the OpenAI schema. Use it to maintain multi-turn conversation context across HTTP requests. If omitted, all requests share the `"default"` session.
+
+### Ollama endpoints
+
+| Method | Endpoint | Description |
+|---|---|---|
+| `GET` | `/api/tags` | List models (Ollama format) |
+| `POST` | `/api/chat` | Chat — NDJSON stream, `stream: true` by default |
+| `POST` | `/api/generate` | Text generation — NDJSON stream, `stream: true` by default |
+
+```bash
+curl http://localhost:8000/api/chat \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "brunix",
+ "messages": [{"role": "user", "content": "Explain AVAP loops"}],
+ "stream": true
+ }'
+```
+
+### Proxy environment variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC engine address the proxy connects to |
+| `PROXY_MODEL_ID` | `brunix` | Model name returned in API responses |
+| `PROXY_THREAD_WORKERS` | `20` | Thread pool size for concurrent gRPC calls |
+
+---
+
+## API Contract (Protobuf)
+
+The source of truth for the gRPC interface is `Docker/protos/brunix.proto`. After modifying it, regenerate the stubs:
+
+```bash
+python -m grpc_tools.protoc \
+ -I./Docker/protos \
+ --python_out=./Docker/src \
+ --grpc_python_out=./Docker/src \
+ ./Docker/protos/brunix.proto
+```
+
+For the full API reference — message types, field descriptions, error handling, and all client examples — see [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md).
+
+---
+
+## Dataset Generation & Evaluation
+
+The engine includes a specialized benchmarking suite to evaluate the model's proficiency in **AVAP syntax**. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored for the AVAP Language Reference Manual (LRM).
+
+### 1. Synthetic Data Generator
+The script `scripts/pipelines/flows/generate_mbap.py` leverages Claude to produce high-quality, executable code examples and validation tests.
+
+**Key Features:**
+* **LRM Grounding:** Uses the provided `avap.md` as the source of truth for syntax and logic.
+* **Validation Logic:** Generates `test_list` with Python regex assertions to verify the state of the AVAP stack after execution.
+* **Balanced Categories:** Covers 14 domains including ORM, Concurrency (`go/gather`), HTTP handling, and Cryptography.
+
+### 2. Usage
+Ensure you have the `anthropic` library installed and your API key configured:
+
+```bash
+pip install anthropic
+export ANTHROPIC_API_KEY="your-sk-ant-key"
+```
+
+Run the generator specifying the path to your LRM and the desired output:
+
+```bash
+python scripts/pipelines/flows/generate_mbap.py \
+ --lrm docs/LRM/avap.md \
+ --output evaluation/mbpp_avap.json \
+ --problems 300
+```
+
+### 3. Dataset Schema
+The generated JSON follows this structure:
+
+| Field | Type | Description |
+| :--- | :--- | :--- |
+| `task_id` | Integer | Unique identifier for the benchmark. |
+| `text` | String | Natural language description of the problem (Spanish). |
+| `code` | String | The reference AVAP implementation. |
+| `test_list` | Array | Python `re.match` expressions to validate execution results. |
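+
+A generated record can be checked against this schema, and its `test_list` entries evaluated with Python's `re` module. A sketch — the sample record and execution output below are invented for illustration:
+
+```python
+import re
+
+record = {
+    "task_id": 1,
+    "text": "Declare a variable x with value 1.",  # real dataset text is in Spanish
+    "code": "addVar('x', 1)",
+    "test_list": [r"x\s*=\s*1"],
+}
+
+# Schema check: every field present with the expected type
+assert isinstance(record["task_id"], int)
+assert all(isinstance(record[k], str) for k in ("text", "code"))
+assert isinstance(record["test_list"], list)
+
+# Validation check: each regex must match the (hypothetical) execution output
+execution_output = "x = 1"
+assert all(re.match(p, execution_output) for p in record["test_list"])
+```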
+
+### 4. Integration in RAG
+These generated examples are used to:
+1. **Fine-tune** the local models (`qwen2.5:1.5b`) or others via the MrHouston pipeline.
+2. **Evaluate** the "Zero-Shot" performance of the engine before deployment.
+3. **Provide Few-Shot examples** in the RAG prompt orchestration (`src/prompts.py`).
+
+---
+
+## Repository Standards & Architecture
+
+### Docker & Build Context
+To maintain production-grade security and image efficiency, this project enforces a strict separation between development files and the production runtime:
+
+* **Production Root:** All executable code must reside in the `/app` directory within the container.
+* **Exclusions:** The root `/workspace` directory is deprecated. No development artifacts, local logs, or non-essential source files (e.g., `.git`, `tests/`, `docs/`) should be bundled into the final image.
+* **Compliance:** All Pull Requests must verify that the `Dockerfile` context is optimized using the provided `.dockerignore`.
+
+*Failure to comply with these architectural standards will result in PR rejection.*
+
+For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING.md).
+
+---
+
+## Documentation Index
+
+| Document | Purpose |
+|---|---|
+| [README.md](./README.md) | Setup guide, env vars reference, quick start (this file) |
+| [CONTRIBUTING.md](./CONTRIBUTING.md) | Contribution standards, GitFlow, PR process |
+| [SECURITY.md](./SECURITY.md) | Security policy, vulnerability reporting, known limitations |
+| [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) | Deep technical architecture, component inventory, data flows |
+| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
+| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
+| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
+| [docs/adr/](./docs/adr/) | Architecture Decision Records |
+
+---
+
+## Security & Intellectual Property
+* **Data Privacy:** All LLM processing and vector searches are conducted within a private Kubernetes environment.
+* **Proprietary Technology:** This repository contains the **AVAP Technology** stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.
+
+---
diff --git a/changelog b/changelog
index 8b798ff..2b55fff 100644
--- a/changelog
+++ b/changelog
@@ -4,14 +4,115 @@ All notable changes to the **Brunix Assistance Engine** will be documented in th
---
+## [1.5.1] - 2026-03-18
+
+### Added
+- DOCS: Created `docs/ARCHITECTURE.md` — full technical architecture reference covering component inventory, request lifecycle, LangGraph workflow, hybrid RAG pipeline, streaming design, evaluation pipeline, infrastructure layout, session memory, observability, and security boundaries.
+- DOCS: Created `docs/API_REFERENCE.md` — complete gRPC API contract documentation with method descriptions, message type tables, error handling, and `grpcurl` client examples for all three RPCs (`AskAgent`, `AskAgentStream`, `EvaluateRAG`).
+- DOCS: Created `docs/RUNBOOK.md` — operational playbook with health checks, startup/shutdown procedures, tunnel management, and incident playbooks for all known failure modes.
+- DOCS: Created `SECURITY.md` — security policy covering transport security, authentication, secrets management, container security, data privacy, known limitations table, and vulnerability reporting process.
+- DOCS: Created `docs/AVAP_CHUNKER_CONFIG.md` — full reference for `avap_config.json`: lexer fields, all 4 block definitions with regex breakdown, all 10 statement categories with ordering rationale, all 14 semantic tags with detection patterns, a worked example showing chunks produced from real AVAP code, and a step-by-step guide for adding new constructs.
+
+### Changed
+- DOCS: Fully rewrote `README.md` project structure tree — now reflects all files accurately including `openai_proxy.py`, `entrypoint.sh`, `golden_dataset.json`, `SECURITY.md`, `docs/ARCHITECTURE.md`, `docs/API_REFERENCE.md`, `docs/RUNBOOK.md`, `docs/adr/`, `avap_chunker.py`, `avap_config.json`, `ingestion/chunks.jsonl`, and `src/config.py`.
+- DOCS: Added `Knowledge Base Ingestion` section to `README.md` documenting both ingestion pipelines in full: Pipeline A (Chonkie — `elasticsearch_ingestion.py`) with flow diagram, CLI usage, and chunk field table; Pipeline B (AVAP Native — `avap_chunker.py` + `avap_ingestor.py`) with flow diagram, chunk type table, semantic tags reference, and ingestor env vars.
+- DOCS: Replaced minimal `Testing & Debugging` section with complete documentation of all three gRPC methods (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) including expected responses, multi-turn example, and verdict thresholds.
+- DOCS: Added `HTTP Proxy` section documenting all 7 HTTP endpoints (4 OpenAI + 3 Ollama), streaming vs non-streaming routing, `session_id` extension, and proxy env vars table.
+- DOCS: Fixed `API Contract (Protobuf)` section — corrected `grpc_tools.protoc` paths and added reference to `docs/API_REFERENCE.md`.
+- DOCS: Fixed remaining stale reference to `scripts/generate_mbpp_avap.py` in Dataset Generation section.
+- DOCS: Added Documentation Index table to `README.md` linking all documentation files.
+- DOCS: Updated `CONTRIBUTING.md` — added Section 9 (Architecture Decision Records) and updated PR checklist and doc policy table.
+- ENV: Added missing variable documentation to `README.md`: `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`, `ELASTICSEARCH_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`.
+
+---
+
+## [1.5.0] - 2026-03-12
+
+### Added
+- IMPLEMENTED:
+ - `scripts/pipelines/flows/translate_mbpp.py`: pipeline to generate synthetic dataset from mbpp dataset.
+ - `scripts/tasks/prompts.py`: module containing prompts for pipelines.
+ - `scripts/tasks/chunk.py`: module containing functions related to chunk management.
+ - `synthetic_datasets`: folder containing generated synthetic datasets.
+ - `src/config.py`: environment variables configuration file.
+
+### Changed
+- REFACTORED: `scripts/pipelines/flows/elasticsearch_ingestion.py` now uses `docs/LRM` or `docs/samples` documents instead of pre-chunked files.
+- RENAMED `docs/AVAP Language: Core Commands & Functional Specification` to `docs/avap_language_github_docs`.
+- REMOVED: `Makefile` file.
+- REMOVED: `scripts/start-tunnels.sh` script.
+- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.
+- MOVED `scripts/generate_mbap.py` into `scripts/flows/generate_mbap.py`.
+
+
+## [1.4.0] - 2026-03-10
+
+### Added
+- **Dataset Generation Suite**: Added `scripts/generate_mbpp_avap.py` to automate the creation of synthetic AVAP training data.
+- **MBPP-style Benchmarking**: Support for generating structured JSON datasets with code solutions and Python-based validation tests (`test_list`).
+- **LRM Integration**: The generator now performs grounded synthesis using the `avap.md` Language Reference Manual.
+- **Anthropic Claude 3.5 Sonnet Integration**: Orchestration logic for high-fidelity code generation via API.
+
+### Changed
+- **README.md**: Added comprehensive documentation for the Evaluation & Dataset Generation pipeline.
+- **Project Structure**: Integrated `evaluation/` directory for synthetic dataset storage.
+
+### Security
+- Added explicit policy to avoid committing real Anthropic API keys, enforcing the use of environment variables.
+
+## [1.3.0] - 2026-03-05
+
+### Added
+- IMPLEMENTED:
+ - `Docker/src/utils/emb_factory`: factory modules created for embedding model generation.
+ - `Docker/src/utils/llm_factory`: factory modules created for LLM generation.
+ - `Docker/src/graph.py`: workflow graph orchestration module added.
+ - `Docker/src/prompts.py`: centralized prompt definitions added.
+ - `Docker/src/state.py`: shared state management module added.
+ - `scripts/pipelines/flows/elasticsearch_ingestion.py`: pipeline to populate the elasticsearch vector database.
+ - `ingestion/docs`: folder containing all chunked AVAP documents.
+
+### Changed
+- REFACTORED: `server.py` updated to integrate the new graph/state/prompt and utils-based architecture.
+- REFACTORED: `docker-compose.yaml` now uses fully parameterized environment variables instead of hardcoded service URLs and credentials.
+- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.
+
+
+
+## [1.2.0] - 2026-03-03
+
+### Added
+- GOVERNANCE: Introduced `CONTRIBUTING.md` as the single source of truth for all contribution standards, covering GitFlow, infrastructure policy, repository standards, environment variables, changelog, documentation, and incident reporting.
+- GOVERNANCE: Added `.github/pull_request_template.md` enforcing a mandatory structured checklist on every PR — including explicit sign-off on environment variables, changelog, and documentation.
+- DOCS: Added Environment Variables reference table to `README.md`. All variables must be registered here. PRs introducing undocumented variables will be rejected.
+- DOCS: Updated project structure map in `README.md` to reflect new governance files.
+
+### Changed
+- PROCESS: Pull Requests that introduce new environment variables without documentation, omit required changelog entries, or skip required documentation updates are now formally non-mergeable per `CONTRIBUTING.md`.
+
+---
+
+## [1.1.0] - 2026-02-16
+
+### Added
+- IMPLEMENTED: Strict repository structure enforcement to separate development environment from production runtime.
+- SECURITY: Added `.dockerignore` to prevent leaking sensitive source files and local configurations into the container.
+
+### Changed
+- REFACTORED: Dockerfile build logic to optimize build context and reduce image footprint.
+- ARCHITECTURE: Moved application entry point to `/app` and eliminated the redundant root `/workspace` directory for enhanced security.
+
+### Fixed
+- RESOLVED: Issue where non-production files were being bundled into the Docker image, improving deployment speed and container isolation.
+
+---
+
## [1.0.0] - 2026-02-09
### Added
-
- **System Architecture:** Implementation of the triple-layer stack (Engine, Vector DB, Observability).
- **Core Engine:** Deployment of the `brunix-assistance-engine` using **Python 3.11**, **LangChain**, and **LangGraph** for agentic workflows.
- **Communication Layer:** Established **gRPC** as the primary high-performance interface (Port 50051/50052).
- **Knowledge Base:** Integration of **Elasticsearch 8.12** (`brunix-vector-db`) for AVAP technology RAG support.
- **Observability Framework:** Deployment of **Langfuse** and **PostgreSQL** for full trace audit and cost management.
- **Security:** Initial network isolation within Docker (`avap-network`) and production-ready secret management design.
-
diff --git a/docker-compose.yaml b/docker-compose.yaml
deleted file mode 100644
index 38112d0..0000000
--- a/docker-compose.yaml
+++ /dev/null
@@ -1,63 +0,0 @@
-version: '3.8'
-
-services:
- brunix-engine:
- build: .
- container_name: brunix-assistance-engine
- ports:
- - "50052:50051"
- environment:
- - ELASTICSEARCH_URL=http://elasticsearch:9200
- - LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}
- - LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}
- - LANGFUSE_HOST=http://langfuse:3000
- - OPENAI_API_KEY=${OPENAI_API_KEY} # Or whichever provider Ivar chooses
- depends_on:
- - elasticsearch
- - langfuse
- networks:
- - avap-network
-
- elasticsearch:
- image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
- container_name: brunix-vector-db
- environment:
- - discovery.type=single-node
- - xpack.security.enabled=false
- - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- ports:
- - "9200:9200"
- networks:
- - avap-network
-
- langfuse:
- image: langfuse/langfuse:2.33.0
- container_name: brunix-observability
- ports:
- - "3000:3000"
- environment:
- - DATABASE_URL=postgresql://postgres:brunix_pass@langfuse-db:5432/postgres
- - NEXTAUTH_URL=http://localhost:3000
- - NEXTAUTH_SECRET=my_ultra_secret
- - SALT=my_salt
- depends_on:
- - langfuse-db
- networks:
- - avap-network
-
- langfuse-db:
- image: postgres:15
- container_name: brunix-postgres
- environment:
- - POSTGRES_PASSWORD=brunix_pass
- volumes:
- - postgres_data:/var/lib/postgresql/data
- networks:
- - avap-network
-
-networks:
- avap-network:
- driver: bridge
-
-volumes:
- postgres_data:
diff --git a/docs/ADR/ADR-0001-grpc-primary-interface.md b/docs/ADR/ADR-0001-grpc-primary-interface.md
new file mode 100644
index 0000000..6ffe5fa
--- /dev/null
+++ b/docs/ADR/ADR-0001-grpc-primary-interface.md
@@ -0,0 +1,54 @@
+# ADR-0001: gRPC as the Primary Communication Interface
+
+**Date:** 2026-02-09
+**Status:** Accepted
+**Deciders:** Rafael Ruiz (CTO, AVAP Technology), MrHouston Engineering
+
+---
+
+## Context
+
+The Brunix Assistance Engine needs a communication protocol to serve AI completions from internal backend services and client applications. The primary requirement is **real-time token streaming** — the engine must forward Ollama's token output to clients with minimal latency, not buffer the full response.
+
+Secondary requirements:
+- Strict API contract enforcement (no schema drift)
+- High throughput for potential multi-client scenarios
+- Easy introspection and testing in development
+
+Candidates evaluated: REST/HTTP+JSON, gRPC, WebSockets, GraphQL subscriptions.
+
+---
+
+## Decision
+
+Use **gRPC with Protocol Buffers (proto3)** as the primary interface, exposed on port `50051` (container) / `50052` (host).
+
+The API contract is defined in a single source of truth: `Docker/protos/brunix.proto`.
+
+An **OpenAI-compatible HTTP proxy** (`openai_proxy.py`, port `8000`) is provided as a secondary interface to enable integration with standard tooling (continue.dev, LiteLLM, etc.) without modifying the core engine.
+
+---
+
+## Rationale
+
+| Criterion | REST+JSON | **gRPC** | WebSockets |
+|---|---|---|---|
+| Streaming support | Requires SSE or chunked | ✅ Native server-side streaming | ✅ Bidirectional |
+| Schema enforcement | ❌ Optional (OpenAPI) | ✅ Enforced by protobuf | ❌ None |
+| Code generation | Manual or OpenAPI tooling | ✅ Automatic stub generation | Manual |
+| Performance | Good | ✅ Better (binary framing) | Good |
+| Dev tooling | Excellent | Good (`grpcurl`, reflection) | Limited |
+| Browser-native | ✅ Yes | ❌ Requires grpc-web proxy | ✅ Yes |
+
+gRPC was chosen because: (1) streaming is a first-class citizen, not bolted on; (2) the proto contract makes API evolution explicit and breaking changes detectable at compile time; (3) stub generation eliminates a class of integration bugs.
+
+The lack of browser-native support is not a concern — all current clients are server-side services or CLI tools.
+
+---
+
+## Consequences
+
+- All API changes require modifying `brunix.proto` and regenerating stubs (`grpc_tools.protoc`).
+- Client libraries must use the generated stubs or `grpcurl` — no curl-based ad-hoc testing of the main API.
+- The OpenAI proxy adds a second entry point that must be kept in sync with the gRPC interface behavior.
+- gRPC reflection is enabled in development. It should be evaluated for disabling in production to reduce the attack surface.
diff --git a/docs/ADR/ADR-0002-two-phase-streaming.md b/docs/ADR/ADR-0002-two-phase-streaming.md
new file mode 100644
index 0000000..39f7c39
--- /dev/null
+++ b/docs/ADR/ADR-0002-two-phase-streaming.md
@@ -0,0 +1,61 @@
+# ADR-0002: Two-Phase Streaming Design for `AskAgentStream`
+
+**Date:** 2026-03-05
+**Status:** Accepted
+**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
+
+---
+
+## Context
+
+The initial `AskAgent` implementation calls `graph.invoke()` — LangGraph's synchronous execution — and returns the complete answer as a single gRPC message. This blocks the gRPC connection for the full generation time (typically 3–15 seconds) with no intermediate feedback to the client.
+
+A streaming variant is required that forwards Ollama's token output to the client as tokens are produced, enabling real-time rendering in client UIs.
+
+The straightforward approach would be to use LangGraph's own `graph.stream()` method.
+
+---
+
+## Decision
+
+Implement `AskAgentStream` using a **two-phase design**:
+
+**Phase 1 — Graph-managed preparation:**
+Run `build_prepare_graph()` (classify → reformulate → retrieve) via `prepare_graph.invoke()`. This phase runs synchronously and produces the full classified, reformulated query and retrieved context. It does **not** call the LLM for generation.
+
+**Phase 2 — Manual LLM streaming:**
+Call `build_final_messages()` to reconstruct the exact prompt that the full graph would have used, then call `llm.stream(final_messages)` directly. Each token chunk is yielded immediately as an `AgentResponse`.
+
+A separate `build_prepare_graph()` function mirrors the routing logic of `build_graph()` but terminates at `END` before any generation node. A `build_final_messages()` function replicates the prompt-building logic of `generate`, `generate_code`, and `respond_conversational`.
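+
+Reduced to its essentials, the two-phase shape looks like this. The function names mirror those in the decision, but the graph and LLM here are trivial stubs, not the real LangGraph objects:
+
+```python
+def prepare(query: str) -> dict:
+    """Phase 1 stand-in: classify -> reformulate -> retrieve, no generation."""
+    return {"query": query, "context": ["retrieved chunk"], "route": "generate"}
+
+def build_final_messages(state: dict) -> list[str]:
+    """Rebuild the exact prompt the full graph would have used."""
+    return [f"Context: {c}" for c in state["context"]] + [state["query"]]
+
+def fake_llm_stream(messages):
+    """Stand-in for llm.stream(): yields one token at a time."""
+    for token in ["Hello", " ", "world"]:
+        yield token
+
+def ask_agent_stream(query: str):
+    state = prepare(query)                        # Phase 1: synchronous
+    for token in fake_llm_stream(build_final_messages(state)):
+        yield {"text": token, "is_final": False}  # Phase 2: one response per token
+    yield {"text": "", "is_final": True}          # termination signal, not content
+```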
+
+---
+
+## Rationale
+
+### Why not use `graph.stream()`?
+
+LangGraph's `stream()` yields **state snapshots** at node boundaries, not LLM tokens. When using `llm.invoke()` inside a graph node, the invocation is atomic — there are no intermediate yields. To get per-token streaming from `llm.stream()`, the call must happen outside the graph.
+
+### Why not inline the streaming call inside a graph node?
+
+Yielding from inside a LangGraph node to an outer generator is architecturally complex and not idiomatic to LangGraph. It requires either a callback mechanism or breaking the node abstraction.
+
+### Trade-offs
+
+| Concern | Two-phase design | Alternative (streaming inside graph) |
+|---|---|---|
+| Code duplication | Medium — routing logic exists in both graphs | Low |
+| Architectural clarity | High — phases are clearly separated | Low |
+| LangGraph compatibility | High — standard usage | Low — requires framework internals |
+| Maintainability | Requires keeping `build_prepare_graph` and `build_final_messages` in sync with `build_graph` | Single source of routing truth |
+
+The duplication risk is accepted because: (1) the routing logic is simple (3 branches), (2) the prepare graph is strictly a subset of the full graph, and (3) both are tested via the same integration test queries.
+
+---
+
+## Consequences
+
+- `graph.py` now exports three functions: `build_graph`, `build_prepare_graph`, `build_final_messages`.
+- Any change to query routing logic in `build_graph` must be mirrored in `build_prepare_graph`.
+- Any change to prompt selection in `generate` / `generate_code` / `respond_conversational` must be mirrored in `build_final_messages`.
+- Session history persistence happens **after the stream ends**, not mid-stream. If a client disconnects early, the history for that turn is not persisted.
diff --git a/docs/ADR/ADR-0003-hybrid-retrieval-rrf.md b/docs/ADR/ADR-0003-hybrid-retrieval-rrf.md
new file mode 100644
index 0000000..f9cfad9
--- /dev/null
+++ b/docs/ADR/ADR-0003-hybrid-retrieval-rrf.md
@@ -0,0 +1,63 @@
+# ADR-0003: Hybrid Retrieval (BM25 + kNN) with RRF Fusion
+
+**Date:** 2026-03-05
+**Status:** Accepted
+**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
+
+---
+
+## Context
+
+The RAG pipeline needs a retrieval strategy for finding relevant AVAP documentation chunks from Elasticsearch. The knowledge base contains a mix of:
+
+- **Prose documentation** (explanations of AVAP concepts, commands, parameters) — benefits from semantic (dense) retrieval
+- **Code examples and BNF grammar** (exact syntax patterns, function signatures) — benefits from lexical (sparse) retrieval, where exact token matches are critical
+
+A single retrieval strategy will underperform for one of these document types.
+
+---
+
+## Decision
+
+Implement **hybrid retrieval** combining:
+- **BM25** (Elasticsearch `multi_match` on `content^2` and `text^2` fields) for lexical relevance
+- **kNN** (Elasticsearch `knn` on the `embedding` field) for semantic relevance
+- **RRF (Reciprocal Rank Fusion)** with constant `k=60` to fuse rankings from both systems
+
+The fused top-8 documents are passed to the generation node as context.
+
+Query reformulation (`reformulate` node) runs before retrieval and rewrites the user query into keyword-optimized form to improve BM25 recall for AVAP-specific terminology.
+
+---
+
+## Rationale
+
+### Why hybrid over pure semantic?
+
+AVAP is a domain-specific language with precise, non-negotiable syntax. For queries like "how does `addVar` work", exact lexical matching on the function name `addVar` is more reliable than semantic similarity, which may confuse similar-sounding functions or return contextually related but syntactically different commands.
+
+### Why hybrid over pure BM25?
+
+Conversational queries ("explain how loops work in AVAP", "what's the difference between addVar and setVar") benefit from semantic search that captures meaning beyond exact keyword overlap.
+
+### Why RRF over score normalization?
+
+BM25 and kNN scores are on different scales and distributions. Normalizing them requires careful calibration per index. RRF operates on ranks — not scores — making it robust to distribution differences and requiring no per-deployment tuning. The `k=60` constant is the standard literature value.
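+
+As a sketch of the fusion step (illustrative, operating on ranked lists of document IDs):
+
+```python
+def rrf_fuse(bm25_ids: list, knn_ids: list, k: int = 60, top_n: int = 8) -> list:
+    """Reciprocal Rank Fusion: score(doc) = sum over rankings of 1 / (k + rank)."""
+    scores: dict = {}
+    for ranking in (bm25_ids, knn_ids):
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
+    # Highest fused score first; documents appearing in both rankings win.
+    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
+    return [doc_id for doc_id, _ in ordered[:top_n]]
+```
+
+Because only ranks enter the formula, the raw BM25 and cosine scores never need to be put on a common scale.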
+
+### Retrieval parameters
+
+| Parameter | Value | Rationale |
+|---|---|---|
+| `k` (top documents) | 8 | Balances context richness vs. context window length |
+| `num_candidates` (kNN) | `k × 5 = 40` | Standard ES kNN oversampling ratio |
+| BM25 fields | `content^2, text^2` | Boost content/text fields; `^2` emphasizes them over metadata |
+| Fuzziness (BM25) | `AUTO` | Handles minor typos in AVAP function names |
+
+---
+
+## Consequences
+
+- Retrieval requires two ES queries per request (BM25 + kNN). This is acceptable given the tunnel latency baseline already incurred.
+- If either BM25 or kNN fails (e.g., embedding model unavailable), the system degrades gracefully: the failing component logs a warning and returns an empty list; RRF fusion proceeds with the available rankings.
+- Context length grows with `k`. At `k=8` with typical chunk sizes (~300 tokens each), context is ~2400 tokens — within the `qwen2.5:1.5b` context window.
+- Changing `k` has a direct impact on both retrieval quality and generation latency. Any change must be evaluated with `EvaluateRAG` before merging.
diff --git a/docs/ADR/ADR-0004-claude-eval-judge.md b/docs/ADR/ADR-0004-claude-eval-judge.md
new file mode 100644
index 0000000..b92a867
--- /dev/null
+++ b/docs/ADR/ADR-0004-claude-eval-judge.md
@@ -0,0 +1,54 @@
+# ADR-0004: Claude as the RAGAS Evaluation Judge
+
+**Date:** 2026-03-10
+**Status:** Accepted
+**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
+
+---
+
+## Context
+
+The `EvaluateRAG` endpoint runs RAGAS metrics to measure the quality of the RAG pipeline. RAGAS metrics (`faithfulness`, `answer_relevancy`, `context_recall`, `context_precision`) require an LLM judge to score answers against ground truth and context.
+
+The production LLM is Ollama `qwen2.5:1.5b` — a small, locally-hosted model optimized for AVAP code generation speed. Using it as the evaluation judge creates a conflict of interest (scoring the system's output with the same model that produced it) and a quality concern (small models produce unreliable evaluation scores).
+
+---
+
+## Decision
+
+Use **Claude (`claude-sonnet-4-20250514`) as the RAGAS evaluation judge**, accessed via the Anthropic API.
+
+The production Ollama LLM is still used for **answer generation** during evaluation (to measure real-world pipeline quality). Only the scoring step uses Claude.
+
+This requires `ANTHROPIC_API_KEY` to be set. The `EvaluateRAG` endpoint fails with an explicit error if the key is missing.
+
+---
+
+## Rationale
+
+### Separation of generation and evaluation
+
+Using a different model for generation and evaluation is standard practice in LLM system evaluation. The evaluation judge must be:
+1. **Independent** — not the same model being measured
+2. **High-capability** — capable of nuanced faithfulness and relevancy judgements
+3. **Deterministic** — consistent scores across runs (achieved via `temperature=0`)
+
+### Why Claude specifically?
+
+- Claude Sonnet-class models score among the highest on LLM-as-judge benchmarks for English and multilingual evaluation tasks
+- The AVAP knowledge base contains bilingual content (Spanish + English); Claude handles both reliably
+- The Anthropic SDK is already available in the dependency stack (`langchain-anthropic`)
+
+### Cost implications
+
+Claude is called only during explicit `EvaluateRAG` invocations, not during production queries. Cost per evaluation run depends on dataset size. For 50 questions at standard RAGAS prompt lengths, estimated cost is < $0.50 using Sonnet pricing.
+
+---
+
+## Consequences
+
+- `ANTHROPIC_API_KEY` and `ANTHROPIC_MODEL` become required configuration for the evaluation feature.
+- Evaluation runs incur external API costs. This should be factored into the evaluation cadence policy.
+- The `judge_model` field in `EvalResponse` records which Claude version was used, enabling score comparisons across model versions over time.
+- If Anthropic's API is unreachable or rate-limited, `EvaluateRAG` will fail. This is acceptable since evaluation is a batch operation, not a real-time user-facing feature.
+- Any change to `ANTHROPIC_MODEL` may alter scoring distributions. Historical eval scores are only comparable when the same judge model was used.
diff --git a/docs/API_REFERENCE.md b/docs/API_REFERENCE.md
new file mode 100644
index 0000000..d6d3675
--- /dev/null
+++ b/docs/API_REFERENCE.md
@@ -0,0 +1,339 @@
+# Brunix Assistance Engine — API Reference
+
+> **Protocol:** gRPC (proto3)
+> **Port:** `50052` (host) → `50051` (container)
+> **Reflection:** Enabled — service introspection available via `grpcurl`
+> **Source of truth:** `Docker/protos/brunix.proto`
+
+---
+
+## Table of Contents
+
+1. [Service Definition](#1-service-definition)
+2. [Methods](#2-methods)
+ - [AskAgent](#21-askagent)
+ - [AskAgentStream](#22-askagentstream)
+ - [EvaluateRAG](#23-evaluaterag)
+3. [Message Types](#3-message-types)
+4. [Error Handling](#4-error-handling)
+5. [Client Examples](#5-client-examples)
+6. [OpenAI-Compatible Proxy](#6-openai-compatible-proxy)
+
+---
+
+## 1. Service Definition
+
+```protobuf
+package brunix;
+
+service AssistanceEngine {
+ rpc AskAgent (AgentRequest) returns (stream AgentResponse);
+ rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
+ rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
+}
+```
+
+Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentResponse` messages. They differ in how they produce and deliver the response — see [§2.1](#21-askagent) and [§2.2](#22-askagentstream).
+
+---
+
+## 2. Methods
+
+### 2.1 `AskAgent`
+
+**Behaviour:** Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using `llm.invoke()`. Returns the complete answer as a **single** `AgentResponse` message with `is_final = true`.
+
+**Use case:** Clients that do not support streaming or need a single atomic response.
+
+**Request:**
+
+```protobuf
+message AgentRequest {
+ string query = 1; // The user's question. Required. Max recommended: 4096 chars.
+ string session_id = 2; // Conversation session identifier. Optional.
+ // If empty, defaults to "default" (shared session).
+ // Use a UUID per user/conversation for isolation.
+}
+```
+
+**Response stream:**
+
+| Message # | `text` | `avap_code` | `is_final` |
+|---|---|---|---|
+| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |
+
+**Latency characteristics:** Depends on LLM generation time (non-streaming). Typically 3–15 seconds for `qwen2.5:1.5b` on the Devaron cluster.
+
+---
+
+### 2.2 `AskAgentStream`
+
+**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.
+
+**Use case:** Interactive clients (chat UIs, terminal tools) that need progressive rendering.
+
+**Request:** Same `AgentRequest` as `AskAgent`.
+
+**Response stream:**
+
+| Message # | `text` | `avap_code` | `is_final` |
+|---|---|---|---|
+| 1…N | Single token | `""` | `false` |
+| N+1 (final) | `""` | `""` | `true` |
+
+**Client contract:**
+- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
+- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
+- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
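+
+The contract above reduces to a minimal accumulator. The sketch uses dict-shaped messages mirroring the `AgentResponse` fields; a real client would iterate the gRPC response stream instead:
+
+```python
+def accumulate_stream(responses) -> str:
+    """Reassemble the full answer from an AskAgentStream response stream."""
+    parts = []
+    for resp in responses:
+        if resp["is_final"]:
+            break  # terminal marker; its text is always empty and is discarded
+        parts.append(resp["text"])
+    return "".join(parts)
+```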
+
+---
+
+### 2.3 `EvaluateRAG`
+
+**Behaviour:** Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.
+
+> **Requirement:** `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing.
+
+**Request:**
+
+```protobuf
+message EvalRequest {
+ string category = 1; // Optional. Filter golden dataset by category name.
+ // If empty, all categories are evaluated.
+ int32 limit = 2; // Optional. Evaluate only the first N questions.
+ // If 0, all matching questions are evaluated.
+ string index = 3; // Optional. Elasticsearch index to evaluate against.
+ // If empty, uses the server's configured ELASTICSEARCH_INDEX.
+}
+```
+
+**Response (single, non-streaming):**
+
+```protobuf
+message EvalResponse {
+ string status = 1; // "ok" or error description
+ int32 questions_evaluated = 2; // Number of questions actually processed
+ float elapsed_seconds = 3; // Total wall-clock time
+ string judge_model = 4; // Claude model used as judge
+ string index = 5; // Elasticsearch index evaluated
+
+ // RAGAS metric scores (0.0 – 1.0)
+ float faithfulness = 6;
+ float answer_relevancy = 7;
+ float context_recall = 8;
+ float context_precision = 9;
+
+ float global_score = 10; // Mean of non-zero metric scores
+ string verdict = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"
+
+ repeated QuestionDetail details = 12;
+}
+
+message QuestionDetail {
+ string id = 1; // Question ID from golden dataset
+ string category = 2; // Question category
+ string question = 3; // Question text
+ string answer_preview = 4; // First 300 chars of generated answer
+ int32 n_chunks = 5; // Number of context chunks retrieved
+}
+```
+
+**Verdict thresholds:**
+
+| Score | Verdict |
+|---|---|
+| ≥ 0.80 | `EXCELLENT` |
+| ≥ 0.60 | `ACCEPTABLE` |
+| < 0.60 | `INSUFFICIENT` |
+
+---
+
+## 3. Message Types
+
+### `AgentRequest`
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `query` | `string` | Yes | User's natural language question |
+| `session_id` | `string` | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |
+
+### `AgentResponse`
+
+| Field | Type | Description |
+|---|---|---|
+| `text` | `string` | Token text (streaming) or full answer text (non-streaming) |
+| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
+| `is_final` | `bool` | `true` only on the last message of the stream |
+
+### `EvalRequest`
+
+| Field | Type | Required | Default | Description |
+|---|---|---|---|---|
+| `category` | `string` | No | `""` (all) | Filter golden dataset by category |
+| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
+| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
+
+### `EvalResponse`
+
+See full definition in [§2.3](#23-evaluaterag).
+
+---
+
+## 4. Error Handling
+
+The engine catches all exceptions and returns them as terminal `AgentResponse` messages rather than gRPC status errors. This means:
+
+- The stream will **not** be terminated with a non-OK gRPC status code on application-level errors.
+- Check for error strings in the `text` field that begin with `[ENG] Error:`.
+- The stream will still end with `is_final = true`.
+
+**Example error response:**
+```json
+{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
+```
+
+**`EvaluateRAG` error response:**
+Returned as a single `EvalResponse` with `status` set to the error description:
+```json
+{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
+```
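+
+Under this contract, client-side error detection reduces to a prefix test on `text` (sketch):
+
+```python
+ERROR_PREFIX = "[ENG] Error:"
+
+def is_engine_error(text: str) -> bool:
+    """Application-level errors arrive as normal messages, not gRPC status codes."""
+    return text.startswith(ERROR_PREFIX)
+```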
+
+---
+
+## 5. Client Examples
+
+### Introspect the service
+
+```bash
+grpcurl -plaintext localhost:50052 list
+# Output: brunix.AssistanceEngine
+
+grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
+```
+
+### `AskAgent` — full response
+
+```bash
+grpcurl -plaintext \
+ -d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/AskAgent
+```
+
+Expected response:
+```json
+{
+ "text": "addVar is an AVAP command that declares a new variable...",
+ "avap_code": "AVAP-2026",
+ "is_final": true
+}
+```
+
+### `AskAgentStream` — token streaming
+
+```bash
+grpcurl -plaintext \
+ -d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/AskAgentStream
+```
+
+Expected response (truncated):
+```json
+{"text": "Here", "is_final": false}
+{"text": " is", "is_final": false}
+{"text": " a", "is_final": false}
+...
+{"text": "", "is_final": true}
+```
+
+### `EvaluateRAG` — run evaluation
+
+```bash
+# Evaluate first 10 questions from the "core_syntax" category
+grpcurl -plaintext \
+ -d '{"category": "core_syntax", "limit": 10}' \
+ localhost:50052 \
+ brunix.AssistanceEngine/EvaluateRAG
+```
+
+Expected response:
+```json
+{
+ "status": "ok",
+ "questions_evaluated": 10,
+ "elapsed_seconds": 142.3,
+ "judge_model": "claude-sonnet-4-20250514",
+ "index": "avap-docs-test",
+ "faithfulness": 0.8421,
+ "answer_relevancy": 0.7913,
+ "context_recall": 0.7234,
+ "context_precision": 0.6891,
+ "global_score": 0.7615,
+ "verdict": "ACCEPTABLE",
+ "details": [...]
+}
+```
+
+### Multi-turn conversation example
+
+```bash
+# Turn 1
+grpcurl -plaintext \
+ -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
+ localhost:50052 brunix.AssistanceEngine/AskAgentStream
+
+# Turn 2 — the engine has history from Turn 1
+grpcurl -plaintext \
+ -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
+ localhost:50052 brunix.AssistanceEngine/AskAgentStream
+```
+
+### Regenerate gRPC stubs after modifying `brunix.proto`
+
+```bash
+python -m grpc_tools.protoc \
+ -I./Docker/protos \
+ --python_out=./Docker/src \
+ --grpc_python_out=./Docker/src \
+ ./Docker/protos/brunix.proto
+```
+
+---
+
+## 6. OpenAI-Compatible Proxy
+
+The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps `AskAgentStream` under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.
+
+**Base URL:** `http://localhost:8000`
+
+### `POST /v1/chat/completions`
+
+**Request body:**
+
+```json
+{
+ "model": "brunix",
+ "messages": [
+ {"role": "user", "content": "What is addVar in AVAP?"}
+ ],
+ "stream": true
+}
+```
+
+**Notes:**
+- The `model` field is ignored; the engine always uses the configured `OLLAMA_MODEL_NAME`.
+- Session management is handled internally by the proxy. Conversation continuity across separate HTTP requests is not guaranteed.
+- Only `stream: true` is fully supported. Non-streaming mode may be available but is not the primary use case.
+
+**Example with curl:**
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "brunix",
+ "messages": [{"role": "user", "content": "Explain AVAP loops"}],
+ "stream": true
+ }'
+```
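+
+With `stream: true`, the response body is standard Chat Completions server-sent events. A minimal client-side parser for the token deltas might look like this (the chunk shape shown is the standard OpenAI streaming format; it is an assumption that the proxy emits exactly this shape):
+
+```python
+import json
+
+def parse_sse_chunks(lines) -> str:
+    """Extract and concatenate content deltas from an OpenAI-style SSE stream."""
+    parts = []
+    for line in lines:
+        line = line.strip()
+        if not line.startswith("data: "):
+            continue  # skip blank keep-alive lines
+        payload = line[len("data: "):]
+        if payload == "[DONE]":
+            break  # end-of-stream sentinel
+        delta = json.loads(payload)["choices"][0]["delta"]
+        parts.append(delta.get("content", ""))
+    return "".join(parts)
+```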
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..4408927
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,463 @@
+# Brunix Assistance Engine — Architecture Reference
+
+> **Audience:** Engineers contributing to this repository, architects reviewing the system design, and operators responsible for its deployment.
+> **Last updated:** 2026-03-18
+> **Version:** 1.5.x
+
+---
+
+## Table of Contents
+
+1. [System Overview](#1-system-overview)
+2. [Component Inventory](#2-component-inventory)
+3. [Request Lifecycle](#3-request-lifecycle)
+4. [LangGraph Workflow](#4-langgraph-workflow)
+5. [RAG Pipeline — Hybrid Search](#5-rag-pipeline--hybrid-search)
+6. [Streaming Architecture (AskAgentStream)](#6-streaming-architecture-askagentstream)
+7. [Evaluation Pipeline (EvaluateRAG)](#7-evaluation-pipeline-evaluaterag)
+8. [Data Ingestion Pipeline](#8-data-ingestion-pipeline)
+9. [Infrastructure Layout](#9-infrastructure-layout)
+10. [Session State & Conversation Memory](#10-session-state--conversation-memory)
+11. [Observability Stack](#11-observability-stack)
+12. [Security Boundaries](#12-security-boundaries)
+13. [Known Limitations & Future Work](#13-known-limitations--future-work)
+
+---
+
+## 1. System Overview
+
+The **Brunix Assistance Engine** is a stateful, streaming-capable AI service that answers questions about the AVAP programming language. It combines:
+
+- **gRPC** as the primary communication interface (port `50051` inside container, `50052` on host)
+- **LangGraph** for deterministic, multi-step agentic orchestration
+- **Hybrid RAG** (BM25 + kNN with RRF fusion) over an Elasticsearch vector index
+- **Ollama** as the local LLM and embedding backend
+- **RAGAS + Claude** as the automated evaluation judge
+
+A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI/Uvicorn, enabling integration with tools that expect the OpenAI API format.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ External Clients │
+│ grpcurl / App SDK │ OpenAI-compatible client │
+└────────────┬────────────────┴──────────────┬────────────────┘
+ │ gRPC :50052 │ HTTP :8000
+ ▼ ▼
+┌────────────────────────────────────────────────────────────┐
+│ Docker Container │
+│ │
+│ ┌─────────────────────┐ ┌──────────────────────────┐ │
+│ │ server.py (gRPC) │ │ openai_proxy.py (HTTP) │ │
+│ │ BrunixEngine │ │ FastAPI / Uvicorn │ │
+│ └──────────┬──────────┘ └──────────────────────────┘ │
+│ │ │
+│ ┌──────────▼──────────────────────────────────────────┐ │
+│ │ LangGraph Orchestration │ │
+│ │ classify → reformulate → retrieve → generate │ │
+│ └──────────────────────────┬───────────────────────────┘ │
+│ │ │
+│ ┌───────────────────┼────────────────────┐ │
+│ ▼ ▼ ▼ │
+│ Ollama (LLM) Ollama (Embed) Elasticsearch │
+│ via tunnel via tunnel via tunnel │
+└────────────────────────────────────────────────────────────┘
+ │ kubectl port-forward tunnels │
+ ▼ ▼
+ Devaron Cluster (Vultr Kubernetes)
+ ollama-light-service:11434 brunix-vector-db:9200
+ brunix-postgres:5432 Langfuse UI
+```
+
+---
+
+## 2. Component Inventory
+
+| Component | File / Service | Responsibility |
+|---|---|---|
+| **gRPC Server** | `Docker/src/server.py` | Entry point. Implements the `AssistanceEngine` servicer. Initializes LLM, embeddings, ES client, and both graphs. |
+| **Full Graph** | `Docker/src/graph.py` → `build_graph()` | Complete workflow: classify → reformulate → retrieve → generate. Used by `AskAgent` and `EvaluateRAG`. |
+| **Prepare Graph** | `Docker/src/graph.py` → `build_prepare_graph()` | Partial workflow: classify → reformulate → retrieve. Does **not** call the LLM for generation. Used by `AskAgentStream` to enable manual token streaming. |
+| **Message Builder** | `Docker/src/graph.py` → `build_final_messages()` | Reconstructs the final prompt list from prepared state for `llm.stream()`. |
+| **Prompt Library** | `Docker/src/prompts.py` | Centralized definitions for `CLASSIFY`, `REFORMULATE`, `GENERATE`, `CODE_GENERATION`, and `CONVERSATIONAL` prompts. |
+| **Agent State** | `Docker/src/state.py` | `AgentState` TypedDict shared across all graph nodes. |
+| **Evaluation Suite** | `Docker/src/evaluate.py` | RAGAS-based pipeline. Uses the production retriever + Ollama LLM for generation, and Claude as the impartial judge. |
+| **OpenAI Proxy** | `Docker/src/openai_proxy.py` | FastAPI application that wraps `AskAgentStream` under an `/v1/chat/completions` endpoint. |
+| **LLM Factory** | `Docker/src/utils/llm_factory.py` | Provider-agnostic factory for chat models (Ollama, AWS Bedrock). |
+| **Embedding Factory** | `Docker/src/utils/emb_factory.py` | Provider-agnostic factory for embedding models (Ollama, HuggingFace). |
+| **Ingestion Pipeline** | `scripts/pipelines/flows/elasticsearch_ingestion.py` | Chunks and ingests AVAP documents into Elasticsearch with embeddings. |
+| **Dataset Generator** | `scripts/pipelines/flows/generate_mbap.py` | Generates synthetic MBPP-style AVAP problems using Claude. |
+| **MBPP Translator** | `scripts/pipelines/flows/translate_mbpp.py` | Translates MBPP Python dataset into AVAP equivalents. |
+
+---
+
+## 3. Request Lifecycle
+
+### 3.1 `AskAgent` (non-streaming)
+
+```
+Client → gRPC AgentRequest{query, session_id}
+ │
+ ├─ Load conversation history from session_store[session_id]
+ ├─ Build initial_state = {messages: history + [user_msg], ...}
+ │
+ └─ graph.invoke(initial_state)
+ ├─ classify → query_type ∈ {RETRIEVAL, CODE_GENERATION, CONVERSATIONAL}
+ ├─ reformulate → reformulated_query (keyword-optimized for semantic search)
+ ├─ retrieve → context (top-8 hybrid RRF chunks from Elasticsearch)
+ └─ generate → final AIMessage (llm.invoke)
+ │
+ ├─ Persist updated history to session_store[session_id]
+ └─ yield AgentResponse{text, avap_code="AVAP-2026", is_final=True}
+```
+
+### 3.2 `AskAgentStream` (token streaming)
+
+```
+Client → gRPC AgentRequest{query, session_id}
+ │
+ ├─ Load history from session_store[session_id]
+ ├─ Build initial_state
+ │
+ ├─ prepare_graph.invoke(initial_state) ← Phase 1: no LLM generation
+ │ ├─ classify
+ │ ├─ reformulate
+ │ └─ retrieve (or skip_retrieve if CONVERSATIONAL)
+ │
+ ├─ build_final_messages(prepared_state) ← Reconstruct prompt list
+ │
+ └─ for chunk in llm.stream(final_messages):
+ └─ yield AgentResponse{text=token, is_final=False}
+ │
+ ├─ Persist full assembled response to session_store
+ └─ yield AgentResponse{text="", is_final=True}
+```
+
+### 3.3 `EvaluateRAG`
+
+```
+Client → gRPC EvalRequest{category?, limit?, index?}
+ │
+ └─ evaluate.run_evaluation(...)
+ ├─ Load golden_dataset.json
+ ├─ Filter by category / limit
+ ├─ For each question:
+ │ ├─ retrieve_context (hybrid BM25+kNN, same as production)
+ │ └─ generate_answer (Ollama LLM + GENERATE_PROMPT)
+ ├─ Build RAGAS Dataset
+ ├─ Run RAGAS metrics with Claude as judge:
+ │ faithfulness / answer_relevancy / context_recall / context_precision
+ └─ Compute global_score + verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT)
+ │
+ └─ return EvalResponse{scores, global_score, verdict, details[]}
+```
+
+---
+
+## 4. LangGraph Workflow
+
+### 4.1 Full Graph (`build_graph`)
+
+```
+ ┌─────────────┐
+ │ classify │
+ └──────┬──────┘
+ │
+ ┌────────────────┼──────────────────┐
+ ▼ ▼ ▼
+ RETRIEVAL CODE_GENERATION CONVERSATIONAL
+ │ │ │
+ └────────┬───────┘ │
+ ▼ ▼
+ ┌──────────────┐ ┌────────────────────────┐
+ │ reformulate │ │ respond_conversational │
+ └──────┬───────┘ └───────────┬────────────┘
+ ▼ │
+ ┌──────────────┐ │
+ │ retrieve │ │
+ └──────┬───────┘ │
+ │ │
+ ┌────────┴───────────┐ │
+ ▼ ▼ │
+ ┌──────────┐ ┌───────────────┐ │
+ │ generate │ │ generate_code │ │
+ └────┬─────┘ └───────┬───────┘ │
+ │ │ │
+ └────────────────────┴────────────────┘
+ │
+ END
+```
+
+### 4.2 Prepare Graph (`build_prepare_graph`)
+
+Identical routing for classify, but generation nodes are replaced by `END`. The `CONVERSATIONAL` branch uses `skip_retrieve` (returns empty context without querying Elasticsearch).
+
+### 4.3 Query Type Routing
+
+| `query_type` | Triggers retrieve? | Generation prompt |
+|---|---|---|
+| `RETRIEVAL` | Yes | `GENERATE_PROMPT` (explanation-focused) |
+| `CODE_GENERATION` | Yes | `CODE_GENERATION_PROMPT` (code-focused, returns AVAP blocks) |
+| `CONVERSATIONAL` | No | `CONVERSATIONAL_PROMPT` (reformulation of prior answer) |
+
+---
+
+## 5. RAG Pipeline — Hybrid Search
+
+The retrieval system (`hybrid_search_native`) fuses BM25 lexical search and kNN dense vector search using **Reciprocal Rank Fusion (RRF)**.
+
+```
+User query
+ │
+ ├─ embeddings.embed_query(query) → query_vector [768-dim]
+ │
+ ├─ ES multi_match (BM25) on fields [content^2, text^2]
+ │ └─ top-k BM25 hits
+ │
+ └─ ES knn on field [embedding], num_candidates = k×5
+ └─ top-k kNN hits
+ │
+ ├─ RRF fusion: score(doc) = Σ 1/(rank + 60)
+ │
+ └─ Top-8 documents → format_context() → context string
+```
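+
+As a sketch, the two request bodies implied by this flow (with the parameters documented in ADR-0003) could be built like this; the shapes are illustrative, not the exact production code:
+
+```python
+def build_hybrid_queries(query_text: str, query_vector: list, k: int = 8):
+    """Construct the BM25 and kNN Elasticsearch request bodies for hybrid retrieval."""
+    bm25_body = {
+        "size": k,
+        "query": {
+            "multi_match": {
+                "query": query_text,
+                "fields": ["content^2", "text^2"],  # boost content/text over metadata
+                "fuzziness": "AUTO",                # tolerate minor typos
+            }
+        },
+    }
+    knn_body = {
+        "knn": {
+            "field": "embedding",
+            "query_vector": query_vector,
+            "k": k,
+            "num_candidates": k * 5,  # standard ES oversampling ratio
+        }
+    }
+    return bm25_body, knn_body
+```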
+
+**RRF constant:** `60` (standard value; dampens the dominance of top-ranked documents while still rewarding consensus between both retrieval modes).
+
+**Chunk metadata** attached to each retrieved document:
+
+| Field | Description |
+|---|---|
+| `chunk_id` | Unique identifier within the index |
+| `source_file` | Origin document filename |
+| `doc_type` | `prose`, `code`, `code_example`, `bnf` |
+| `block_type` | AVAP block type: `function`, `if`, `startLoop`, `try` |
+| `section` | Document section/chapter heading |
+
+Documents of type `code`, `code_example`, `bnf`, or block type `function / if / startLoop / try` are tagged as `[AVAP CODE]` in the formatted context, signaling the LLM to treat them as executable syntax rather than prose.
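+
+A hypothetical sketch of that tagging logic (the `[DOC]` tag and the exact layout are assumptions for illustration, not the production format):
+
+```python
+CODE_DOC_TYPES = {"code", "code_example", "bnf"}
+CODE_BLOCK_TYPES = {"function", "if", "startLoop", "try"}
+
+def format_context(docs: list) -> str:
+    """Tag code-like chunks so the LLM treats them as executable syntax."""
+    parts = []
+    for doc in docs:
+        is_code = (doc.get("doc_type") in CODE_DOC_TYPES
+                   or doc.get("block_type") in CODE_BLOCK_TYPES)
+        tag = "[AVAP CODE]" if is_code else "[DOC]"
+        parts.append(f"{tag} ({doc.get('source_file', '?')}):\n{doc['content']}")
+    return "\n\n".join(parts)
+```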
+
+---
+
+## 6. Streaming Architecture (AskAgentStream)
+
+The two-phase streaming design is critical to understand:
+
+**Why not stream through LangGraph?**
+LangGraph's `stream()` method yields full state snapshots per node, not individual tokens. To achieve true per-token streaming to the gRPC client, the generation step is deliberately extracted from the graph and called directly via `llm.stream()`.
+
+**Phase 1 — Deterministic preparation (graph-managed):**
+- Classification, query reformulation, and retrieval run through `prepare_graph.invoke()`.
+- This phase runs synchronously and produces the complete context before any token is emitted to the client.
+
+**Phase 2 — Token streaming (manual):**
+- `build_final_messages()` reconstructs the exact prompt that `generate` / `generate_code` / `respond_conversational` would have used.
+- `llm.stream(final_messages)` yields one `AIMessageChunk` per token from Ollama.
+- Each token is immediately forwarded to the gRPC client as `AgentResponse{text=token, is_final=False}`.
+- After the stream ends, the full assembled text is persisted to `session_store`.
+
+**Backpressure:** gRPC streaming is flow-controlled by the client. If the client stops reading, the Ollama token stream will block at the `yield` point. No explicit buffer overflow protection is implemented (acceptable for the current single-client dev mode).
+
+---
+
+## 7. Evaluation Pipeline (EvaluateRAG)
+
+The evaluation suite implements an **offline RAG evaluation** pattern using RAGAS metrics.
+
+### Judge model separation
+
+The production LLM (Ollama `qwen2.5:1.5b`) is used for **answer generation** — the same pipeline as production to measure real-world quality. Claude (`claude-sonnet-4-20250514`) is used as the **evaluation judge** — an independent, high-capability model that scores the generated answers against ground truth.
+
+### RAGAS metrics
+
+| Metric | Measures | Input |
+|---|---|---|
+| `faithfulness` | Are claims in the answer supported by the retrieved context? | answer + contexts |
+| `answer_relevancy` | Is the answer relevant to the question? | answer + question |
+| `context_recall` | Does the retrieved context cover the ground truth? | contexts + ground_truth |
+| `context_precision` | Are the retrieved chunks useful (signal-to-noise)? | contexts + ground_truth |
+
+### Global score & verdict
+
+```
+global_score = mean(non-zero metric scores)
+
+verdict:
+ ≥ 0.80 → EXCELLENT
+ ≥ 0.60 → ACCEPTABLE
+ < 0.60 → INSUFFICIENT
+```
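+
+Equivalently, as a direct Python transcription of the rule above:
+
+```python
+def score_and_verdict(metrics: dict) -> tuple:
+    """global_score = mean of non-zero metric scores; verdict from fixed thresholds."""
+    nonzero = [v for v in metrics.values() if v > 0]
+    global_score = sum(nonzero) / len(nonzero) if nonzero else 0.0
+    if global_score >= 0.80:
+        verdict = "EXCELLENT"
+    elif global_score >= 0.60:
+        verdict = "ACCEPTABLE"
+    else:
+        verdict = "INSUFFICIENT"
+    return global_score, verdict
+```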
+
+### Golden dataset
+
+Located at `Docker/src/golden_dataset.json`. Each entry follows this schema:
+
+```json
+{
+ "id": "avap-001",
+ "category": "core_syntax",
+ "question": "How do you declare a variable in AVAP?",
+ "ground_truth": "Use addVar to declare a variable..."
+}
+```
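+
+Filtering this dataset under `EvalRequest` semantics (empty `category` = all categories, `limit` of 0 = all questions) reduces to the following sketch:
+
+```python
+def filter_golden_dataset(entries: list, category: str = "", limit: int = 0) -> list:
+    """Apply EvalRequest filter semantics to the loaded golden dataset."""
+    if category:
+        entries = [e for e in entries if e["category"] == category]
+    if limit > 0:
+        entries = entries[:limit]
+    return entries
+```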
+
+---
+
+## 8. Data Ingestion Pipeline
+
+Documents flow into the Elasticsearch index through two paths:
+
+### Path A — AVAP documentation (structured markdown)
+
+```
+docs/LRM/avap.md
+docs/avap_language_github_docs/*.md
+docs/developer.avapframework.com/*.md
+ │
+ ▼
+scripts/pipelines/flows/elasticsearch_ingestion.py
+ │
+ ├─ Load markdown files
+ ├─ Chunk using scripts/pipelines/tasks/chunk.py
+ │ (semantic chunking via Chonkie library)
+ ├─ Generate embeddings via scripts/pipelines/tasks/embeddings.py
+ │ (Ollama or HuggingFace embedding model)
+ └─ Bulk index into Elasticsearch
+ index: avap-docs-* (configurable via ELASTICSEARCH_INDEX)
+ mapping: {content, embedding, source_file, doc_type, section, ...}
+```
+
+### Path B — Synthetic AVAP code samples
+
+```
+docs/samples/*.avap
+ │
+ ▼
+scripts/pipelines/flows/generate_mbap.py
+ │
+ ├─ Read AVAP LRM (docs/LRM/avap.md)
+ ├─ Call Claude API to generate MBPP-style problems
+ └─ Output synthetic_datasets/mbpp_avap.json
+ (used for fine-tuning and few-shot examples)
+```
+
+---
+
+## 9. Infrastructure Layout
+
+### Devaron Cluster (Vultr Kubernetes)
+
+| Service | K8s Name | Port | Purpose |
+|---|---|---|---|
+| LLM inference | `ollama-light-service` | `11434` | Text generation + embeddings |
+| Vector database | `brunix-vector-db` | `9200` | Elasticsearch 8.x |
+| Observability DB | `brunix-postgres` | `5432` | PostgreSQL for Langfuse |
+| Langfuse UI | — | `80` | `http://45.77.119.180` |
+
+### Kubernetes tunnel commands
+
+```bash
+# Terminal 1 — LLM
+kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 \
+ -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
+
+# Terminal 2 — Elasticsearch
+kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 \
+ -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
+
+# Terminal 3 — PostgreSQL (Langfuse)
+kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 \
+ -n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
+```
+
+### Port map summary
+
+| Port | Protocol | Service | Scope |
+|---|---|---|---|
+| `50051` | gRPC | Brunix Engine (inside container) | Internal |
+| `50052` | gRPC | Brunix Engine (host-mapped) | External |
+| `8000` | HTTP | OpenAI proxy | External |
+| `11434` | HTTP | Ollama (via tunnel) | Tunnel |
+| `9200` | HTTP | Elasticsearch (via tunnel) | Tunnel |
+| `5432` | TCP | PostgreSQL/Langfuse (via tunnel) | Tunnel |
+
+---
+
+## 10. Session State & Conversation Memory
+
+Conversation history is managed via an in-process dictionary:
+
+```python
+session_store: dict[str, list] = defaultdict(list)
+# key: session_id (string, provided by client)
+# value: list of LangChain BaseMessage objects
+```
+
+**Characteristics:**
+- **In-memory only.** History is lost on container restart.
+- **No TTL or eviction.** Sessions grow unbounded for the lifetime of the process.
+- **Thread safety:** Python's GIL provides basic safety for the `ThreadPoolExecutor(max_workers=10)` gRPC server, but concurrent writes to the same `session_id` from two simultaneous requests are not explicitly protected.
+- **History window:** `format_history_for_classify()` uses only the last 6 messages for query classification to keep the classify prompt short and deterministic.
+
+> **Future work:** Replace `session_store` with a Redis-backed persistent store to survive restarts and support horizontal scaling.
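+
+Until then, one way to close the concurrent-write gap is a process-wide lock around the store, assuming the `session_store` shape described above (a sketch, not the server's current code):
+
+```python
+import threading
+from collections import defaultdict
+
+session_store: dict[str, list] = defaultdict(list)
+_store_lock = threading.Lock()
+
+def append_message(session_id: str, message) -> None:
+    # A single lock serializes all writers; adequate for a
+    # ThreadPoolExecutor(max_workers=10) gRPC server.
+    with _store_lock:
+        session_store[session_id].append(message)
+```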
+
+---
+
+## 11. Observability Stack
+
+### Langfuse tracing
+
+The server integrates Langfuse for end-to-end LLM tracing. Every `AskAgent` / `AskAgentStream` request creates a trace that captures:
+- Input query and session ID
+- Each LangGraph node execution (classify, reformulate, retrieve, generate)
+- LLM token counts, latency, and cost
+- Final response
+
+**Access:** `http://45.77.119.180` — requires a project API key configured via `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
+
+### Logging
+
+Structured logging via Python's `logging` module, configured at `INFO` level. Log format:
+
+```
+[MODULE] context_info — key=value key=value
+```
+
+Key log markers:
+
+| Marker | Module | Meaning |
+|---|---|---|
+| `[ESEARCH]` | `server.py` | Elasticsearch connection status |
+| `[classify]` | `graph.py` | Query type decision + raw LLM output |
+| `[reformulate]` | `graph.py` | Reformulated query string |
+| `[hybrid]` | `graph.py` | BM25 / kNN hit counts and RRF result count |
+| `[retrieve]` | `graph.py` | Number of docs retrieved and context length |
+| `[generate]` | `graph.py` | Response character count |
+| `[AskAgentStream]` | `server.py` | Token count and total chars per stream |
+| `[eval]` | `evaluate.py` | Per-question retrieval and generation status |
+
+---
+
+## 12. Security Boundaries
+
+| Boundary | Current state | Risk |
+|---|---|---|
+| gRPC transport | **Insecure** (`add_insecure_port`) | Network interception possible. Acceptable in dev/tunnel setup; requires mTLS for production. |
+| Elasticsearch auth | Optional (user/pass or API key via env vars) | Index is accessible without auth if `ELASTICSEARCH_USER` and `ELASTICSEARCH_API_KEY` are unset. |
+| Container user | Non-root (`python:3.11-slim` default) | Low risk. Do not override with `root`. |
+| Secrets in env | Via `.env` / `docker-compose` env injection | Never commit real values. See [CONTRIBUTING.md](../CONTRIBUTING.md#6-environment-variables-policy). |
+| Session store | In-memory, no auth | Any caller with access to the gRPC port can read/write any session by guessing its ID. |
+| Kubeconfig | `./kubernetes/kubeconfig.yaml` (local only) | Grants cluster access. Never commit. Listed in `.gitignore`. |
+
+---
+
+## 13. Known Limitations & Future Work
+
+| Area | Limitation | Proposed solution |
+|---|---|---|
+| Session persistence | In-memory, lost on restart | Redis-backed `session_store` |
+| Horizontal scaling | `session_store` is per-process | Sticky sessions or external session store |
+| gRPC security | Insecure port | Add TLS + optional mTLS |
+| Elasticsearch auth | Not enforced if vars unset | Make auth required; fail-fast on startup |
+| Context window | Full history passed to generate; no truncation | Sliding window or summarization for long sessions |
+| Evaluation | Golden dataset must be manually maintained | Automated golden dataset refresh pipeline |
+| Rate limiting | None on gRPC server | Add interceptor-based rate limiter |
+| Health check | No gRPC health protocol | Implement `grpc.health.v1` |
diff --git a/docs/AVAP_CHUNKER_CONFIG.md b/docs/AVAP_CHUNKER_CONFIG.md
new file mode 100644
index 0000000..011e0ec
--- /dev/null
+++ b/docs/AVAP_CHUNKER_CONFIG.md
@@ -0,0 +1,372 @@
+# AVAP Chunker — Language Configuration Reference
+
+> **File:** `scripts/pipelines/ingestion/avap_config.json`
+> **Used by:** `avap_chunker.py` (Pipeline B)
+> **Last updated:** 2026-03-18
+
+This file is the **grammar definition** for the AVAP language chunker. It tells `avap_chunker.py` how to tokenize, parse, and semantically classify `.avap` source files before they are embedded and ingested into Elasticsearch. Modifying this file changes what the chunker recognises as a block, a statement, or a semantic feature — and therefore what metadata every chunk in the knowledge base carries.
+
+---
+
+## Table of Contents
+
+1. [Top-Level Fields](#1-top-level-fields)
+2. [Lexer](#2-lexer)
+3. [Blocks](#3-blocks)
+4. [Statements](#4-statements)
+5. [Semantic Tags](#5-semantic-tags)
+6. [How They Work Together](#6-how-they-work-together)
+7. [Adding New Constructs](#7-adding-new-constructs)
+8. [Full Annotated Example](#8-full-annotated-example)
+
+---
+
+## 1. Top-Level Fields
+
+```json
+{
+ "language": "avap",
+ "version": "1.0",
+ "file_extensions": [".avap"]
+}
+```
+
+| Field | Type | Description |
+|---|---|---|
+| `language` | string | Human-readable language name. Used in chunker progress reports. |
+| `version` | string | Config schema version. Increment when making breaking changes. |
+| `file_extensions` | array of strings | File extensions the chunker will process. `.md` files are always processed regardless of this setting. |
+
+---
+
+## 2. Lexer
+
+The lexer section controls how raw source lines are stripped of comments and string literals before pattern matching is applied.
+
+```json
+"lexer": {
+ "string_delimiters": ["\"", "'"],
+ "escape_char": "\\",
+ "comment_line": ["///", "//"],
+ "comment_block": { "open": "/*", "close": "*/" },
+ "line_oriented": true
+}
+```
+
+| Field | Type | Description |
+|---|---|---|
+| `string_delimiters` | array of strings | Characters that open and close string literals. Content inside strings is ignored during pattern matching. |
+| `escape_char` | string | Character used to escape the next character inside a string. Prevents `\"` from closing the string. |
+| `comment_line` | array of strings | Line comment prefixes, evaluated longest-first. Everything after the matched prefix is stripped. AVAP supports both `///` (documentation comments) and `//` (inline comments). |
+| `comment_block.open` | string | Block comment opening delimiter. |
+| `comment_block.close` | string | Block comment closing delimiter. Content between `/*` and `*/` is stripped before pattern matching. |
+| `line_oriented` | bool | When `true`, the lexer processes one line at a time. Should always be `true` for AVAP. |
+
+**Important:** Comment stripping and string boundary detection happen before any block or statement pattern is evaluated. A keyword inside a string literal or a comment will never trigger a block or statement match.
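+
+A minimal sketch of the single-line case, assuming the lexer settings above (not the chunker's actual implementation):
+
+```python
+def clean_line(line: str,
+               delimiters=("\"", "'"),
+               escape="\\",
+               comment_prefixes=("///", "//")) -> str:
+    """Strip a line comment, but never inside a string literal."""
+    in_string = None
+    i = 0
+    while i < len(line):
+        ch = line[i]
+        if in_string:
+            if ch == escape:
+                i += 2          # skip the escaped character
+                continue
+            if ch == in_string:
+                in_string = None
+        elif ch in delimiters:
+            in_string = ch
+        elif any(line.startswith(p, i) for p in comment_prefixes):
+            return line[:i].rstrip()   # comment starts outside a string
+        i += 1
+    return line
+```
+
+A `//` inside `"a // b"` survives, while the same `//` outside a string removes the rest of the line.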
+
+---
+
+## 3. Blocks
+
+Blocks are **multi-line constructs** with a defined opener and closer. The chunker tracks nesting depth — each opener increments depth, each closer decrements it, and the block ends when depth returns to zero. This correctly handles nested `if()` inside `function{}` and similar cases.
+
+Each block definition produces a chunk with `doc_type` as specified and `block_type` equal to the block `name`.
+
+```json
+"blocks": [
+ {
+ "name": "function",
+ "doc_type": "code",
+ "opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
+ "closer_pattern": "^\\s*\\}\\s*$",
+ "extract_signature": true,
+ "signature_template": "function {group1}({group2})"
+ },
+ ...
+]
+```
+
+### Block fields
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `name` | string | Yes | Identifier for this block type. Used as `block_type` in the chunk metadata and in the `semantic_overlap` context header. |
+| `doc_type` | string | Yes | Elasticsearch `doc_type` field value for chunks from this block. |
+| `opener_pattern` | regex string | Yes | Pattern matched against the clean (comment-stripped) line to detect the start of this block. Must be anchored at the start (`^`). |
+| `closer_pattern` | regex string | Yes | Pattern matched to detect the end of this block. Checked at every line after the opener. |
+| `extract_signature` | bool | No (default: `false`) | When `true`, the chunker extracts a compact signature string from the opener line using capture groups, and creates an additional `function_signature` chunk alongside the full block chunk. |
+| `signature_template` | string | No | Template for the signature string. Uses `{group1}`, `{group2}`, etc. as placeholders for the regex capture groups from `opener_pattern`. |
+
+### Current block definitions
+
+#### `function`
+
+```
+opener: ^\\s*function\\s+(\\w+)\\s*\\(([^)]*)
+closer: ^\\s*\\}\\s*$
+```
+
+Matches any top-level or nested AVAP function declaration. The two capture groups extract the function name (`group1`) and parameter list (`group2`), which are combined into the signature template `function {group1}({group2})`.
+
+Because `extract_signature: true`, every function produces **two chunks**:
+1. A `doc_type: "code"`, `block_type: "function"` chunk containing the full function body.
+2. A `doc_type: "function_signature"`, `block_type: "function_signature"` chunk containing only the signature string (e.g. `function validateAccess(userId, token)`). This lightweight chunk is indexed separately to enable fast function-name lookup without retrieving the entire body.
+
+Additionally, the function signature is registered in the `SemanticOverlapBuffer`. Subsequent non-function chunks in the same file will receive the current function signature prepended as a context comment (`// contexto: function validateAccess(userId, token)`), keeping the surrounding code semantically grounded.
+
+#### `if`
+
+```
+opener: ^\\s*if\\s*\\(
+closer: ^\\s*end\\s*\\(\\s*\\)
+```
+
+Matches AVAP conditional blocks. Note: AVAP uses `end()` as the closer, not `}`.
+
+#### `startLoop`
+
+```
+opener: ^\\s*startLoop\\s*\\(
+closer: ^\\s*endLoop\\s*\\(\\s*\\)
+```
+
+Matches AVAP iteration blocks. The closer is `endLoop()`.
+
+#### `try`
+
+```
+opener: ^\\s*try\\s*\\(\\s*\\)
+closer: ^\\s*end\\s*\\(\\s*\\)
+```
+
+Matches AVAP error-handling blocks (`try()` … `end()`).
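+
+The depth-tracking rule described at the top of this section can be sketched as follows (simplified to the `function` block only; the real chunker tracks each block type's own opener/closer pair):
+
+```python
+import re
+
+OPENER = re.compile(r"^\s*function\s+(\w+)\s*\(([^)]*)")
+CLOSER = re.compile(r"^\s*\}\s*$")
+
+def extract_function_blocks(lines: list[str]) -> list[list[str]]:
+    blocks, current, depth = [], [], 0
+    for line in lines:
+        if OPENER.match(line):
+            depth += 1              # opener increments depth
+        if depth > 0:
+            current.append(line)
+        if depth > 0 and CLOSER.match(line):
+            depth -= 1              # closer decrements depth
+            if depth == 0:          # block ends when depth returns to zero
+                blocks.append(current)
+                current = []
+    return blocks
+```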
+
+---
+
+## 4. Statements
+
+Statements are **single-line constructs**. Lines that are not part of any block opener or closer are classified against the statement patterns in order. The first match wins. If no pattern matches, the statement is classified as `"statement"` (the fallback).
+
+Consecutive lines with the same statement type are **grouped into a single chunk**, keeping semantically related statements together. When the statement type changes, the current group is flushed as a chunk.
+
+```json
+"statements": [
+ { "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
+ { "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
+ ...
+]
+```
+
+### Statement fields
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Used as `block_type` in the chunk metadata. |
+| `pattern` | regex string | Matched against the clean line. First match wins — order matters. |
+
+### Current statement definitions
+
+| Name | Matches | AVAP commands |
+|---|---|---|
+| `registerEndpoint` | API route registration | `registerEndpoint(...)` |
+| `addVar` | Variable declaration | `addVar(...)` |
+| `io_command` | Input/output operations | `addParam`, `getListLen`, `addResult`, `getQueryParamList` |
+| `http_command` | HTTP client calls | `RequestPost`, `RequestGet` |
+| `orm_command` | Database ORM operations | `ormDirect`, `ormCheckTable`, `ormCreateTable`, `ormAccessSelect`, `ormAccessInsert`, `ormAccessUpdate` |
+| `util_command` | Utility and helper functions | `variableToList`, `itemFromList`, `variableFromJSON`, `AddVariableToJSON`, `encodeSHA256`, `encodeMD5`, `getRegex`, `getDateTime`, `stampToDatetime`, `getTimeStamp`, `randomString`, `replace` |
+| `async_command` | Concurrency primitives | `x = go funcName(`, `gather(` |
+| `connector` | External service connector | `x = avapConnector(` |
+| `modularity` | Module imports | `import`, `include` |
+| `assignment` | Variable assignment (catch-all before fallback) | `x = ...` |
+
+**Ordering note:** `registerEndpoint`, `addVar`, and the specific command categories are listed before `assignment` intentionally. `assignment` would match many of them (they all contain `=` or are function calls that could follow an assignment), so the more specific patterns must come first.
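+
+The first-match-wins classification and the grouping rule can be sketched together (a subset of the patterns above; not the chunker's actual code):
+
+```python
+import re
+
+STATEMENTS = [
+    ("registerEndpoint", re.compile(r"^\s*registerEndpoint\s*\(")),
+    ("addVar",           re.compile(r"^\s*addVar\s*\(")),
+    ("assignment",       re.compile(r"^\s*\w+\s*=\s*")),  # catch-all last
+]
+
+def classify(line: str) -> str:
+    for name, pattern in STATEMENTS:
+        if pattern.match(line):
+            return name             # first match wins
+    return "statement"              # fallback
+
+def group_statements(lines: list[str]) -> list[tuple[str, list[str]]]:
+    groups: list[tuple[str, list[str]]] = []
+    for line in lines:
+        kind = classify(line)
+        if groups and groups[-1][0] == kind:
+            groups[-1][1].append(line)      # same type: extend the run
+        else:
+            groups.append((kind, [line]))   # type changed: start a new chunk
+    return groups
+```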
+
+---
+
+## 5. Semantic Tags
+
+Semantic tags are **boolean metadata flags** applied to every chunk (both blocks and statements) by scanning the entire chunk content with a regex. A chunk can have multiple tags simultaneously.
+
+The `complexity` field is automatically computed as the count of `true` tags in a chunk's metadata, providing a rough signal of how much AVAP functionality a given chunk exercises.
+
+```json
+"semantic_tags": [
+ { "tag": "uses_orm", "pattern": "\\b(ormDirect|ormAccessSelect|...)\\s*\\(" },
+ ...
+]
+```
+
+### Tag fields
+
+| Field | Description |
+|---|---|
+| `tag` | Key name in the `metadata` object stored in Elasticsearch. Value is always `true` when present. |
+| `pattern` | Regex searched (not matched) across the full chunk text. Uses `\b` word boundaries to avoid false positives. |
+
+### Current semantic tags
+
+| Tag | Detected when chunk contains |
+|---|---|
+| `uses_orm` | Any ORM command: `ormDirect`, `ormCheckTable`, `ormCreateTable`, `ormAccessSelect`, `ormAccessInsert`, `ormAccessUpdate` |
+| `uses_http` | HTTP client calls: `RequestPost`, `RequestGet` |
+| `uses_connector` | External connector: `avapConnector(` |
+| `uses_async` | Concurrency: `go funcName(` or `gather(` |
+| `uses_crypto` | Hashing/encoding: `encodeSHA256(`, `encodeMD5(` |
+| `uses_auth` | Auth-related commands: `addParam`, `_status` |
+| `uses_error_handling` | Error handling block: `try()` |
+| `uses_loop` | Loop construct: `startLoop(` |
+| `uses_json` | JSON operations: `variableFromJSON(`, `AddVariableToJSON(` |
+| `uses_list` | List operations: `variableToList(`, `itemFromList(`, `getListLen(` |
+| `uses_regex` | Regular expressions: `getRegex(` |
+| `uses_datetime` | Date/time operations: `getDateTime(`, `getTimeStamp(`, `stampToDatetime(` |
+| `returns_result` | Returns data to the API caller: `addResult(` |
+| `registers_endpoint` | Defines an API route: `registerEndpoint(` |
+
+**How tags are used at retrieval time:** The Elasticsearch mapping stores each tag as a `boolean` field under the `metadata` object. This enables filtered retrieval — for example, a future retrieval enhancement could boost chunks with `metadata.uses_orm: true` for queries that contain ORM-related keywords, improving precision for database-related questions.
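+
+As an illustration of that idea, here is a hedged sketch of a standard Elasticsearch `bool` query that boosts ORM-tagged chunks (field names follow the mapping described in this document; the query text and boost value are arbitrary examples):
+
+```python
+query = {
+    "query": {
+        "bool": {
+            "must": [{"match": {"content": "insert a row with the ORM"}}],
+            "should": [
+                # a matching boolean tag raises the relevance score
+                {"term": {"metadata.uses_orm": {"value": True, "boost": 2.0}}}
+            ],
+        }
+    }
+}
+```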
+
+---
+
+## 6. How They Work Together
+
+The following example shows how `avap_chunker.py` processes a real `.avap` file using this config:
+
+```avap
+// Validate user session
+function validateAccess(userId, token) {
+ addVar(isValid = false)
+ addParam(userId)
+ try()
+ ormAccessSelect(users, id = userId)
+ addVar(isValid = true)
+ end()
+ addResult(isValid)
+}
+
+registerEndpoint(POST, /validate)
+```
+
+**Chunks produced:**
+
+| # | `doc_type` | `block_type` | Content | Tags |
+|---|---|---|---|---|
+| 1 | `code` | `function` | Full function body (lines 2–10) | `uses_auth`, `uses_orm`, `uses_error_handling`, `returns_result` · `complexity: 4` |
+| 2 | `function_signature` | `function_signature` | `function validateAccess(userId, token)` | — |
+| 3 | `code` | `registerEndpoint` | `registerEndpoint(POST, /validate)` | `registers_endpoint` · `complexity: 1` |
+
+Chunk 3 also receives the function signature as a semantic overlap header: the `SemanticOverlapBuffer` registers `validateAccess` and injects it as context into any subsequent non-function chunks in the same file.
+
+---
+
+## 7. Adding New Constructs
+
+### Adding a new block type
+
+1. Identify the opener and closer patterns from the AVAP LRM (`docs/LRM/avap.md`).
+2. Add an entry to `"blocks"` in `avap_config.json`.
+3. If the block introduces a named construct worth indexing independently (like functions), set `"extract_signature": true` and define a `"signature_template"`.
+4. Run a smoke test on a representative `.avap` file:
+ ```bash
+ python scripts/pipelines/ingestion/avap_chunker.py \
+ --lang-config scripts/pipelines/ingestion/avap_config.json \
+ --docs-path docs/samples \
+ --output /tmp/test_chunks.jsonl \
+ --no-dedup
+ ```
+5. Inspect `/tmp/test_chunks.jsonl` and verify the new `block_type` appears with the expected content.
+6. Re-run the ingestion pipeline to rebuild the index.
+
+### Adding a new statement category
+
+1. Add an entry to `"statements"` **before** the `assignment` catch-all.
+2. Use `^\\s*` to anchor the pattern at the start of the line.
+3. Test as above — verify the new `block_type` appears in the JSONL output.
+
+### Adding a new semantic tag
+
+1. Add an entry to `"semantic_tags"`.
+2. Use `\\b` word boundaries to prevent false positives on substrings.
+3. Add the new tag as a `boolean` field to the Elasticsearch index mapping in `avap_ingestor.py` (`build_index_mapping()`).
+4. **Re-index from scratch** — existing documents will not have the new tag unless the index is rebuilt (`--delete` flag).
+
+---
+
+## 8. Full Annotated Example
+
+```jsonc
+{
+ // Identifies this config as the AVAP v1.0 grammar
+ "language": "avap",
+ "version": "1.0",
+ "file_extensions": [".avap"], // Only .avap files; .md is always included
+
+ "lexer": {
+ "string_delimiters": ["\"", "'"], // Both quote styles used in AVAP
+ "escape_char": "\\",
+ "comment_line": ["///", "//"], // /// first — longest match wins
+ "comment_block": { "open": "/*", "close": "*/" },
+ "line_oriented": true
+ },
+
+ "blocks": [
+ {
+ "name": "function",
+ "doc_type": "code",
+ // Captures: group1=name, group2=params
+ "opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
+ "closer_pattern": "^\\s*\\}\\s*$", // AVAP functions close with }
+ "extract_signature": true,
+ "signature_template": "function {group1}({group2})"
+ },
+ {
+ "name": "if",
+ "doc_type": "code",
+ "opener_pattern": "^\\s*if\\s*\\(",
+ "closer_pattern": "^\\s*end\\s*\\(\\s*\\)" // AVAP if closes with end()
+ },
+ {
+ "name": "startLoop",
+ "doc_type": "code",
+ "opener_pattern": "^\\s*startLoop\\s*\\(",
+ "closer_pattern": "^\\s*endLoop\\s*\\(\\s*\\)"
+ },
+ {
+ "name": "try",
+ "doc_type": "code",
+ "opener_pattern": "^\\s*try\\s*\\(\\s*\\)",
+ "closer_pattern": "^\\s*end\\s*\\(\\s*\\)" // try also closes with end()
+ }
+ ],
+
+ "statements": [
+ // Specific patterns first — must come before the generic "assignment" catch-all
+ { "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
+ { "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
+ { "name": "io_command", "pattern": "^\\s*(addParam|getListLen|addResult|getQueryParamList)\\s*\\(" },
+ { "name": "http_command", "pattern": "^\\s*(RequestPost|RequestGet)\\s*\\(" },
+ { "name": "orm_command", "pattern": "^\\s*(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
+ { "name": "util_command", "pattern": "^\\s*(variableToList|itemFromList|variableFromJSON|AddVariableToJSON|encodeSHA256|encodeMD5|getRegex|getDateTime|stampToDatetime|getTimeStamp|randomString|replace)\\s*\\(" },
+ { "name": "async_command", "pattern": "^\\s*(\\w+\\s*=\\s*go\\s+|gather\\s*\\()" },
+ { "name": "connector", "pattern": "^\\s*\\w+\\s*=\\s*avapConnector\\s*\\(" },
+ { "name": "modularity", "pattern": "^\\s*(import|include)\\s+" },
+ { "name": "assignment", "pattern": "^\\s*\\w+\\s*=\\s*" } // catch-all
+ ],
+
+ "semantic_tags": [
+ // Applied to every chunk by full-content regex search (not line-by-line)
+ { "tag": "uses_orm", "pattern": "\\b(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
+ { "tag": "uses_http", "pattern": "\\b(RequestPost|RequestGet)\\s*\\(" },
+ { "tag": "uses_connector", "pattern": "\\bavapConnector\\s*\\(" },
+ { "tag": "uses_async", "pattern": "\\bgo\\s+\\w+\\s*\\(|\\bgather\\s*\\(" },
+ { "tag": "uses_crypto", "pattern": "\\b(encodeSHA256|encodeMD5)\\s*\\(" },
+ { "tag": "uses_auth", "pattern": "\\b(addParam|_status)\\b" },
+ { "tag": "uses_error_handling", "pattern": "\\btry\\s*\\(\\s*\\)" },
+ { "tag": "uses_loop", "pattern": "\\bstartLoop\\s*\\(" },
+ { "tag": "uses_json", "pattern": "\\b(variableFromJSON|AddVariableToJSON)\\s*\\(" },
+ { "tag": "uses_list", "pattern": "\\b(variableToList|itemFromList|getListLen)\\s*\\(" },
+ { "tag": "uses_regex", "pattern": "\\bgetRegex\\s*\\(" },
+ { "tag": "uses_datetime", "pattern": "\\b(getDateTime|getTimeStamp|stampToDatetime)\\s*\\(" },
+ { "tag": "returns_result", "pattern": "\\baddResult\\s*\\(" },
+ { "tag": "registers_endpoint", "pattern": "\\bregisterEndpoint\\s*\\(" }
+ ]
+}
+```
diff --git a/docs/LRM/avap.md b/docs/LRM/avap.md
new file mode 100644
index 0000000..37950b5
--- /dev/null
+++ b/docs/LRM/avap.md
@@ -0,0 +1,1138 @@
+### Architectural Preface
+
+**AVAP (Advanced Virtual API Programming) is a Turing-complete DSL (Domain-Specific Language), architecturally designed for the secure, concurrent, and deterministic orchestration of microservices and I/O.** It is not a general-purpose language; its hybrid engine and strict grammar are optimized for fast HTTP transaction processing, in-memory data manipulation, and persistence, minimizing unwanted side effects.
+
+---
+
+# Consolidated Technical Specification of the AVAP Language (LRM)
+
+This document unifies the memory architecture, control structures, modularity, asynchronous concurrency, and formal grammar (BNF) of the AVAP language. It acts as the Single Source of Truth for the implementation of the parser, the execution engine, and the RAG system's indexing.
+
+---
+
+## SECTION I: Architecture, Memory, and Structural Foundations
+
+This section lays out how AVAP manages service logic and in-memory data manipulation. Unlike conventional interpreted languages, AVAP uses a hybrid evaluation engine that allows declarative commands to be combined with dynamic expressions.
+
+### 1.1 File Structure and Statement Termination
+AVAP is a **strictly line-oriented** language. This design decision guarantees an extremely fast, deterministic parser, avoiding the ambiguity suffered by languages that allow multi-line declarations.
+* Each logical instruction (`statement`) must be completed on a single physical line of text.
+* The engine recognizes the line break or carriage return (`\n` or `\r\n`) as the absolute statement terminator.
+* Splitting an instruction across lines is not allowed, forcing the programmer to write sequential, clean, easy-to-debug code.
+
+### 1.2 Endpoint Registration (registerEndpoint)
+The `registerEndpoint` command is the atomic unit of configuration in AVAP. It acts as the critical bridge between the external network (HTTP) and the internal code.
+* **Mechanics:** Defines the URL route, the allowed HTTP method (e.g. `GET`, `POST`), and the main entry function (handler).
+* **Security:** The AVAP server automatically rejects (with a 405 error) any request whose method does not match the one specified.
+* **Middlewares:** Allows injecting a list of preliminary functions to validate tokens before the main block executes.
+
+### 1.3 Dynamic Assignment and References (addVar)
+AVAP allows direct assignment syntax via the `=` symbol, providing flexibility under strict context control.
+* **Real-time evaluation:** When the interpreter reads `variable = expression`, it resolves any mathematical or logical operation using the underlying evaluation engine.
+* **The dereference operator (`$`):** When the native command `addVar(copia, $original)` is used, the `$` prefix tells the engine to look up the variable named "original" in the symbol table and extract its value.
+* **addVar semantics:** The command accepts `addVar(value, variable)` or `addVar(variable, value)`. If both arguments are identifiers, the value of the second is assigned to the first. Passing two literals as arguments is not allowed.
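+
+A minimal illustration of both assignment forms (variable names are hypothetical):
+
+```avap
+// Direct assignment: the expression is resolved by the evaluation engine
+total = 5 + 3
+
+// Native command form: $total dereferences the symbol table entry,
+// so "copy" receives the current value of total
+addVar(copy, $total)
+```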
+
+### BNF Specification (Section I)
+
+```bnf
+<program> ::= ( <line> | <empty_line> )*
+<line> ::= [ <indent> ] ( <statement> | <comment> ) <EOL>
+<EOL> ::= /* Carriage return / line feed (\n or \r\n) */
+
+<statement> ::= <assignment>
+  | <function_call>
+  | <method_call>
+  | <register_endpoint>
+  | <add_var>
+  | <io_statement>
+  | <if_block>
+  | <loop_block>
+  | <try_block>
+  | <async_statement>
+  | <connector_statement>
+  | <http_statement>
+  | <orm_statement>
+  | <module_statement>
+
+<assignment> ::= <identifier> "=" <expression>
+
+/* Global function call (no object receiver) */
+<function_call> ::= <identifier> "(" [<args>] ")"
+
+/* Method call on a connector object (with receiver) */
+<method_call> ::= <identifier> "=" <identifier> "." <identifier> "(" [<args>] ")"
+
+<config_statement> ::= <register_endpoint> | <add_var>
+<register_endpoint> ::= "registerEndpoint(" <arg> "," <arg> "," <arg> "," <arg> "," <arg> "," <arg> "," <arg> ")"
+/* addVar assigns a value to a variable. Accepts (value, variable) or (variable, value).
+   If both arguments are identifiers, the value of the second is assigned to the first.
+   Passing two literals as arguments is not allowed. */
+<add_var> ::= "addVar(" <add_var_arg> "," <add_var_arg> ")"
+<add_var_arg> ::= <identifier> | <literal> | "$" <identifier>
+/* Semantic constraint: at least one of the two arguments must be an <identifier> */
+
+<identifier> ::= [a-zA-Z_] [a-zA-Z0-9_]*
+
+/* Reserved system variables, accessible and assignable from any scope:
+   _status: HTTP response status code (e.g. addVar(_status, 401) or _status = 404) */
+<system_variable> ::= "_status"
+```
+
+---
+
+## SECTION II: Input and Output (I/O) Management
+
+This section describes the mechanisms AVAP uses to ingest external data, validate parameter integrity, and build the HTTP response payload. AVAP has no internal print commands (such as `print`); all data output goes through the HTTP interface.
+
+### 2.1 Smart Parameter Capture (addParam)
+The `addParam(parameter, destination)` command inspects the HTTP request in a strict hierarchical order: first the URL (query arguments), then the JSON body, and finally the form data. If the requested parameter does not exist, the destination variable is initialized to `None`.
+
+### 2.2 Validation and Collections (getListLen / getQueryParamList)
+* **`getListLen(source, destination)`**: Acts as a volume inspector. Counts how many elements a list or string contains.
+* **`getQueryParamList(parameter, destination_list)`**: Automatically packs multiple occurrences of a URL parameter (e.g. `?filtro=A&filtro=B`) into a single list structure.
+
+### 2.3 Response Construction (addResult and _status)
+The `addResult(variable)` command registers which variables will form part of the final response's JSON body. The system variable `_status` allows the outgoing HTTP status code to be set explicitly, either by direct assignment (`_status = 404`) or via `addVar(_status, 401)`.
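+
+A minimal illustrative handler body combining these commands (parameter names are hypothetical):
+
+```avap
+// Capture an input parameter; initialized to None if absent
+addParam(userId, userId)
+
+// Register the variable for the JSON response body and set the status
+addResult(userId)
+addVar(_status, 200)
+```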
+
+### BNF Specification (Section II)
+
+```bnf
+<io_statement> ::= <add_param> | <get_list_len> | <get_query_param_list> | <add_result>
+<add_param> ::= "addParam(" <identifier> "," <identifier> ")"
+<get_list_len> ::= "getListLen(" <identifier> "," <identifier> ")"
+<get_query_param_list> ::= "getQueryParamList(" <identifier> "," <identifier> ")"
+<add_result> ::= "addResult(" <identifier> ")"
+```
+
+---
+
+## SECTION III: Control Logic and Decision Structures
+
+AVAP uses a mixed structural grammar. It combines the fluency of keywords for opening functional blocks with the mathematical safety of strict closers.
+
+### 3.1 The Conditional Block (if() / else() / end())
+The `if()` command manages conditional logic through two strictly differentiated invocation modes. It is imperative to respect the delimiters and argument positions.
+
+#### Mode 1: Structured Comparison (Atomic)
+Used for direct comparisons between two simple values.
+* **Syntax:** `if(atom_1, atom_2, "operator")`
+* **Arguments 1 and 2:** Must be simple identifiers (variables) or literals (strings/numbers). **`None` is not allowed in this mode.**
+* **Argument 3:** The comparison operator must be enclosed in **double quotes** (`"=="`, `"!="`, `">"`, `"<"`, `">="`, `"<="`).
+* **Restriction:** Access expressions (e.g. `data.user` or `list[0]`) are not allowed. Such values must first be assigned to a variable.
+* **Valid example:** `if(reintentos, 5, "<")`
+
+#### Mode 2: Free Expression (Complex Evaluation)
+Used to evaluate logical expressions that do not fit the atomic structure.
+* **Syntax:** ``if(None, None, `complex_expression`)``
+* **Arguments 1 and 2:** Must be literally the word `None` (without quotes).
+* **Argument 3:** The full expression **must** be enclosed in **backticks**. This allows internal logic, `and`/`or` operators, and access to data structures.
+* **Valid example:** ``if(None, None, `user.id > 10 and email.contains("@")`)``
+
+---
+
+### Validation Table for the Model
+
+| Input | Status | Reason |
+| :--- | :--- | :--- |
+| `if(count, 10, "==")` | ✅ VALID | Mode 1: valid atoms and a quoted operator. |
+| ``if(None, None, `val > 0`)`` | ✅ VALID | Mode 2: correct use of `None` and backticks. |
+| `if(username, None, "==")` | ❌ ERROR | Mode 1 forbids `None`. Mode 2 must be used instead. |
+| `if(None, None, "val > 0")` | ❌ ERROR | Mode 2 requires backticks (`` ` ``), not quotes. |
+| `if(user.id, 10, "==")` | ❌ ERROR | Mode 1 does not allow access expressions (`.`). |
+
+### 3.2 Strict, Deterministic Iteration (startLoop / endLoop)
+To guarantee determinism and prevent memory exhaustion:
+* Loops are defined with `startLoop(counter, start, end)`. They only iterate over finite numeric indices.
+* The block must be closed with `endLoop()`.
+* The only way to exit early is to invoke the global `return()` command.
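+
+A sketch of the pattern (variable names are illustrative):
+
+```avap
+// Accumulate the loop counter over a fixed numeric range
+addVar(total, 0)
+startLoop(i, 1, 5)
+    total = total + i
+endLoop()
+addResult(total)
+```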
+
+### 3.3 Runtime Error Handling (try() / exception() / end())
+Designed to protect server stability against I/O failures.
+* If a system failure occurs inside the `try` block, the flow jumps to the `exception(error_variable)` block, populating the variable with the trace to aid script recovery.
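+A minimal sketch of the construct just described (the guarded request and variable names are hypothetical):
+
+```avap
+try()
+    // Risky I/O operation guarded by the block
+    RequestGet(url, qs, headers, respuesta, 3000)
+exception(err)
+    // err now holds the failure trace
+    addResult(err)
+end()
+```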
+
+### BNF Specification (Section III)
+
+```bnf
+<statement> ::= <conditional> | <loop> | <try_block>
+
+<conditional> ::= <if_command>
+                  <body>
+                  [ "else()" <body> ]
+                  "end()"
+
+<if_command> ::= <if_atomic> | <if_free>
+
+<if_atomic> ::= "if" "(" <atom> "," <atom> "," <quoted_operator> ")"
+<if_free>   ::= "if" "(" "None" "," "None" "," <backtick_expr> ")"
+
+<quoted_operator> ::= '"=="' | '"!="' | '">"' | '"<"' | '">="' | '"<="'
+<backtick_expr>   ::= "`" <complex_expression> "`"
+
+<identifier> ::= [a-zA-Z_][a-zA-Z0-9_]*
+<atom>       ::= <identifier> | <literal>
+/* Note: <atom> does NOT include the word "None" */
+
+<loop> ::= "startLoop(" <identifier> "," <integer> "," <integer> ")"
+           <body>
+           "endLoop()"
+
+<try_block> ::= "try()"
+                <body>
+                "exception(" <identifier> ")"
+                <body>
+                "end()"
+
+<body> ::= <statement>*
+```
+
+---
+
+## SECTION IV: Concurrency and Asynchrony
+
+Implements an advanced system based on lightweight threads (goroutines), allowing the server to process long I/O operations without blocking the main thread.
+
+### 4.1 Launcher Command (go)
+* **Syntax:** `identificador = go nombre_funcion(parametros)`.
+* **Mechanics:** Creates a new isolated execution context. Returns a unique handle that must be stored in order to interact with the thread later.
+
+### 4.2 Synchronizer Command (gather)
+* **Syntax:** `resultado = gather(identificador, timeout)`.
+* **Mechanics:** Pauses the main thread while waiting for the result. If the specified `timeout` is exceeded, the wait is cancelled and `None` is returned.
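+Putting both commands together, a sketch of the launch-and-join pattern (the function `fetch_prices` and its argument are hypothetical):
+
+```avap
+// Launch the task on a lightweight thread; keep the handle
+task_id = go fetch_prices(market)
+
+// ... other work on the main thread ...
+
+// Block until the task finishes or 5000 ms elapse;
+// on timeout, result receives None
+result = gather(task_id, 5000)
+addResult(result)
+```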
+
+### BNF Specification (Section IV)
+
+```bnf
+<async_stmt>  ::= <go_stmt> | <gather_stmt>
+<go_stmt>     ::= <identifier> "=" "go" <identifier> "(" [<arg_list>] ")"
+<gather_stmt> ::= <identifier> "=" "gather(" <identifier> ["," <timeout>] ")"
+```
+
+---
+
+## SECTION V: Third-Party Connectors, HTTP Requests, and Native ORM
+
+Groups all outbound interconnection capabilities, allowing scripts to consume third-party integrations and external APIs and to manage relational databases without additional drivers.
+
+### 5.1 Third-Party Connectors (avapConnector)
+
+`avapConnector` is the integration mechanism for third-party services configured on the AVAP platform. A connector is registered in advance under a unique UUID. When instantiated, the variable becomes a **proxy object** that encapsulates credentials and context, exposing dynamic methods through dot notation.
+
+**Usage pattern:**
+```avap
+// 1. Instantiate the connector by its UUID
+belvo_connector = avapConnector("20908e93260147acb2636967021fbf5d")
+
+// 2. Invoke dynamic methods (resolved at runtime)
+institutions = belvo_connector.list_institutions()
+balances = belvo_connector.get_balances(link, account_id)
+
+// 3. The result can be handled as a standard variable
+addResult(balances)
+```
+
+### 5.2 External HTTP Client (RequestPost / RequestGet)
+
+To avoid threads blocked by network latency, AVAP requires a **timeout** parameter (in milliseconds). If it is exceeded, the target variable receives `None`.
+
+* **`RequestPost(url, querystring, headers, body, destino, timeout)`**: Executes a POST and stores the response in `destino`.
+* **`RequestGet(url, querystring, headers, destino, timeout)`**: Executes a GET, omitting the body.
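+A sketch of both calls under the signatures above (the URL, headers, and variable names are illustrative):
+
+```avap
+// GET with a 3000 ms timeout; the response lands in users
+RequestGet("https://api.example.com/users", qs, headers, users, 3000)
+
+// POST with a body; on timeout, created receives None
+RequestPost("https://api.example.com/users", qs, headers, payload, created, 3000)
+```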
+
+### 5.3 Database Connector and ORM
+
+AVAP uses `avapConnector("TOKEN")` for secure credential hydration. Operations run against a specific table identified by the `tableName` parameter.
+
+* **`ormCheckTable(tableName, varTarget)`**: Checks whether a table exists in the connected database.
+* **`ormCreateTable(fields, fieldsType, tableName, varTarget)`**: DDL command for table creation.
+* **`ormAccessSelect(fields, tableName, selector, varTarget)`**: Retrieves records. `fields` accepts `*` or a list of fields. `selector` is the WHERE clause (it may be empty). Returns a list of dictionaries.
+* **`ormAccessInsert(fieldsValues, tableName, varTarget)`**: Parameterized insertion of records into the table `tableName`.
+* **`ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget)`**: Modifies existing records. The `selector` is mandatory to bound the scope of the change in table `tableName`.
+* **`ormDirect(sentencia, destino)`**: Raw SQL execution for complex analytical queries.
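+A sketch chaining some of the ORM commands above; table and field names are hypothetical, and the exact textual format of the `fields`/`selector` arguments is an assumption, not confirmed by this specification:
+
+```avap
+// Verify the table exists before querying it
+ormCheckTable("customers", exists)
+
+// Select two fields filtered by a WHERE clause
+ormAccessSelect("id, email", "customers", "status = 'active'", rows)
+
+// Raw SQL for an analytical query
+ormDirect("SELECT COUNT(*) FROM customers", total)
+addResult(rows)
+```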
+
+
+
+---
+
+### BNF Specification (Section V)
+
+```bnf
+/* Third-party connector instantiation and dynamic method calls */
+<connector_stmt> ::= <connector_init> | <connector_call>
+<connector_init> ::= <identifier> "=" "avapConnector(" <uuid_string> ")"
+<connector_call> ::= [ <identifier> "=" ] <identifier> "." <identifier> "(" [<arg_list>] ")"
+
+/* HTTP client with mandatory timeout */
+<http_stmt>    ::= <request_post> | <request_get>
+<request_post> ::= "RequestPost(" <url> "," <querystring> "," <headers> "," <body> "," <target> "," <timeout> ")"
+<request_get>  ::= "RequestGet(" <url> "," <querystring> "," <headers> "," <target> "," <timeout> ")"
+
+/* ORM and persistence (standardized around tableName) */
+<orm_stmt> ::= <orm_direct> | <orm_check> | <orm_create> | <orm_select> | <orm_insert> | <orm_update>
+<orm_direct> ::= "ormDirect(" <sql_sentence> "," <target> ")"
+<orm_check>  ::= "ormCheckTable(" <table_name> "," <target> ")"
+<orm_create> ::= "ormCreateTable(" <fields> "," <fields_type> "," <table_name> "," <target> ")"
+
+/* ormAccessSelect(fields, tableName, selector, varTarget) */
+<orm_select> ::= "ormAccessSelect(" <field_spec> "," <table_name> "," [<selector>] "," <target> ")"
+<field_spec> ::= "*" | <field_list>
+
+/* ormAccessInsert(fieldsValues, tableName, varTarget) */
+<orm_insert> ::= "ormAccessInsert(" <fields_values> "," <table_name> "," <target> ")"
+
+/* ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget) */
+<orm_update> ::= "ormAccessUpdate(" <fields> "," <fields_values> "," <table_name> "," <selector> "," <target> ")"
+```
+
+> **Implementation note:** `<connector_init>` is distinguished from the ORM form of `avapConnector` only by semantic context: the UUID passed as argument determines whether the resolved adapter is a database ORM or a third-party proxy. The grammar treats them identically; the execution engine selects the appropriate adapter at runtime.
+
+---
+
+## SECTION VI: Utilities, Cryptography, and Data Manipulation
+
+AVAP includes a set of high-level built-in commands for manipulating complex types (JSON and lists), time, and text, and for generating hashes.
+
+---
+
+### 6.1 Native Manipulation of Lists and JSON Objects
+
+To extract and mutate complex structures, AVAP provides dedicated native commands. In AVAP, lists are **not instantiated with array literals**; they are built and traversed through a closed set of specialized commands:
+
+* **`variableToList(elemento, destino)`**: Forces a scalar variable into an iterable single-element list. This is the canonical entry point for building a list from scratch out of an existing value.
+
+* **`itemFromList(lista_origen, indice, destino)`**: Safely extracts the element at position `indice` (0-based) from a list. Equivalent to a controlled index access.
+
+* **`getListLen(lista, destino)`**: Computes the total number of elements in `lista` and stores the integer result in `destino`. Essential for building safe traversal loops and for validating lists before indexing into them. Always calling `getListLen` before `itemFromList` is recommended to avoid out-of-range accesses.
+
+* **`variableFromJSON(json_origen, clave, destino)`**: Parses an in-memory JSON object and extracts the value for `clave`, storing it in `destino`. Access is direct by property name.
+
+* **`AddVariableToJSON(clave, valor, json_destino)`**: Dynamically injects a new property into an existing JSON object. If the key already exists, its value is overwritten.
+
+**Typical traversal pattern in AVAP:**
+
+```avap
+// 1. Get the list length
+getListLen(myList, len)
+
+// 2. Iterate with a controlled index (0 .. len-1)
+startLoop(i, 0, len)
+    itemFromList(myList, i, currentItem)
+    // ... process currentItem ...
+endLoop()
+```
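+The JSON commands follow the same target-variable convention; a brief sketch (keys and variable names are illustrative):
+
+```avap
+// Read a property out of a JSON object
+variableFromJSON(userJson, "email", userEmail)
+
+// Add (or overwrite) a property on the same object
+AddVariableToJSON("verified", "true", userJson)
+```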
+
+---
+
+### 6.2 Cryptography and Regular Expressions
+
+* **`encodeSHA256(origen, destino)`** and **`encodeMD5(origen, destino)`**: Cryptographic functions that irreversibly hash a text. Vital for secure password storage and data integrity verification. SHA-256 produces a 64-character hexadecimal digest and offers greater cryptographic strength than MD5 (32 characters); SHA-256 is recommended for new development.
+
+* **`getRegex(origen, patron, destino)`**: Applies a regular expression (`patron`) to the source variable, extracting the first exact match found. The pattern follows standard Python `re`-compatible syntax.
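+A sketch of both utilities (input variables and the pattern are illustrative):
+
+```avap
+// Store only the SHA-256 digest of the password
+encodeSHA256(password, passwordHash)
+
+// Extract the first email-like match from a raw string
+getRegex(rawText, "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+", email)
+```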
+
+---
+
+### 6.3 Time and String Transformation
+
+#### Dates and Timestamps
+
+AVAP provides three complementary commands covering every conversion between time representations. All three support calendar formats in Python `strftime` notation and `TimeDelta` adjustments expressed in seconds (positive to add, negative to subtract):
+
+| Command | Input | Output |
+|---|---|---|
+| `getTimeStamp(fecha_string, formato, timedelta, destino)` | Date string | Epoch (integer) |
+| `stampToDatetime(epoch, formato, timedelta, destino)` | Epoch (integer) | Date string |
+| `getDateTime(formato, timedelta, zona_horaria, destino)` | — (current time) | Date string |
+
+* **`getTimeStamp(fecha_string, formato, timedelta, destino)`**: Converts a human-readable date string to its Epoch value (Unix integer). Useful for storing dates and doing arithmetic on them.
+
+* **`stampToDatetime(epoch, formato, timedelta, destino)`**: Converts an Epoch value to a date string in the specified format. Useful for presenting stored timestamps in readable form.
+
+* **`getDateTime(formato, timedelta, zona_horaria, destino)`**: Captures the system's current date and time, applies the `timedelta` adjustment, and converts the result to the given `zona_horaria` before storing it. Accepts any timezone recognized by Python's `pytz` library.
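+A round-trip sketch of the three commands (format strings and the timezone are illustrative):
+
+```avap
+// Current time in Madrid, no offset
+getDateTime("%Y-%m-%d %H:%M:%S", 0, "Europe/Madrid", now)
+
+// Date string -> epoch, shifted one hour forward (+3600 s)
+getTimeStamp(now, "%Y-%m-%d %H:%M:%S", 3600, stamp)
+
+// Epoch -> readable date string
+stampToDatetime(stamp, "%Y-%m-%d", 0, readable)
+```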
+
+#### Strings
+
+* **`randomString(patron, longitud, destino)`**: Generates a random string of `longitud` characters whose symbols are restricted to the set defined by `patron` (a character-class regular expression). Useful for generating session tokens, temporary passwords, or unique identifiers.
+
+* **`replace(origen, patron_busqueda, reemplazo, destino)`**: Finds every occurrence of `patron_busqueda` within `origen` and substitutes it with `reemplazo`, storing the result in `destino`. Eases sanitization and normalization of input data before processing or storage.
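+For instance (the pattern, length, and inputs are illustrative):
+
+```avap
+// 32-character alphanumeric session token
+randomString("[a-zA-Z0-9]", 32, token)
+
+// Normalize phone input by stripping dashes
+replace(phone, "-", "", cleanPhone)
+```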
+
+---
+
+### BNF Specification (Section VI)
+
+```bnf
+<utility_cmd> ::= <list_json_cmd> | <crypto_cmd> | <regex_cmd>
+                | <datetime_cmd> | <stamp_cmd> | <random_cmd> | <replace_cmd>
+
+/* List and JSON manipulation */
+<list_json_cmd> ::= "variableToList(" <source> "," <target> ")"
+                  | "itemFromList(" <list> "," <index> "," <target> ")"
+                  | "getListLen(" <list> "," <target> ")"
+                  | "variableFromJSON(" <json_source> "," <key> "," <target> ")"
+                  | "AddVariableToJSON(" <key> "," <value> "," <json_target> ")"
+
+/* Cryptography */
+<crypto_cmd> ::= "encodeSHA256(" <source> "," <target> ")"
+              | "encodeMD5(" <source> "," <target> ")"
+
+/* Regular expressions */
+<regex_cmd> ::= "getRegex(" <source> "," <pattern> "," <target> ")"
+
+/* Current date/time -> string */
+<datetime_cmd> ::= "getDateTime(" <format> "," <timedelta> "," <timezone> "," <target> ")"
+/* Arguments: output format, timedelta, timezone, target */
+
+/* Epoch <-> string conversions */
+<stamp_cmd> ::= "stampToDatetime(" <epoch> "," <format> "," <timedelta> "," <target> ")"
+/* Arguments: source epoch, format, timedelta, target */
+              | "getTimeStamp(" <date_string> "," <format> "," <timedelta> "," <target> ")"
+/* Arguments: date string, input format, timedelta, target */
+
+/* Strings */
+<random_cmd> ::= "randomString(" <pattern> "," <length> "," <target> ")"
+/* Arguments: pattern, length, target */
+
+<replace_cmd> ::= "replace(" <source> "," <search_pattern> "," <replacement> "," <target> ")"
+/* Arguments: source, search pattern, replacement, target */
+```
+
+
+---
+
+## SECTION VII: Function Architecture and Scopes
+
+Functions are hermetic memory enclosures. On entering a function, AVAP creates a new local-variable dictionary isolated from the global context.
+The `return()` command acts as a flow switch: it injects the computed value into the caller, frees local memory, and, when used inside a `startLoop`, breaks the iteration early.
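+A sketch of a function definition and call under the brace-delimited syntax; the names are illustrative, and passing the value as an argument to `return()` is an assumption not confirmed elsewhere in this specification:
+
+```avap
+// Brace-delimited function body with an isolated local scope
+function double(n) {
+    result = n * 2
+    // return() hands the value back to the caller and frees local memory
+    return(result)
+}
+
+value = double(21)
+addResult(value)
+```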
+
+### BNF Specification (Section VII)
+
+```bnf
+/* Note: functions use braces {} as block delimiters by explicit architectural
+   decision, in contrast to the control structures (if, loop, try), which use
+   closing keywords (end(), endLoop()). Both patterns coexist in the grammar,
+   and the parser distinguishes them by the opening token. */
+<function_def> ::= "function" <identifier> "(" [