assistance-engine/changelog

# Changelog

All notable changes to the **Brunix Assistance Engine** will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---
## [1.6.3] - 2026-03-26
### Changed
- RESEARCH: updated ADR-0005-embedding-model-selection, BEIR benchmark results (qwen3-emb vs bge-m3) added.

### Added
- FEATURE: added `research/embeddings/evaluate_embeddings_pipeline.py` in order to facilitate model embedding evaluation with BEIR benchmarks.
- RESULTS: added qwen3-emb vs bge-m3 scores in BEIR benchmark.

## [1.6.2] - 2026-03-26
### Changed
- RESEARCH: updated `embeddings/Embedding model selection.pdf`.


## [1.6.1] - 2026-03-20

### Added
- FEATURE (PRD-0002): Extended `AgentRequest` in `brunix.proto` with four optional fields: `editor_content` (field 3), `selected_text` (field 4), `extra_context` (field 5), `user_info` (field 6) — enabling the VS Code extension to send active file content, selected code, free-form context, and client identity metadata alongside every query. Fields 3–5 are Base64-encoded; field 6 is a JSON string.
- FEATURE (PRD-0002): Extended `AgentState` with `editor_content`, `selected_text`, `extra_context`, `user_info`, and `use_editor_context` fields.
- FEATURE (PRD-0002): Extended classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens — query type and editor signal (`EDITOR` / `NO_EDITOR`). `use_editor_context` flag set in state based on classifier output.
- FEATURE (PRD-0002): Editor context injected into generation prompt only when `use_editor_context=True` — prevents the model from referencing editor code when the question is unrelated.
- FEATURE (PRD-0002): `openai_proxy.py` — parses the standard OpenAI `user` field as a JSON string to extract `editor_content`, `selected_text`, `extra_context`, and `user_info`. Non-Brunix clients that send `user` as a plain string or omit it are handled gracefully with no error.
- FEATURE (PRD-0002): `server.py` — Base64 decoding of `editor_content`, `selected_text`, and `extra_context` on request arrival. Malformed Base64 is silently treated as empty string.
- TESTS: Added `Docker/tests/test_prd_0002.py` — 40 unit tests covering `_parse_query_type`, `_decode_b64`, `_parse_editor_context`, `_build_reformulate_query`, editor context injection logic, and all valid classifier output combinations. Runs without external dependencies (no Elasticsearch, no Ollama, no gRPC server required).
- DOCS: Added `docs/product/PRD-0001-openai-compatible-proxy.md` — product requirements document for the OpenAI-compatible HTTP proxy.
- DOCS: Added `docs/product/PRD-0002-editor-context-injection.md` — product requirements document for editor context injection (updated to Implemented status with full technical design).
- DOCS: Added `docs/ADR/ADR-0005-embedding-model-selection.md` — comparative evaluation of BGE-M3 vs Qwen3-Embedding-0.6B. Status: Under Evaluation.
- DOCS: Added `LICENSE` — proprietary license, 101OBEX, Corp, Delaware.
- DOCS: Added `research/` directory structure for MrHouston experiment results and benchmarks.

### Changed
- FEATURE (PRD-0002): `session_id` in `openai_proxy.py` is now read exclusively from the dedicated `session_id` field — no longer falls back to the `user` field. Breaking change for any client that was using `user` as a `session_id` fallback.
- ENGINE: `CLASSIFY_PROMPT_TEMPLATE` extended with `<editor_rule>` and updated `<output_rule>` for two-token output format.
- ENGINE: `REFORMULATE_PROMPT` extended with `<mode_rule>` and `<language_rule>` — the reformulator now receives `[MODE: X]` prepended to the query and applies command expansion only in `CODE_GENERATION` mode.
- ENGINE: `GENERATE_PROMPT` — added "Answer in the same language the user used" to `<output_format>`. Fixes responses defaulting to English for Spanish queries.
- ENGINE: `hybrid_search_native` in `graph.py` — BM25 query now uses a `bool` query with `should` boost for `doc_type: spec` and `doc_type: narrative` chunks, improving retrieval of definitional and explanatory content over raw code examples.
- DOCS: Updated `docs/API_REFERENCE.md` — full `AgentRequest` table with all 6 fields, Base64 encoding notes, editor context behaviour section, and updated proxy examples.
- DOCS: Updated `docs/ARCHITECTURE.md` — new §6 Editor Context Pipeline, updated §4 LangGraph Workflow with two-token classifier, §4.6 reformulator mode-aware and language-preserving, updated component inventory and request lifecycle diagrams.
- DOCS: Updated `README.md` — project structure with `Docker/tests/`, `docs/product/`, `docs/ADR/ADR-0005`, `research/`, `LICENSE`. HTTP proxy section updated with editor context curl examples. Documentation index updated.
- DOCS: Updated `CONTRIBUTING.md` — added Section 10 (PRDs), Section 11 (Research & Experiments Policy), updated PR checklist, ADR table with ADR-0005.
- DOCS: Updated `docs/AVAP_CHUNKER_CONFIG.md` to v2.0 — five new commands (else, end, endLoop, exception, return), naming fix (AddvariableToJSON), nine dual assignment patterns, four new semantic tags.
- GOVERNANCE: Updated `.github/CODEOWNERS` — added `@BRUNIX-AI/engineering` and `@BRUNIX-AI/research` teams, explicit rules for proto, golden dataset, grammar config, ADRs and PRDs.

### Fixed
- ENGINE: Fixed retrieval returning wrong chunks for Spanish definition queries — reformulator was translating Spanish queries to English, breaking BM25 lexical matching against Spanish LRM chunks. Root cause: missing language preservation rule in `REFORMULATE_PROMPT`.
- ENGINE: Fixed reformulator applying CODE_GENERATION command expansion to RETRIEVAL queries — caused "Que significa AVAP?" to reformulate as "AVAP registerEndpoint addResult _status". Root cause: reformulator had no awareness of query type. Fix: `[MODE: X]` prefix + mode-aware rules.
- ENGINE: Fixed responses defaulting to English regardless of query language. Root cause: `GENERATE_PROMPT` had no language instruction (unlike `CODE_GENERATION_PROMPT` which already had it).

---

## [1.6.0] - 2026-03-18

### Added
- ENGINE: Added `AskAgentStream` RPC — real token-by-token streaming directly from Ollama. Two-phase design: classify + reformulate + retrieve runs first via `build_prepare_graph`, then `llm.stream()` forwards tokens to the client as they arrive.
- ENGINE: Added `EvaluateRAG` RPC — RAGAS evaluation pipeline with Claude as judge. Runs faithfulness, answer_relevancy, context_recall and context_precision against a golden dataset and returns a global score with verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT).
- ENGINE: Added `openai_proxy.py` — OpenAI and Ollama compatible HTTP API running on port 8000. Routes `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`. Endpoints: `POST /v1/chat/completions`, `POST /v1/completions`, `GET /v1/models`, `POST /api/chat`, `POST /api/generate`, `GET /api/tags`, `GET /health`.
- ENGINE: Added `entrypoint.sh` — starts gRPC server and HTTP proxy as parallel processes with mutual watchdog. If either crashes, the container stops cleanly.
- ENGINE: Added session memory — `session_store` dict indexed by `session_id` accumulates full conversation history per session. Each request loads and persists history.
- ENGINE: Added query intent classifier — LangGraph node that classifies every query as `RETRIEVAL`, `CODE_GENERATION` or `CONVERSATIONAL` and routes to the appropriate subgraph.
- ENGINE: Added hybrid retrieval — replaced `ElasticsearchStore` (LangChain abstraction) with native Elasticsearch client. Each query runs BM25 `multi_match` and kNN in parallel, fused with Reciprocal Rank Fusion (k=60). Returns top-8 documents.
- ENGINE: Added `evaluate.py` — full RAGAS evaluation pipeline using the same hybrid retrieval as production, Claude as external judge, and the golden dataset in `Docker/src/golden_dataset.json`.
- PROTO: Added `AskAgentStream` and `EvaluateRAG` RPCs to `brunix.proto` with their message types (`EvalRequest`, `EvalResponse`, `QuestionDetail`).
- DOCS: Added `docs/ADR/ADR-0001-grpc-primary-interface.md`.
- DOCS: Added `docs/ADR/ADR-0002-two-phase-streaming.md`.
- DOCS: Added `docs/ADR/ADR-0003-hybrid-retrieval-rrf.md`.
- DOCS: Added `docs/ADR/ADR-0004-claude-eval-judge.md`.
- DOCS: Added `docs/samples/` — 30 representative `.avap` code samples covering all AVAP constructs.

### Changed
- ENGINE: Replaced `ElasticsearchStore` with native Elasticsearch client — fixes silent kNN failure caused by schema incompatibility between the Chonkie ingestion pipeline and the LangChain-managed index schema.
- ENGINE: Replaced single `GENERATE_PROMPT` with five specialised prompts — `CLASSIFY_PROMPT`, `REFORMULATE_PROMPT`, `GENERATE_PROMPT`, `CODE_GENERATION_PROMPT`, `CONVERSATIONAL_PROMPT` — each optimised for its routing path.
- ENGINE: Extended `REFORMULATE_PROMPT` with explicit AVAP command mapping — intent-to-command expansion for API, database, HTTP, loop and error handling query types.
- ENGINE: Extended `AgentState` with `query_type` and `session_id` fields required for conditional routing and session persistence.
- ENGINE: Fixed `session_id` ignored — `graph.invoke` now passes `session_id` into the graph state.
- ENGINE: Fixed double `is_final: True` — `AskAgent` previously emitted two closing messages. Now emits exactly one.
- ENGINE: Fixed embedding endpoint mismatch — server now uses the same `/api/embed` endpoint and payload format as both ingestion pipelines, ensuring vectors are comparable at query time.
- DEPENDENCIES: `requirements.txt` updated — added `ragas`, `datasets`, `langchain-anthropic`, `fastapi`, `uvicorn`.

### Fixed
- ENGINE: Fixed retrieval returning zero results — `ElasticsearchStore` assumed a LangChain-managed schema incompatible with the Chonkie-generated index. Replaced with native ES client querying actual field names.
- ENGINE: Fixed context always empty — consequence of the retrieval bug above. The generation prompt received an empty `{context}` on every request and always returned the fallback string.

---

## [1.5.1] - 2026-03-18

### Added
- DOCS: Created `docs/ARCHITECTURE.md` — full technical architecture reference covering component inventory, request lifecycle, LangGraph workflow, hybrid RAG pipeline, streaming design, evaluation pipeline, infrastructure layout, session memory, observability, and security boundaries.
- DOCS: Created `docs/API_REFERENCE.md` — complete gRPC API contract documentation with method descriptions, message type tables, error handling, and `grpcurl` client examples for all three RPCs (`AskAgent`, `AskAgentStream`, `EvaluateRAG`).
- DOCS: Created `docs/RUNBOOK.md` — operational playbook with health checks, startup/shutdown procedures, tunnel management, and incident playbooks for all known failure modes.
- DOCS: Created `SECURITY.md` — security policy covering transport security, authentication, secrets management, container security, data privacy, known limitations table, and vulnerability reporting process.
- DOCS: Created `docs/AVAP_CHUNKER_CONFIG.md` — full reference for `avap_config.json`: lexer fields, all 4 block definitions with regex breakdown, all 10 statement categories with ordering rationale, all 14 semantic tags with detection patterns, a worked example showing chunks produced from real AVAP code, and a step-by-step guide for adding new constructs.

### Changed
- DOCS: Fully rewrote `README.md` project structure tree — now reflects all files accurately including `openai_proxy.py`, `entrypoint.sh`, `golden_dataset.json`, `SECURITY.md`, `docs/ARCHITECTURE.md`, `docs/API_REFERENCE.md`, `docs/RUNBOOK.md`, `docs/adr/`, `avap_chunker.py`, `avap_config.json`, `ingestion/chunks.jsonl`, and `src/config.py`.
- DOCS: Added `Knowledge Base Ingestion` section to `README.md` documenting both ingestion pipelines in full: Pipeline A (Chonkie — `elasticsearch_ingestion.py`) with flow diagram, CLI usage, and chunk field table; Pipeline B (AVAP Native — `avap_chunker.py` + `avap_ingestor.py`) with flow diagram, chunk type table, semantic tags reference, and ingestor env vars.
- DOCS: Replaced minimal `Testing & Debugging` section with complete documentation of all three gRPC methods (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) including expected responses, multi-turn example, and verdict thresholds.
- DOCS: Added `HTTP Proxy` section documenting all 7 HTTP endpoints (4 OpenAI + 3 Ollama), streaming vs non-streaming routing, `session_id` extension, and proxy env vars table.
- DOCS: Fixed `API Contract (Protobuf)` section — corrected `grpc_tools.protoc` paths and added reference to `docs/API_REFERENCE.md`.
- DOCS: Fixed remaining stale reference to `scripts/generate_mbpp_avap.py` in Dataset Generation section.
- DOCS: Added Documentation Index table to `README.md` linking all documentation files.
- DOCS: Updated `CONTRIBUTING.md` — added Section 9 (Architecture Decision Records) and updated PR checklist and doc policy table.
- ENV: Added missing variable documentation to `README.md`: `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`, `ELASTICSEARCH_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`.

---

## [1.5.0] - 2026-03-12

### Added
- IMPLEMENTED:
    - `scripts/pipelines/flows/translate_mbpp.py`: pipeline to generate synthetic dataset from mbpp dataset.
    - `scripts/tasks/prompts.py`: module containing prompts for pipelines.
    - `scripts/tasks/chunk.py`: module containing functions related to chunk management.
    - `synthetic_datasets`: folder containing generated synthetic datasets.
    - `src/config.py`: environment variables configuration file.

### Changed
- REFACTORED: `scripts/pipelines/flows/elasticsearch_ingestion.py` now uses `docs/LRM` or `docs/samples` documents instead of pre chunked files.
- RENAMED `docs/AVAP Language: Core Commands & Functional Specification` to `docs/avap_language_github_docs`.
- REMOVED: `Makefile` file.
- REMOVED: `scripts/start-tunnels.sh` script.
- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.
- MOVED `scripts/generate_mbap.py` into `scripts/flows/generate_mbap.py`.


## [1.4.0] - 2026-03-10

### Added
- **Dataset Generation Suite**: Added `scripts/generate_mbpp_avap.py` to automate the creation of synthetic AVAP training data.
- **MBPP-style Benchmarking**: Support for generating structured JSON datasets with code solutions and Python-based validation tests (`test_list`).
- **LRM Integration**: The generator now performs grounded synthesis using the `avap.md` Language Reference Manual.
- **Anthropic Claude 3.5 Sonnet Integration**: Orchestration logic for high-fidelity code generation via API.

### Changed
- **README.md**: Added comprehensive documentation for the Evaluation & Dataset Generation pipeline.
- **Project Structure**: Integrated `evaluation/` directory for synthetic dataset storage.

### Security
- Added explicit policy to avoid committing real Anthropic API keys, enforcing the use of environment variables.

## [1.3.0] - 2026-03-05

### Added
- IMPLEMENTED:
    - `Docker/src/utils/emb_factory`: factory modules created for embedding model generation.
    - `Docker/src/utils/llm_factory`: factory modules created for LLM generation.
    - `Docker/src/graph.py`: workflow graph orchestration module added.
    - `Docker/src/prompts.py`: centralized prompt definitions added.
    - `Docker/src/state.py`: shared state management module added.
    - `scripts/pipelines/flows/elasticsearch_ingestion.py`: pipeline to populate the elasticsearch vector database.
    - `ingestion/docs`: folder containing all chunked AVAP documents.

### Changed
- REFACTORED: `server.py` updated to integrate the new graph/state/prompt and utils-based architecture.
- REFACTORED: `docker-compose.yaml` now uses fully parameterized environment variables instead of hardcoded service URLs and credentials.
- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.


## [1.2.0] - 2026-03-03

### Added
- GOVERNANCE: Introduced `CONTRIBUTING.md` as the single source of truth for all contribution standards, covering GitFlow, infrastructure policy, repository standards, environment variables, changelog, documentation, and incident reporting.
- GOVERNANCE: Added `.github/pull_request_template.md` enforcing a mandatory structured checklist on every PR — including explicit sign-off on environment variables, changelog, and documentation.
- DOCS: Added Environment Variables reference table to `README.md`. All variables must be registered here. PRs introducing undocumented variables will be rejected.
- DOCS: Updated project structure map in `README.md` to reflect new governance files.

### Changed
- PROCESS: Pull Requests that introduce new environment variables without documentation, omit required changelog entries, or skip required documentation updates are now formally non-mergeable per `CONTRIBUTING.md`.

---

## [1.1.0] - 2026-02-16

### Added
- IMPLEMENTED: Strict repository structure enforcement to separate development environment from production runtime.
- SECURITY: Added `.dockerignore` to prevent leaking sensitive source files and local configurations into the container.

### Changed
- REFACTORED: Dockerfile build logic to optimize build context and reduce image footprint.
- ARCHITECTURE: Moved application entry point to `/app` and eliminated the redundant root `/workspace` directory for enhanced security.

### Fixed
- RESOLVED: Issue where non-production files were being bundled into the Docker image, improving deployment speed and container isolation.

---

## [1.0.0] - 2026-02-09

### Added
- **System Architecture:** Implementation of the triple-layer stack (Engine, Vector DB, Observability).
- **Core Engine:** Deployment of the `brunix-assistance-engine` using **Python 3.11**, **LangChain**, and **LangGraph** for agentic workflows.
- **Communication Layer:** Established **gRPC** as the primary high-performance interface (Port 50051/50052).
- **Knowledge Base:** Integration of **Elasticsearch 8.12** (`brunix-vector-db`) for AVAP technology RAG support.
- **Observability Framework:** Deployment of **Langfuse** and **PostgreSQL** for full trace audit and cost management.
- **Security:** Initial network isolation within Docker (`avap-network`) and production-ready secret management design.