assistance-engine/changelog

201 lines
18 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Changelog
All notable changes to the **Brunix Assistance Engine** will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
---
## [1.6.3] - 2026-03-26
### Changed
- RESEARCH: updated ADR-0005-embedding-model-selection, BEIR benchmark results (qwen3-emb vs bge-m3) added.
### Added
- FEATURE: added `research/embeddings/evaluate_embeddings_pipeline.py` in order to facilitate model embedding evaluation with BEIR benchmarks.
- RESULTS: added qwen3-emb vs bge-m3 scores in BEIR benchmark.
## [1.6.2] - 2026-03-26
### Changed
- RESEARCH: updated `embeddings/Embedding model selection.pdf`.
## [1.6.1] - 2026-03-20
### Added
- FEATURE (PRD-0002): Extended `AgentRequest` in `brunix.proto` with four optional fields: `editor_content` (field 3), `selected_text` (field 4), `extra_context` (field 5), `user_info` (field 6) — enabling the VS Code extension to send active file content, selected code, free-form context, and client identity metadata alongside every query. Fields 35 are Base64-encoded; field 6 is a JSON string.
- FEATURE (PRD-0002): Extended `AgentState` with `editor_content`, `selected_text`, `extra_context`, `user_info`, and `use_editor_context` fields.
- FEATURE (PRD-0002): Extended classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens — query type and editor signal (`EDITOR` / `NO_EDITOR`). `use_editor_context` flag set in state based on classifier output.
- FEATURE (PRD-0002): Editor context injected into generation prompt only when `use_editor_context=True` — prevents the model from referencing editor code when the question is unrelated.
- FEATURE (PRD-0002): `openai_proxy.py` — parses the standard OpenAI `user` field as a JSON string to extract `editor_content`, `selected_text`, `extra_context`, and `user_info`. Non-Brunix clients that send `user` as a plain string or omit it are handled gracefully with no error.
- FEATURE (PRD-0002): `server.py` — Base64 decoding of `editor_content`, `selected_text`, and `extra_context` on request arrival. Malformed Base64 is silently treated as empty string.
- TESTS: Added `Docker/tests/test_prd_0002.py` — 40 unit tests covering `_parse_query_type`, `_decode_b64`, `_parse_editor_context`, `_build_reformulate_query`, editor context injection logic, and all valid classifier output combinations. Runs without external dependencies (no Elasticsearch, no Ollama, no gRPC server required).
- DOCS: Added `docs/product/PRD-0001-openai-compatible-proxy.md` — product requirements document for the OpenAI-compatible HTTP proxy.
- DOCS: Added `docs/product/PRD-0002-editor-context-injection.md` — product requirements document for editor context injection (updated to Implemented status with full technical design).
- DOCS: Added `docs/ADR/ADR-0005-embedding-model-selection.md` — comparative evaluation of BGE-M3 vs Qwen3-Embedding-0.6B. Status: Under Evaluation.
- DOCS: Added `LICENSE` — proprietary license, 101OBEX, Corp, Delaware.
- DOCS: Added `research/` directory structure for MrHouston experiment results and benchmarks.
### Changed
- FEATURE (PRD-0002): `session_id` in `openai_proxy.py` is now read exclusively from the dedicated `session_id` field — no longer falls back to the `user` field. Breaking change for any client that was using `user` as a `session_id` fallback.
- ENGINE: `CLASSIFY_PROMPT_TEMPLATE` extended with `<editor_rule>` and updated `<output_rule>` for two-token output format.
- ENGINE: `REFORMULATE_PROMPT` extended with `<mode_rule>` and `<language_rule>` — the reformulator now receives `[MODE: X]` prepended to the query and applies command expansion only in `CODE_GENERATION` mode.
- ENGINE: `GENERATE_PROMPT` — added "Answer in the same language the user used" to `<output_format>`. Fixes responses defaulting to English for Spanish queries.
- ENGINE: `hybrid_search_native` in `graph.py` — BM25 query now uses a `bool` query with `should` boost for `doc_type: spec` and `doc_type: narrative` chunks, improving retrieval of definitional and explanatory content over raw code examples.
- DOCS: Updated `docs/API_REFERENCE.md` — full `AgentRequest` table with all 6 fields, Base64 encoding notes, editor context behaviour section, and updated proxy examples.
- DOCS: Updated `docs/ARCHITECTURE.md` — new §6 Editor Context Pipeline, updated §4 LangGraph Workflow with two-token classifier, §4.6 reformulator mode-aware and language-preserving, updated component inventory and request lifecycle diagrams.
- DOCS: Updated `README.md` — project structure with `Docker/tests/`, `docs/product/`, `docs/ADR/ADR-0005`, `research/`, `LICENSE`. HTTP proxy section updated with editor context curl examples. Documentation index updated.
- DOCS: Updated `CONTRIBUTING.md` — added Section 10 (PRDs), Section 11 (Research & Experiments Policy), updated PR checklist, ADR table with ADR-0005.
- DOCS: Updated `docs/AVAP_CHUNKER_CONFIG.md` to v2.0 — five new commands (else, end, endLoop, exception, return), naming fix (AddvariableToJSON), nine dual assignment patterns, four new semantic tags.
- GOVERNANCE: Updated `.github/CODEOWNERS` — added `@BRUNIX-AI/engineering` and `@BRUNIX-AI/research` teams, explicit rules for proto, golden dataset, grammar config, ADRs and PRDs.
### Fixed
- ENGINE: Fixed retrieval returning wrong chunks for Spanish definition queries — reformulator was translating Spanish queries to English, breaking BM25 lexical matching against Spanish LRM chunks. Root cause: missing language preservation rule in `REFORMULATE_PROMPT`.
- ENGINE: Fixed reformulator applying CODE_GENERATION command expansion to RETRIEVAL queries — caused "Que significa AVAP?" to reformulate as "AVAP registerEndpoint addResult _status". Root cause: reformulator had no awareness of query type. Fix: `[MODE: X]` prefix + mode-aware rules.
- ENGINE: Fixed responses defaulting to English regardless of query language. Root cause: `GENERATE_PROMPT` had no language instruction (unlike `CODE_GENERATION_PROMPT` which already had it).
---
## [1.6.0] - 2026-03-18
### Added
- ENGINE: Added `AskAgentStream` RPC — real token-by-token streaming directly from Ollama. Two-phase design: classify + reformulate + retrieve runs first via `build_prepare_graph`, then `llm.stream()` forwards tokens to the client as they arrive.
- ENGINE: Added `EvaluateRAG` RPC — RAGAS evaluation pipeline with Claude as judge. Runs faithfulness, answer_relevancy, context_recall and context_precision against a golden dataset and returns a global score with verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT).
- ENGINE: Added `openai_proxy.py` — OpenAI and Ollama compatible HTTP API running on port 8000. Routes `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`. Endpoints: `POST /v1/chat/completions`, `POST /v1/completions`, `GET /v1/models`, `POST /api/chat`, `POST /api/generate`, `GET /api/tags`, `GET /health`.
- ENGINE: Added `entrypoint.sh` — starts gRPC server and HTTP proxy as parallel processes with mutual watchdog. If either crashes, the container stops cleanly.
- ENGINE: Added session memory — `session_store` dict indexed by `session_id` accumulates full conversation history per session. Each request loads and persists history.
- ENGINE: Added query intent classifier — LangGraph node that classifies every query as `RETRIEVAL`, `CODE_GENERATION` or `CONVERSATIONAL` and routes to the appropriate subgraph.
- ENGINE: Added hybrid retrieval — replaced `ElasticsearchStore` (LangChain abstraction) with native Elasticsearch client. Each query runs BM25 `multi_match` and kNN in parallel, fused with Reciprocal Rank Fusion (k=60). Returns top-8 documents.
- ENGINE: Added `evaluate.py` — full RAGAS evaluation pipeline using the same hybrid retrieval as production, Claude as external judge, and the golden dataset in `Docker/src/golden_dataset.json`.
- PROTO: Added `AskAgentStream` and `EvaluateRAG` RPCs to `brunix.proto` with their message types (`EvalRequest`, `EvalResponse`, `QuestionDetail`).
- DOCS: Added `docs/ADR/ADR-0001-grpc-primary-interface.md`.
- DOCS: Added `docs/ADR/ADR-0002-two-phase-streaming.md`.
- DOCS: Added `docs/ADR/ADR-0003-hybrid-retrieval-rrf.md`.
- DOCS: Added `docs/ADR/ADR-0004-claude-eval-judge.md`.
- DOCS: Added `docs/samples/` — 30 representative `.avap` code samples covering all AVAP constructs.
### Changed
- ENGINE: Replaced `ElasticsearchStore` with native Elasticsearch client — fixes silent kNN failure caused by schema incompatibility between the Chonkie ingestion pipeline and the LangChain-managed index schema.
- ENGINE: Replaced single `GENERATE_PROMPT` with five specialised prompts — `CLASSIFY_PROMPT`, `REFORMULATE_PROMPT`, `GENERATE_PROMPT`, `CODE_GENERATION_PROMPT`, `CONVERSATIONAL_PROMPT` — each optimised for its routing path.
- ENGINE: Extended `REFORMULATE_PROMPT` with explicit AVAP command mapping — intent-to-command expansion for API, database, HTTP, loop and error handling query types.
- ENGINE: Extended `AgentState` with `query_type` and `session_id` fields required for conditional routing and session persistence.
- ENGINE: Fixed `session_id` ignored — `graph.invoke` now passes `session_id` into the graph state.
- ENGINE: Fixed double `is_final: True` — `AskAgent` previously emitted two closing messages. Now emits exactly one.
- ENGINE: Fixed embedding endpoint mismatch — server now uses the same `/api/embed` endpoint and payload format as both ingestion pipelines, ensuring vectors are comparable at query time.
- DEPENDENCIES: `requirements.txt` updated — added `ragas`, `datasets`, `langchain-anthropic`, `fastapi`, `uvicorn`.
### Fixed
- ENGINE: Fixed retrieval returning zero results — `ElasticsearchStore` assumed a LangChain-managed schema incompatible with the Chonkie-generated index. Replaced with native ES client querying actual field names.
- ENGINE: Fixed context always empty — consequence of the retrieval bug above. The generation prompt received an empty `{context}` on every request and always returned the fallback string.
---
## [1.5.1] - 2026-03-18
### Added
- DOCS: Created `docs/ARCHITECTURE.md` — full technical architecture reference covering component inventory, request lifecycle, LangGraph workflow, hybrid RAG pipeline, streaming design, evaluation pipeline, infrastructure layout, session memory, observability, and security boundaries.
- DOCS: Created `docs/API_REFERENCE.md` — complete gRPC API contract documentation with method descriptions, message type tables, error handling, and `grpcurl` client examples for all three RPCs (`AskAgent`, `AskAgentStream`, `EvaluateRAG`).
- DOCS: Created `docs/RUNBOOK.md` — operational playbook with health checks, startup/shutdown procedures, tunnel management, and incident playbooks for all known failure modes.
- DOCS: Created `SECURITY.md` — security policy covering transport security, authentication, secrets management, container security, data privacy, known limitations table, and vulnerability reporting process.
- DOCS: Created `docs/AVAP_CHUNKER_CONFIG.md` — full reference for `avap_config.json`: lexer fields, all 4 block definitions with regex breakdown, all 10 statement categories with ordering rationale, all 14 semantic tags with detection patterns, a worked example showing chunks produced from real AVAP code, and a step-by-step guide for adding new constructs.
### Changed
- DOCS: Fully rewrote `README.md` project structure tree — now reflects all files accurately including `openai_proxy.py`, `entrypoint.sh`, `golden_dataset.json`, `SECURITY.md`, `docs/ARCHITECTURE.md`, `docs/API_REFERENCE.md`, `docs/RUNBOOK.md`, `docs/adr/`, `avap_chunker.py`, `avap_config.json`, `ingestion/chunks.jsonl`, and `src/config.py`.
- DOCS: Added `Knowledge Base Ingestion` section to `README.md` documenting both ingestion pipelines in full: Pipeline A (Chonkie — `elasticsearch_ingestion.py`) with flow diagram, CLI usage, and chunk field table; Pipeline B (AVAP Native — `avap_chunker.py` + `avap_ingestor.py`) with flow diagram, chunk type table, semantic tags reference, and ingestor env vars.
- DOCS: Replaced minimal `Testing & Debugging` section with complete documentation of all three gRPC methods (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) including expected responses, multi-turn example, and verdict thresholds.
- DOCS: Added `HTTP Proxy` section documenting all 7 HTTP endpoints (4 OpenAI + 3 Ollama), streaming vs non-streaming routing, `session_id` extension, and proxy env vars table.
- DOCS: Fixed `API Contract (Protobuf)` section — corrected `grpc_tools.protoc` paths and added reference to `docs/API_REFERENCE.md`.
- DOCS: Fixed remaining stale reference to `scripts/generate_mbpp_avap.py` in Dataset Generation section.
- DOCS: Added Documentation Index table to `README.md` linking all documentation files.
- DOCS: Updated `CONTRIBUTING.md` — added Section 9 (Architecture Decision Records) and updated PR checklist and doc policy table.
- ENV: Added missing variable documentation to `README.md`: `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`, `ELASTICSEARCH_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`.
---
## [1.5.0] - 2026-03-12
### Added
- IMPLEMENTED:
- `scripts/pipelines/flows/translate_mbpp.py`: pipeline to generate synthetic dataset from mbpp dataset.
- `scripts/tasks/prompts.py`: module containing prompts for pipelines.
- `scripts/tasks/chunk.py`: module containing functions related to chunk management.
- `synthetic_datasets`: folder containing generated synthetic datasets.
- `src/config.py`: environment variables configuration file.
### Changed
- REFACTORED: `scripts/pipelines/flows/elasticsearch_ingestion.py` now uses `docs/LRM` or `docs/samples` documents instead of pre chunked files.
- RENAMED `docs/AVAP Language: Core Commands & Functional Specification` to `docs/avap_language_github_docs`.
- REMOVED: `Makefile` file.
- REMOVED: `scripts/start-tunnels.sh` script.
- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.
- MOVED `scripts/generate_mbap.py` into `scripts/flows/generate_mbap.py`.
## [1.4.0] - 2026-03-10
### Added
- **Dataset Generation Suite**: Added `scripts/generate_mbpp_avap.py` to automate the creation of synthetic AVAP training data.
- **MBPP-style Benchmarking**: Support for generating structured JSON datasets with code solutions and Python-based validation tests (`test_list`).
- **LRM Integration**: The generator now performs grounded synthesis using the `avap.md` Language Reference Manual.
- **Anthropic Claude 3.5 Sonnet Integration**: Orchestration logic for high-fidelity code generation via API.
### Changed
- **README.md**: Added comprehensive documentation for the Evaluation & Dataset Generation pipeline.
- **Project Structure**: Integrated `evaluation/` directory for synthetic dataset storage.
### Security
- Added explicit policy to avoid committing real Anthropic API keys, enforcing the use of environment variables.
## [1.3.0] - 2026-03-05
### Added
- IMPLEMENTED:
- `Docker/src/utils/emb_factory`: factory modules created for embedding model generation.
- `Docker/src/utils/llm_factory`: factory modules created for LLM generation.
- `Docker/src/graph.py`: workflow graph orchestration module added.
- `Docker/src/prompts.py`: centralized prompt definitions added.
- `Docker/src/state.py`: shared state management module added.
- `scripts/pipelines/flows/elasticsearch_ingestion.py`: pipeline to populate the elasticsearch vector database.
- `ingestion/docs`: folder containing all chunked AVAP documents.
### Changed
- REFACTORED: `server.py` updated to integrate the new graph/state/prompt and utils-based architecture.
- REFACTORED: `docker-compose.yaml` now uses fully parameterized environment variables instead of hardcoded service URLs and credentials.
- DEPENDENCIES: `requirements.txt` updated with new libraries required by the new modules.
## [1.2.0] - 2026-03-03
### Added
- GOVERNANCE: Introduced `CONTRIBUTING.md` as the single source of truth for all contribution standards, covering GitFlow, infrastructure policy, repository standards, environment variables, changelog, documentation, and incident reporting.
- GOVERNANCE: Added `.github/pull_request_template.md` enforcing a mandatory structured checklist on every PR — including explicit sign-off on environment variables, changelog, and documentation.
- DOCS: Added Environment Variables reference table to `README.md`. All variables must be registered here. PRs introducing undocumented variables will be rejected.
- DOCS: Updated project structure map in `README.md` to reflect new governance files.
### Changed
- PROCESS: Pull Requests that introduce new environment variables without documentation, omit required changelog entries, or skip required documentation updates are now formally non-mergeable per `CONTRIBUTING.md`.
---
## [1.1.0] - 2026-02-16
### Added
- IMPLEMENTED: Strict repository structure enforcement to separate development environment from production runtime.
- SECURITY: Added `.dockerignore` to prevent leaking sensitive source files and local configurations into the container.
### Changed
- REFACTORED: Dockerfile build logic to optimize build context and reduce image footprint.
- ARCHITECTURE: Moved application entry point to `/app` and eliminated the redundant root `/workspace` directory for enhanced security.
### Fixed
- RESOLVED: Issue where non-production files were being bundled into the Docker image, improving deployment speed and container isolation.
---
## [1.0.0] - 2026-02-09
### Added
- **System Architecture:** Implementation of the triple-layer stack (Engine, Vector DB, Observability).
- **Core Engine:** Deployment of the `brunix-assistance-engine` using **Python 3.11**, **LangChain**, and **LangGraph** for agentic workflows.
- **Communication Layer:** Established **gRPC** as the primary high-performance interface (Port 50051/50052).
- **Knowledge Base:** Integration of **Elasticsearch 8.12** (`brunix-vector-db`) for AVAP technology RAG support.
- **Observability Framework:** Deployment of **Langfuse** and **PostgreSQL** for full trace audit and cost management.
- **Security:** Initial network isolation within Docker (`avap-network`) and production-ready secret management design.