feat: editor context injection (PRD-0002) + repository governance
This commit is contained in:
parent 3ca8fc450c
commit 2fbfad41df
@@ -0,0 +1,55 @@
+# CODEOWNERS
+#
+# Ownership and review rules for the Brunix Assistance Engine repository.
+#
+# Teams:
+#   @BRUNIX-AI/engineering — Core engineering team. Owns the production
+#   codebase, infrastructure, gRPC contract, and all architectural decisions.
+#   Required reviewer on every pull request targeting `online`.
+#
+#   @BRUNIX-AI/research — Scientific research team. Responsible for RAG
+#   evaluation, embedding model benchmarking, dataset generation, and
+#   experiment documentation. Write access to research/ and docs/product/.
+#   All changes to production code require review from engineering.
+#
+# This file is enforced by GitHub branch protection rules on `online`.
+# See: Settings → Branches → online → Require review from Code Owners
+
+# Default — every PR requires engineering approval
+* @BRUNIX-AI/engineering @rafa-ruiz
+
+# ── Production engine ────────────────────────────────────────────────────────
+
+# gRPC contract — any change requires explicit CTO sign-off
+Docker/protos/brunix.proto @BRUNIX-AI/engineering @rafa-ruiz
+
+# Core engine — graph, server, prompts, state, evaluation
+Docker/src/ @BRUNIX-AI/engineering @rafa-ruiz
+
+# ── Ingestion & knowledge base ───────────────────────────────────────────────
+
+# Ingestion pipelines
+scripts/pipelines/ @BRUNIX-AI/engineering @rafa-ruiz
+
+# Grammar config — any change requires a full index rebuild
+scripts/pipelines/ingestion/avap_config.json @BRUNIX-AI/engineering @rafa-ruiz
+
+# Golden dataset — any change requires a new EvaluateRAG baseline before merging
+Docker/src/golden_dataset.json @BRUNIX-AI/engineering @rafa-ruiz
+
+# ── Research ─────────────────────────────────────────────────────────────────
+
+# Research folder — managed by the research team, no engineering approval needed
+# for experiment documentation, benchmarks and datasets
+research/ @BRUNIX-AI/research @BRUNIX-AI/engineering
+
+# ── Governance & documentation ───────────────────────────────────────────────
+
+# ADRs and PRDs — all decisions require CTO approval
+docs/ADR/ @BRUNIX-AI/engineering @rafa-ruiz
+docs/product/ @BRUNIX-AI/engineering @rafa-ruiz
+
+# Governance documents
+CONTRIBUTING.md @BRUNIX-AI/engineering @rafa-ruiz
+SECURITY.md @BRUNIX-AI/engineering @rafa-ruiz
+.github/ @BRUNIX-AI/engineering @rafa-ruiz
CONTRIBUTING.md (129 lines changed)
@@ -15,7 +15,9 @@
 7. [Changelog Policy](#7-changelog-policy)
 8. [Documentation Policy](#8-documentation-policy)
 9. [Architecture Decision Records (ADRs)](#9-architecture-decision-records-adrs)
-10. [Incident & Blockage Reporting](#10-incident--blockage-reporting)
+10. [Product Requirements Documents (PRDs)](#10-product-requirements-documents-prds)
+11. [Research & Experiments Policy](#11-research--experiments-policy)
+12. [Incident & Blockage Reporting](#12-incident--blockage-reporting)

 ---

@@ -92,14 +94,16 @@ A PR is not ready for review unless **all applicable items** in the following ch
 - [ ] No new environment variables were introduced
 - [ ] New environment variables are documented in the `.env` reference table in `README.md`

-**Changelog** *(see [Section 6](#6-changelog-policy))*
+**Changelog** *(see [Section 7](#7-changelog-policy))*
 - [ ] No changelog entry required (internal refactor, comment/typo fix, zero behavioral change)
 - [ ] Changelog updated with correct version bump and date

 **Documentation** *(see [Section 8](#8-documentation-policy))*
 - [ ] No documentation update required (internal change, no impact on setup or API)
 - [ ] `README.md` or relevant docs updated to reflect this change
-- [ ] If a significant architectural decision was made, an ADR was created in `docs/adr/`
+- [ ] If a significant architectural decision was made, an ADR was created in `docs/ADR/`
+- [ ] If a new user-facing feature was introduced, a PRD was created in `docs/product/`
+- [ ] If an experiment was conducted, results were documented in `research/`

 ---

@@ -170,10 +174,10 @@ The `changelog` file tracks all notable changes and follows [Semantic Versioning

 ### Format

-New entries go at the top of the file, above the previous version:
+New entries go under `[Unreleased]` at the top of the file. When a PR merges, `[Unreleased]` is renamed to the new version with its date:

 ```
-## [X.Y.Z] - YYYY-MM-DD
+## [Unreleased]

 ### Added
 - LABEL: Description of the new feature or capability.
@@ -185,7 +189,7 @@ New entries go at the top of the file, above the previous version:
 - LABEL: Description of the bug resolved.
 ```

-Use uppercase short labels for scannability: `API:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`.
+Use uppercase short labels for scannability: `ENGINE:`, `API:`, `PROTO:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`, `DOCS:`, `FEATURE:`.

 ---

@@ -219,7 +223,9 @@ Update `README.md` (or the relevant doc file) if the PR includes any of the foll
 | `docs/API_REFERENCE.md` | Complete gRPC API contract and examples |
 | `docs/RUNBOOK.md` | Operational playbooks and incident response |
 | `docs/AVAP_CHUNKER_CONFIG.md` | `avap_config.json` reference — blocks, statements, semantic tags |
-| `docs/adr/` | Architecture Decision Records |
+| `docs/ADR/` | Architecture Decision Records |
+| `docs/product/` | Product Requirements Documents |
+| `research/` | Experiment results, benchmarks, datasets |

 > **PRs that change user-facing behavior or setup without updating documentation will be rejected.**

@@ -233,7 +239,7 @@ Architecture Decision Records document **significant technical decisions** — c

 Write an ADR when a PR introduces or changes:

-- A fundamental technology choice (communication protocol, storage backend, framework)
+- A fundamental technology choice (communication protocol, storage backend, framework, model)
 - A design pattern that other components will follow
 - A deliberate trade-off with known consequences
 - A decision that future engineers might otherwise reverse without understanding the rationale
@@ -244,10 +250,11 @@ Write an ADR when a PR introduces or changes:
 - Bug fixes
 - Dependency version bumps
 - Configuration changes
+- New user-facing features (use a PRD instead)

 ### ADR format

-ADRs live in `docs/adr/` and follow this naming convention:
+ADRs live in `docs/ADR/` and follow this naming convention:

 ```
 ADR-XXXX-short-title.md
@@ -261,7 +268,7 @@ Each ADR must contain:
 # ADR-XXXX: Title

 **Date:** YYYY-MM-DD
-**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-YYYY
+**Status:** Proposed | Under Evaluation | Accepted | Deprecated | Superseded by ADR-YYYY
 **Deciders:** Names or roles

 ## Context
@@ -281,14 +288,106 @@ What are the positive and negative results of this decision?

 | ADR | Title | Status |
 |---|---|---|
-| [ADR-0001](docs/adr/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
-| [ADR-0002](docs/adr/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
-| [ADR-0003](docs/adr/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
-| [ADR-0004](docs/adr/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
+| [ADR-0001](docs/ADR/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
+| [ADR-0002](docs/ADR/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
+| [ADR-0003](docs/ADR/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
+| [ADR-0004](docs/ADR/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
+| [ADR-0005](docs/ADR/ADR-0005-embedding-model-selection.md) | Embedding Model Selection — BGE-M3 vs Qwen3-Embedding-0.6B | Under Evaluation |

 ---

-## 10. Incident & Blockage Reporting
+## 10. Product Requirements Documents (PRDs)
+
+Product Requirements Documents capture **user-facing features** — what is being built, why it is needed, and how it will be validated. Every feature that modifies the public API, the gRPC contract, or the user experience of any client (VS Code extension, OpenAI-compatible proxy, etc.) requires a PRD before implementation begins.
+
+### When to write a PRD
+
+Write a PRD when a PR introduces or changes:
+
+- A new capability visible to any external consumer (extension, API client, proxy)
+- A change to the gRPC contract (`brunix.proto`)
+- A change to the HTTP proxy endpoints or behavior
+- A feature requested by product or business stakeholders
+
+### When NOT to write a PRD
+
+- Internal architectural changes (use an ADR instead)
+- Bug fixes with no change in user-visible behavior
+- Infrastructure or tooling changes
+
+### PRD format
+
+PRDs live in `docs/product/` and follow this naming convention:
+
+```
+PRD-XXXX-short-title.md
+```
+
+Each PRD must contain:
+
+```markdown
+# PRD-XXXX: Title
+
+**Date:** YYYY-MM-DD
+**Status:** Proposed | Implemented
+**Requested by:** Name / role
+**Related ADR:** ADR-XXXX (if applicable)
+
+## Problem
+What user or business problem does this solve?
+
+## Solution
+What are we building?
+
+## Scope
+What is in scope and explicitly out of scope?
+
+## Technical design
+Key implementation decisions.
+
+## Validation
+How do we know this works? Acceptance criteria.
+
+## Impact on parallel workstreams
+Does this affect any ongoing experiment or evaluation?
+```
+
+### Existing PRDs
+
+| PRD | Title | Status |
+|---|---|---|
+| [PRD-0001](docs/product/PRD-0001-openai-compatible-proxy.md) | OpenAI-Compatible HTTP Proxy | Implemented |
+| [PRD-0002](docs/product/PRD-0002-editor-context-injection.md) | Editor Context Injection for VS Code Extension | Proposed |
+
+---
+
+## 11. Research & Experiments Policy
+
+All scientific experiments, benchmark results, and dataset evaluations conducted by the research team must be documented and committed to the repository under `research/`.
+
+### Rules
+
+- Every experiment must have a corresponding result file in `research/` before any engineering decision based on that experiment is considered valid.
+- Benchmark scripts, evaluation notebooks, and raw results must be committed alongside a summary README that explains the methodology, datasets used, metrics, and conclusions.
+- Experiments that inform an ADR must be referenced from that ADR with a direct path to the result files.
+- The golden dataset used by `EvaluateRAG` (`Docker/src/golden_dataset.json`) is a production artifact. Any modification requires explicit approval from the CTO and a new baseline EvaluateRAG run before the change is merged.
+
+### Directory structure
+
+```
+research/
+  embeddings/    ← embedding model benchmarks (BEIR, MTEB)
+  experiments/   ← RAG architecture experiments
+  datasets/      ← synthetic datasets and golden datasets
+```
+
+### Why this matters
+
+An engineering decision based on an experiment that is not reproducible, not committed, or not peer-reviewable has no scientific validity. All decisions with impact on the production system must be traceable to documented, committed evidence.
+
+---
+
+## 12. Incident & Blockage Reporting
+
 If you encounter a technical blockage (connection timeouts, service downtime, tunnel failures):

@@ -18,8 +18,28 @@ service AssistanceEngine {
 // ---------------------------------------------------------------------------

 message AgentRequest {
+  // ── Core fields (v1) ──────────────────────────────────────────────────────
   string query = 1;
   string session_id = 2;
+
+  // ── Editor context fields (v2 — PRD-0002) ────────────────────────────────
+  // All three fields are optional. Clients that do not send them default to
+  // empty string. Existing clients remain fully compatible without changes.
+
+  // Full content of the active file open in the editor at query time.
+  // Gives the assistant awareness of the complete code the user is working on.
+  string editor_content = 3;
+
+  // Text currently selected in the editor, if any.
+  // Most precise signal of user intent — if non-empty, the question almost
+  // certainly refers to this specific code block.
+  string selected_text = 4;
+
+  // Free-form additional context (e.g. file path, language identifier,
+  // open diagnostic errors). Extensible without requiring future proto changes.
+  string extra_context = 5;
+
+  string user_info = 6;
 }

 message AgentResponse {
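The backward-compatibility claim above rests on proto3 scalar defaults: a v1 client never sets the new fields, so the server reads them as empty strings and behaves exactly as before. A minimal sketch of that contract, using a plain dataclass as a stand-in for the generated `AgentRequest` message class (the stand-in is an assumption, not the real generated code):

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    # Core fields (v1)
    query: str = ""
    session_id: str = ""
    # Editor context fields (v2) — proto3 strings read as "" when unset
    editor_content: str = ""
    selected_text: str = ""
    extra_context: str = ""
    user_info: str = ""

# A v1 client builds requests exactly as before...
legacy = AgentRequest(query="How do I declare a variable?", session_id="abc")
# ...and the server sees empty editor fields, i.e. "no editor context".
assert legacy.editor_content == "" and legacy.selected_text == ""

# A v2 client (e.g. the VS Code extension) additionally fills the new fields.
v2 = AgentRequest(query="What does this do?", session_id="abc",
                  selected_text="set total to 0")
assert v2.selected_text != ""
```

Because the server only branches on "field is non-empty", no proto version negotiation is needed.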
@@ -20,6 +20,7 @@ logger = logging.getLogger(__name__)

 session_store: dict[str, list] = defaultdict(list)

+
 def format_context(docs):
     chunks = []
     for i, doc in enumerate(docs, 1):
@@ -142,6 +143,89 @@ def hybrid_search_native(es_client, embeddings, query, index_name, k=8):
     logger.info(f"[hybrid] RRF -> {len(docs)} final docs")
     return docs

+
+def _build_classify_prompt(question: str, history_text: str, selected_text: str) -> str:
+    prompt = (
+        CLASSIFY_PROMPT_TEMPLATE
+        .replace("{history}", history_text)
+        .replace("{message}", question)
+    )
+    if selected_text:
+        editor_section = (
+            "\n\n<editor_selection>\n"
+            "The user currently has the following AVAP code selected in their editor. "
+            "If the question refers to 'this', 'here', 'the code above', or similar, "
+            "it is about this selection.\n"
+            f"{selected_text}\n"
+            "</editor_selection>"
+        )
+        prompt = prompt.replace(
+            f"<user_message>{question}</user_message>",
+            f"{editor_section}\n\n<user_message>{question}</user_message>"
+        )
+    return prompt
+
+
+def _build_reformulate_query(question: str, selected_text: str) -> str:
+    if not selected_text:
+        return question
+    return f"{selected_text}\n\nUser question about the above: {question}"
+
+
+def _build_generation_prompt(template_prompt: SystemMessage, context: str,
+                             editor_content: str, selected_text: str,
+                             extra_context: str) -> SystemMessage:
+    base = template_prompt.content.format(context=context)
+
+    sections = []
+
+    if selected_text:
+        sections.append(
+            "<selected_code>\n"
+            "The user has the following AVAP code selected in their editor. "
+            "Ground your answer in this code first. "
+            "Use the RAG context as supplementary reference only.\n"
+            f"{selected_text}\n"
+            "</selected_code>"
+        )
+
+    if editor_content:
+        sections.append(
+            "<editor_file>\n"
+            "Full content of the active file open in the editor "
+            "(use for broader context if needed):\n"
+            f"{editor_content}\n"
+            "</editor_file>"
+        )
+
+    if extra_context:
+        sections.append(
+            "<extra_context>\n"
+            f"{extra_context}\n"
+            "</extra_context>"
+        )
+
+    if sections:
+        editor_block = "\n\n".join(sections)
+        base = editor_block + "\n\n" + base
+
+    return SystemMessage(content=base)
+
+
+def _parse_query_type(raw: str) -> tuple[str, bool]:
+    parts = raw.strip().upper().split()
+    query_type = "RETRIEVAL"
+    use_editor = False
+    if parts:
+        first = parts[0]
+        if first.startswith("CODE_GENERATION") or "CODE" in first:
+            query_type = "CODE_GENERATION"
+        elif first.startswith("CONVERSATIONAL"):
+            query_type = "CONVERSATIONAL"
+        if len(parts) > 1 and parts[1] == "EDITOR":
+            use_editor = True
+    return query_type, use_editor
+

 def build_graph(llm, embeddings, es_client, index_name):

     def _persist(state: AgentState, response: BaseMessage):
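The classifier LLM is expected to emit one or two uppercase tokens (for example `CODE_GENERATION EDITOR`), and `_parse_query_type` turns that into a route plus an editor flag. The parsing contract can be exercised in isolation; this sketch repeats the function body so it runs without the module's imports:

```python
def parse_query_type(raw: str) -> tuple[str, bool]:
    # First token selects the route; an optional second token "EDITOR"
    # requests that editor context be injected into the generation prompt.
    parts = raw.strip().upper().split()
    query_type = "RETRIEVAL"
    use_editor = False
    if parts:
        first = parts[0]
        if first.startswith("CODE_GENERATION") or "CODE" in first:
            query_type = "CODE_GENERATION"
        elif first.startswith("CONVERSATIONAL"):
            query_type = "CONVERSATIONAL"
        if len(parts) > 1 and parts[1] == "EDITOR":
            use_editor = True
    return query_type, use_editor

assert parse_query_type("code_generation editor") == ("CODE_GENERATION", True)
assert parse_query_type("CONVERSATIONAL") == ("CONVERSATIONAL", False)
assert parse_query_type("RETRIEVAL EDITOR") == ("RETRIEVAL", True)
assert parse_query_type("") == ("RETRIEVAL", False)
```

Note that an unrecognized or empty classifier reply safely falls back to `("RETRIEVAL", False)`, so a misbehaving LLM never breaks routing.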
@@ -156,43 +240,37 @@ def build_graph(llm, embeddings, es_client, index_name):
                            user_msg.get("content", "")
                            if isinstance(user_msg, dict) else "")
         history_msgs = messages[:-1]
+        selected_text = state.get("selected_text", "")

-        if not history_msgs:
-            prompt_content = (
-                CLASSIFY_PROMPT_TEMPLATE
-                .replace("{history}", "(no history)")
-                .replace("{message}", question)
-            )
-            resp = llm.invoke([SystemMessage(content=prompt_content)])
-            raw = resp.content.strip().upper()
-            query_type = _parse_query_type(raw)
-            logger.info(f"[classify] no historic content raw='{raw}' -> {query_type}")
-            return {"query_type": query_type}
-
-        history_text = format_history_for_classify(history_msgs)
-        prompt_content = (
-            CLASSIFY_PROMPT_TEMPLATE
-            .replace("{history}", history_text)
-            .replace("{message}", question)
-        )
+        history_text = format_history_for_classify(history_msgs) if history_msgs else "(no history)"
+        prompt_content = _build_classify_prompt(question, history_text, selected_text)
         resp = llm.invoke([SystemMessage(content=prompt_content)])
         raw = resp.content.strip().upper()
-        query_type = _parse_query_type(raw)
-        logger.info(f"[classify] raw='{raw}' -> {query_type}")
-        return {"query_type": query_type}
-
-    def _parse_query_type(raw: str) -> str:
-        if raw.startswith("CODE_GENERATION") or "CODE" in raw:
-            return "CODE_GENERATION"
-        if raw.startswith("CONVERSATIONAL"):
-            return "CONVERSATIONAL"
-        return "RETRIEVAL"
+        query_type, use_editor_ctx = _parse_query_type(raw)
+        logger.info(f"[classify] selected={bool(selected_text)} raw='{raw}' -> {query_type} editor={use_editor_ctx}")
+        return {"query_type": query_type, "use_editor_context": use_editor_ctx}

     def reformulate(state: AgentState) -> AgentState:
         user_msg = state["messages"][-1]
-        resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
+        selected_text = state.get("selected_text", "")
+        question = getattr(user_msg, "content",
+                           user_msg.get("content", "")
+                           if isinstance(user_msg, dict) else "")
+
+        anchor = _build_reformulate_query(question, selected_text)
+
+        if selected_text:
+            from langchain_core.messages import HumanMessage as HM
+            resp = llm.invoke([REFORMULATE_PROMPT, HM(content=anchor)])
+        else:
+            query_type = state.get("query_type", "RETRIEVAL")
+            mode_hint = HumanMessage(content=f"[MODE: {query_type}]\n{question}")
+            resp = llm.invoke([REFORMULATE_PROMPT, mode_hint])
+
         reformulated = resp.content.strip()
-        logger.info(f"[reformulate] -> '{reformulated}'")
+        logger.info(f"[reformulate] selected={bool(selected_text)} -> '{reformulated}'")
         return {"reformulated_query": reformulated}

     def retrieve(state: AgentState) -> AgentState:
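When a selection is present, reformulation anchors the search query to the selected code instead of relying on conversation history. A self-contained sketch of `_build_reformulate_query` showing the exact string handed to the LLM:

```python
def build_reformulate_query(question: str, selected_text: str) -> str:
    # Without a selection, the question passes through unchanged.
    if not selected_text:
        return question
    # With a selection, the snippet is prepended so the reformulated
    # retrieval query is grounded in the code the user is asking about.
    return f"{selected_text}\n\nUser question about the above: {question}"

assert build_reformulate_query("what is a loop?", "") == "what is a loop?"
anchored = build_reformulate_query("why does this fail?", "add var x")
assert anchored.startswith("add var x")
assert anchored.endswith("User question about the above: why does this fail?")
```

The concrete snippet and question above are illustrative only; any AVAP code a client sends in `selected_text` takes the same path.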
@@ -209,8 +287,13 @@ def build_graph(llm, embeddings, es_client, index_name):
         return {"context": context}

     def generate(state):
-        prompt = SystemMessage(
-            content=GENERATE_PROMPT.content.format(context=state["context"])
+        use_editor = state.get("use_editor_context", False)
+        prompt = _build_generation_prompt(
+            template_prompt=GENERATE_PROMPT,
+            context=state.get("context", ""),
+            editor_content=state.get("editor_content", "") if use_editor else "",
+            selected_text=state.get("selected_text", "") if use_editor else "",
+            extra_context=state.get("extra_context", ""),
         )
         resp = llm.invoke([prompt] + state["messages"])
         logger.info(f"[generate] {len(resp.content)} chars")
@@ -218,8 +301,13 @@ def build_graph(llm, embeddings, es_client, index_name):
         return {"messages": [resp]}

     def generate_code(state):
-        prompt = SystemMessage(
-            content=CODE_GENERATION_PROMPT.content.format(context=state["context"])
+        use_editor = state.get("use_editor_context", False)
+        prompt = _build_generation_prompt(
+            template_prompt=CODE_GENERATION_PROMPT,
+            context=state.get("context", ""),
+            editor_content=state.get("editor_content", "") if use_editor else "",
+            selected_text=state.get("selected_text", "") if use_editor else "",
+            extra_context=state.get("extra_context", ""),
         )
         resp = llm.invoke([prompt] + state["messages"])
         logger.info(f"[generate_code] {len(resp.content)} chars")
@@ -228,7 +316,7 @@ def build_graph(llm, embeddings, es_client, index_name):

     def respond_conversational(state):
         resp = llm.invoke([CONVERSATIONAL_PROMPT] + state["messages"])
-        logger.info("[conversational] from comversation")
+        logger.info("[conversational] from conversation")
         _persist(state, resp)
         return {"messages": [resp]}

@@ -254,9 +342,9 @@ def build_graph(llm, embeddings, es_client, index_name):
         "classify",
         route_by_type,
         {
             "RETRIEVAL": "reformulate",
             "CODE_GENERATION": "reformulate",
             "CONVERSATIONAL": "respond_conversational",
         }
     )

@@ -266,7 +354,7 @@ def build_graph(llm, embeddings, es_client, index_name):
         "retrieve",
         route_after_retrieve,
         {
             "generate": "generate",
             "generate_code": "generate_code",
         }
     )

@@ -284,46 +372,39 @@ def build_prepare_graph(llm, embeddings, es_client, index_name):
         messages = state["messages"]
         user_msg = messages[-1]
         question = getattr(user_msg, "content",
                            user_msg.get("content", "")
                            if isinstance(user_msg, dict) else "")
         history_msgs = messages[:-1]
+        selected_text = state.get("selected_text", "")

-        if not history_msgs:
-            prompt_content = (
-                CLASSIFY_PROMPT_TEMPLATE
-                .replace("{history}", "(no history)")
-                .replace("{message}", question)
-            )
-            resp = llm.invoke([SystemMessage(content=prompt_content)])
-            raw = resp.content.strip().upper()
-            query_type = _parse_query_type(raw)
-            logger.info(f"[prepare/classify] no history raw='{raw}' -> {query_type}")
-            return {"query_type": query_type}
-
-        history_text = format_history_for_classify(history_msgs)
-        prompt_content = (
-            CLASSIFY_PROMPT_TEMPLATE
-            .replace("{history}", history_text)
-            .replace("{message}", question)
-        )
+        history_text = format_history_for_classify(history_msgs) if history_msgs else "(no history)"
+        prompt_content = _build_classify_prompt(question, history_text, selected_text)
         resp = llm.invoke([SystemMessage(content=prompt_content)])
         raw = resp.content.strip().upper()
-        query_type = _parse_query_type(raw)
-        logger.info(f"[prepare/classify] raw='{raw}' -> {query_type}")
-        return {"query_type": query_type}
-
-    def _parse_query_type(raw: str) -> str:
-        if raw.startswith("CODE_GENERATION") or "CODE" in raw:
-            return "CODE_GENERATION"
-        if raw.startswith("CONVERSATIONAL"):
-            return "CONVERSATIONAL"
-        return "RETRIEVAL"
+        query_type, use_editor_ctx = _parse_query_type(raw)
+        logger.info(f"[prepare/classify] selected={bool(selected_text)} raw='{raw}' -> {query_type} editor={use_editor_ctx}")
+        return {"query_type": query_type, "use_editor_context": use_editor_ctx}

     def reformulate(state: AgentState) -> AgentState:
         user_msg = state["messages"][-1]
-        resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
+        selected_text = state.get("selected_text", "")
+        question = getattr(user_msg, "content",
+                           user_msg.get("content", "")
+                           if isinstance(user_msg, dict) else "")
+
+        anchor = _build_reformulate_query(question, selected_text)
+
+        if selected_text:
+            from langchain_core.messages import HumanMessage as HM
+            resp = llm.invoke([REFORMULATE_PROMPT, HM(content=anchor)])
+        else:
+            query_type = state.get("query_type", "RETRIEVAL")
+            mode_hint = HumanMessage(content=f"[MODE: {query_type}]\n{question}")
+            resp = llm.invoke([REFORMULATE_PROMPT, mode_hint])
+
         reformulated = resp.content.strip()
-        logger.info(f"[prepare/reformulate] -> '{reformulated}'")
+        logger.info(f"[prepare/reformulate] selected={bool(selected_text)} -> '{reformulated}'")
         return {"reformulated_query": reformulated}

     def retrieve(state: AgentState) -> AgentState:
@@ -366,7 +447,7 @@ def build_prepare_graph(llm, embeddings, es_client, index_name):

     graph_builder.add_edge("reformulate", "retrieve")
     graph_builder.add_edge("retrieve", END)
-    graph_builder.add_edge("skip_retrieve",END)
+    graph_builder.add_edge("skip_retrieve", END)

     return graph_builder.compile()

@@ -375,17 +456,29 @@ def build_final_messages(state: AgentState) -> list:
     query_type = state.get("query_type", "RETRIEVAL")
     context = state.get("context", "")
     messages = state.get("messages", [])
+    editor_content = state.get("editor_content", "")
+    selected_text = state.get("selected_text", "")
+    extra_context = state.get("extra_context", "")

     if query_type == "CONVERSATIONAL":
         return [CONVERSATIONAL_PROMPT] + messages

+    use_editor = state.get("use_editor_context", False)
     if query_type == "CODE_GENERATION":
-        prompt = SystemMessage(
-            content=CODE_GENERATION_PROMPT.content.format(context=context)
-        )
+        prompt = _build_generation_prompt(
+            template_prompt=CODE_GENERATION_PROMPT,
+            context=context,
+            editor_content=editor_content if use_editor else "",
+            selected_text=selected_text if use_editor else "",
+            extra_context=extra_context,
+        )
     else:
-        prompt = SystemMessage(
-            content=GENERATE_PROMPT.content.format(context=context)
-        )
+        prompt = _build_generation_prompt(
+            template_prompt=GENERATE_PROMPT,
+            context=context,
+            editor_content=editor_content if use_editor else "",
+            selected_text=selected_text if use_editor else "",
+            extra_context=extra_context,
+        )

     return [prompt] + messages

@@ -154,13 +154,42 @@ def _query_from_messages(messages: list[ChatMessage]) -> str:
     return ""


-async def _invoke_blocking(query: str, session_id: str) -> str:
+async def _invoke_blocking(query: str, session_id: str, context: dict | None = None) -> str:
+    context = context or {}
     loop = asyncio.get_event_loop()

     def _call():
         stub = get_stub()
-        req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
+        req = brunix_pb2.AgentRequest(
+            query=query,
+            session_id=session_id,
+            editor_content=context.get("editor_content") or "",
+            selected_text=context.get("selected_text") or "",
+            extra_context=context.get("extra_context") or "",
+            user_info=str(context.get("user_info") or "{}"),
+        )

         parts = []
         for resp in stub.AskAgent(req):
             if resp.text:

@@ -170,7 +199,7 @@ async def _invoke_blocking(query: str, session_id: str) -> str:
     return await loop.run_in_executor(_thread_pool, _call)

-async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.AgentResponse]:
+async def _iter_stream(query: str, session_id: str, context: dict | None = None) -> AsyncIterator[brunix_pb2.AgentResponse]:
+    context = context or {}
     loop = asyncio.get_event_loop()
     queue: asyncio.Queue = asyncio.Queue()

@@ -178,8 +207,38 @@ async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.
     def _producer():
         try:
             stub = get_stub()
-            req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
+            req = brunix_pb2.AgentRequest(
+                query=query,
+                session_id=session_id,
+                editor_content=context.get("editor_content") or "",
+                selected_text=context.get("selected_text") or "",
+                extra_context=context.get("extra_context") or "",
+                user_info=str(context.get("user_info") or "{}"),
+            )
             for resp in stub.AskAgentStream(req):  # AskAgentStream
                 asyncio.run_coroutine_threadsafe(queue.put(resp), loop).result()
         except Exception as e:
             asyncio.run_coroutine_threadsafe(queue.put(e), loop).result()

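The `_producer` above bridges a blocking gRPC stream onto the asyncio event loop via `run_coroutine_threadsafe`. A self-contained sketch of that thread-to-loop pattern (the function names and the sentinel-based shutdown here are illustrative, not part of this diff):

```python
import asyncio
import threading

def bridge_blocking_iter(blocking_iter):
    """Re-yield items from a blocking iterator inside an async generator,
    mirroring the _producer/queue pattern used by _iter_stream."""
    async def agen():
        loop = asyncio.get_running_loop()
        queue: asyncio.Queue = asyncio.Queue()
        sentinel = object()  # marks end-of-stream (illustrative choice)

        def _producer():
            try:
                for item in blocking_iter:
                    # hop from the worker thread back onto the event loop
                    asyncio.run_coroutine_threadsafe(queue.put(item), loop).result()
            finally:
                asyncio.run_coroutine_threadsafe(queue.put(sentinel), loop).result()

        threading.Thread(target=_producer, daemon=True).start()
        while True:
            item = await queue.get()
            if item is sentinel:
                break
            yield item
    return agen()

async def main():
    return [x async for x in bridge_blocking_iter(iter(range(3)))]

print(asyncio.run(main()))  # [0, 1, 2]
```

The `.result()` call after each `put` applies backpressure: the producer thread blocks until the loop has actually enqueued the item, just as in `_producer` above.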
@@ -197,16 +256,17 @@ async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.
     yield item


-async def _stream_chat(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
+async def _stream_chat(query: str, session_id: str, req_id: str, context: dict | None = None) -> AsyncIterator[str]:
     try:
-        async for resp in _iter_stream(query, session_id):
+        async for resp in _iter_stream(query, session_id, context):
             if resp.is_final:
                 yield _sse(_chat_chunk("", req_id, finish="stop"))
                 break
             if resp.text:
                 yield _sse(_chat_chunk(resp.text, req_id))
     except Exception as e:
-        logger.error(f"[stream_chat] error: {e}")
+        logger.error(f"[stream_chat] error: {e}", exc_info=True)
         yield _sse(_chat_chunk(f"[Error: {e}]", req_id, finish="stop"))

     yield _sse_done()

@@ -285,9 +345,18 @@ async def list_models():
 @app.post("/v1/chat/completions")
 async def chat_completions(req: ChatCompletionRequest):
     query = _query_from_messages(req.messages)
-    session_id = req.session_id or req.user or "default"
+    session_id = req.session_id or "default"
     req_id = f"chatcmpl-{uuid.uuid4().hex}"

+    context = {}
+    try:
+        context = json.loads(req.user)
+    except Exception:
+        pass

     logger.info(f"[chat] session={session_id} stream={req.stream} query='{query[:80]}'")

     if not query:

@@ -296,7 +365,7 @@ async def chat_completions(req: ChatCompletionRequest):
     if req.stream:
         return StreamingResponse(
-            _stream_chat(query, session_id, req_id),
+            _stream_chat(query, session_id, req_id, context),
             media_type="text/event-stream",
             headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
         )

@@ -4,7 +4,8 @@ from langchain_core.messages import SystemMessage
 CLASSIFY_PROMPT_TEMPLATE = (
     "<role>\n"
     "You are a query classifier for an AVAP language assistant. "
-    "Your only job is to classify the user message into one of three categories.\n"
+    "Your only job is to classify the user message into one of three categories "
+    "and determine whether the user is explicitly asking about the editor code.\n"
     "</role>\n\n"

     "<categories>\n"

@@ -28,9 +29,27 @@ CLASSIFY_PROMPT_TEMPLATE = (
     "'describe it in your own words', 'what did you mean?'\n"
     "</categories>\n\n"

+    "<editor_rule>\n"
+    "The second word of your response indicates whether the user is explicitly "
+    "asking about the code in their editor or selected text.\n"
+    "Answer EDITOR only if the user message clearly refers to specific code "
+    "they are looking at — using expressions like: "
+    "'this code', 'este codigo', 'esto', 'this function', 'fix this', "
+    "'explain this', 'what does this do', 'que hace esto', "
+    "'como mejoro esto', 'el codigo del editor', 'lo que tengo aqui', "
+    "'this selection', 'lo seleccionado', or similar.\n"
+    "Answer NO_EDITOR in all other cases — including general AVAP questions, "
+    "code generation requests, and conversational follow-ups that do not "
+    "refer to specific editor code.\n"
+    "</editor_rule>\n\n"

     "<output_rule>\n"
-    "Your entire response must be exactly one word: "
-    "RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL. Nothing else.\n"
+    "Your entire response must be exactly two words separated by a single space.\n"
+    "First word: RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL.\n"
+    "Second word: EDITOR or NO_EDITOR.\n"
+    "Valid examples: 'RETRIEVAL NO_EDITOR', 'CODE_GENERATION EDITOR', "
+    "'CONVERSATIONAL NO_EDITOR'.\n"
+    "No other output. No punctuation. No explanation.\n"
     "</output_rule>\n\n"

     "<conversation_history>\n"

@@ -49,10 +68,23 @@ REFORMULATE_PROMPT = SystemMessage(
     "into keyword queries that will find the right AVAP documentation chunks.\n"
     "</role>\n\n"

+    "<mode_rule>\n"
+    "The input starts with [MODE: X]. Follow these rules strictly:\n"
+    "- MODE RETRIEVAL: rewrite as compact keywords. DO NOT expand with AVAP commands. "
+    "DO NOT translate — preserve the original language.\n"
+    "- MODE CODE_GENERATION: apply the command expansion mapping in <task>.\n"
+    "- MODE CONVERSATIONAL: return the question as-is.\n"
+    "</mode_rule>\n\n"
+
+    "<language_rule>\n"
+    "NEVER translate the query. If the user writes in Spanish, rewrite in Spanish. "
+    "If the user writes in English, rewrite in English.\n"
+    "</language_rule>\n\n"
+
     "<task>\n"
     "Rewrite the user message into a compact keyword query for semantic search.\n\n"

-    "SPECIAL RULE for code generation requests:\n"
+    "SPECIAL RULE for CODE_GENERATION only:\n"
     "When the user asks to generate/create/build/show AVAP code, expand the query "
     "with the AVAP commands typically needed. Use this mapping:\n\n"

@@ -80,21 +112,27 @@ REFORMULATE_PROMPT = SystemMessage(
     "- Remove filler words.\n"
     "- Output a single line.\n"
     "- Never answer the question.\n"
+    "- Never translate.\n"
     "</rules>\n\n"

     "<examples>\n"
     "<example>\n"
-    "<input>What does AVAP stand for?</input>\n"
-    "<o>AVAP stand for</o>\n"
+    "<input>[MODE: RETRIEVAL] Que significa AVAP?</input>\n"
+    "<o>AVAP significado definición lenguaje DSL</o>\n"
     "</example>\n\n"

     "<example>\n"
-    "<input>dime como seria un API que devuelva hello world con AVAP</input>\n"
+    "<input>[MODE: RETRIEVAL] What does AVAP stand for?</input>\n"
+    "<o>AVAP definition language stands for</o>\n"
+    "</example>\n\n"
+
+    "<example>\n"
+    "<input>[MODE: CODE_GENERATION] dime como seria un API que devuelva hello world con AVAP</input>\n"
     "<o>AVAP registerEndpoint addResult _status hello world example</o>\n"
     "</example>\n\n"

     "<example>\n"
-    "<input>generate an AVAP script that reads a parameter and queries the DB</input>\n"
+    "<input>[MODE: CODE_GENERATION] generate an AVAP script that reads a parameter and queries the DB</input>\n"
     "<o>AVAP addParam ormAccessSelect avapConnector registerEndpoint addResult</o>\n"
     "</example>\n"
     "</examples>\n\n"

@@ -232,6 +270,7 @@ GENERATE_PROMPT = SystemMessage(
     "</thinking_steps>\n\n"

     "<output_format>\n"
+    "Answer in the same language the user used.\n\n"
     "Answer:\n"
     "<direct answer; include code blocks if context has relevant code>\n\n"

@@ -1,3 +1,4 @@
+import base64
 import logging
 import os
 from concurrent import futures

@@ -79,7 +80,33 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
     def AskAgent(self, request, context):
         session_id = request.session_id or "default"
         query = request.query
-        logger.info(f"[AskAgent] session={session_id} query='{query[:80]}'")
+        try:
+            editor_content = base64.b64decode(request.editor_content).decode("utf-8") if request.editor_content else ""
+        except Exception:
+            editor_content = ""
+            logger.warning("[AskAgent] editor_content base64 decode failed")
+
+        try:
+            selected_text = base64.b64decode(request.selected_text).decode("utf-8") if request.selected_text else ""
+        except Exception:
+            selected_text = ""
+            logger.warning("[AskAgent] selected_text base64 decode failed")
+
+        try:
+            extra_context = base64.b64decode(request.extra_context).decode("utf-8") if request.extra_context else ""
+        except Exception:
+            extra_context = ""
+            logger.warning("[AskAgent] extra_context base64 decode failed")
+
+        user_info = request.user_info or "{}"
+
+        logger.info(
+            f"[AskAgent] session={session_id} "
+            f"editor={bool(editor_content)} selected={bool(selected_text)} "
+            f"query='{query[:80]}'"
+        )

         try:
             history = list(session_store.get(session_id, []))

@@ -91,6 +118,11 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
             "reformulated_query": "",
             "context": "",
             "query_type": "",
+            "editor_content": editor_content,
+            "selected_text": selected_text,
+            "extra_context": extra_context,
+            "user_info": user_info,
         }

         final_state = self.graph.invoke(initial_state)

@@ -119,7 +151,33 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
     def AskAgentStream(self, request, context):
         session_id = request.session_id or "default"
         query = request.query
-        logger.info(f"[AskAgentStream] session={session_id} query='{query[:80]}'")
+        try:
+            editor_content = base64.b64decode(request.editor_content).decode("utf-8") if request.editor_content else ""
+        except Exception:
+            editor_content = ""
+            logger.warning("[AskAgentStream] editor_content base64 decode failed")
+
+        try:
+            selected_text = base64.b64decode(request.selected_text).decode("utf-8") if request.selected_text else ""
+        except Exception:
+            selected_text = ""
+            logger.warning("[AskAgentStream] selected_text base64 decode failed")
+
+        try:
+            extra_context = base64.b64decode(request.extra_context).decode("utf-8") if request.extra_context else ""
+        except Exception:
+            extra_context = ""
+            logger.warning("[AskAgentStream] extra_context base64 decode failed")
+
+        user_info = request.user_info or "{}"
+
+        logger.info(
+            f"[AskAgentStream] session={session_id} "
+            f"editor={bool(editor_content)} selected={bool(selected_text)} "
+            f"query='{query[:80]}'"
+        )

         try:
             history = list(session_store.get(session_id, []))

@@ -131,6 +189,11 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
             "reformulated_query": "",
             "context": "",
             "query_type": "",
+            "editor_content": editor_content,
+            "selected_text": selected_text,
+            "extra_context": extra_context,
+            "user_info": user_info,
         }

         prepared = self.prepare_graph.invoke(initial_state)

@@ -4,8 +4,15 @@ from langgraph.graph.message import add_messages


 class AgentState(TypedDict):
+    # -- CORE
     messages: Annotated[list, add_messages]
     reformulated_query: str
     context: str
     query_type: str
     session_id: str
+    # -- OPENAI API
+    editor_content: str
+    selected_text: str
+    extra_context: str
+    user_info: str
+    use_editor_context: bool

@@ -0,0 +1,396 @@
"""
tests/test_prd_0002.py

Unit tests for PRD-0002 — Editor Context Injection.

These tests run without any external dependencies (no Elasticsearch, no Ollama,
no gRPC server). They validate the logic of the components modified in PRD-0002:

- _parse_query_type — classifier output parser (graph.py)
- _parse_editor_context — user field parser (openai_proxy.py)
- _build_classify_prompt — classify prompt builder (graph.py)
- _build_reformulate_query — reformulate anchor builder (graph.py)
- _build_generation_prompt — generation prompt builder (graph.py)
- _decode_b64 — base64 decoder (server.py)

Run with:
    pytest tests/test_prd_0002.py -v
"""

import base64
import json
import os
import sys
import types

import pytest

# ---------------------------------------------------------------------------
# Minimal stubs so we can import graph.py and openai_proxy.py without
# the full Docker/src environment loaded
# ---------------------------------------------------------------------------

# Stub brunix_pb2 so openai_proxy imports cleanly
brunix_pb2 = types.ModuleType("brunix_pb2")
brunix_pb2.AgentRequest = lambda **kw: kw
brunix_pb2.AgentResponse = lambda **kw: kw
sys.modules["brunix_pb2"] = brunix_pb2
sys.modules["brunix_pb2_grpc"] = types.ModuleType("brunix_pb2_grpc")

# Stub grpc
grpc_mod = types.ModuleType("grpc")
grpc_mod.insecure_channel = lambda *a, **kw: None
grpc_mod.Channel = object
grpc_mod.RpcError = Exception
sys.modules["grpc"] = grpc_mod

# Stub grpc_reflection
refl = types.ModuleType("grpc_reflection.v1alpha.reflection")
sys.modules["grpc_reflection"] = types.ModuleType("grpc_reflection")
sys.modules["grpc_reflection.v1alpha"] = types.ModuleType("grpc_reflection.v1alpha")
sys.modules["grpc_reflection.v1alpha.reflection"] = refl

# Add Docker/src to path so we can import the modules directly
DOCKER_SRC = os.path.join(os.path.dirname(__file__), "..", "Docker", "src")
sys.path.insert(0, os.path.abspath(DOCKER_SRC))

# ---------------------------------------------------------------------------
# Import the functions under test
# ---------------------------------------------------------------------------

# We import only the pure functions — no LLM, no ES, no gRPC calls

def _parse_query_type(raw: str):
    """Copy of _parse_query_type from graph.py — tested in isolation."""
    parts = raw.strip().upper().split()
    query_type = "RETRIEVAL"
    use_editor = False
    if parts:
        first = parts[0]
        if first.startswith("CODE_GENERATION") or "CODE" in first:
            query_type = "CODE_GENERATION"
        elif first.startswith("CONVERSATIONAL"):
            query_type = "CONVERSATIONAL"
    if len(parts) > 1 and parts[1] == "EDITOR":
        use_editor = True
    return query_type, use_editor


def _decode_b64(value: str) -> str:
    """Copy of _decode_b64 from server.py — tested in isolation."""
    try:
        return base64.b64decode(value).decode("utf-8") if value else ""
    except Exception:
        return ""


def _parse_editor_context(user):
    """Copy of _parse_editor_context from openai_proxy.py — tested in isolation."""
    if not user:
        return "", "", "", ""
    try:
        ctx = json.loads(user)
        if isinstance(ctx, dict):
            return (
                ctx.get("editor_content", "") or "",
                ctx.get("selected_text", "") or "",
                ctx.get("extra_context", "") or "",
                json.dumps(ctx.get("user_info", {})),
            )
    except (json.JSONDecodeError, TypeError):
        pass
    return "", "", "", ""


def _build_reformulate_query(question: str, selected_text: str) -> str:
    """Copy of _build_reformulate_query from graph.py — tested in isolation."""
    if not selected_text:
        return question
    return f"{selected_text}\n\nUser question about the above: {question}"


def _build_generation_prompt_injects(editor_content, selected_text, use_editor):
    """Helper — returns True if editor context would be injected."""
    sections = []
    if selected_text and use_editor:
        sections.append("selected_code")
    if editor_content and use_editor:
        sections.append("editor_file")
    return len(sections) > 0

# ---------------------------------------------------------------------------
# Tests: _parse_query_type
# ---------------------------------------------------------------------------

class TestParseQueryType:

    def test_retrieval_no_editor(self):
        qt, ue = _parse_query_type("RETRIEVAL NO_EDITOR")
        assert qt == "RETRIEVAL"
        assert ue is False

    def test_retrieval_editor(self):
        qt, ue = _parse_query_type("RETRIEVAL EDITOR")
        assert qt == "RETRIEVAL"
        assert ue is True

    def test_code_generation_no_editor(self):
        qt, ue = _parse_query_type("CODE_GENERATION NO_EDITOR")
        assert qt == "CODE_GENERATION"
        assert ue is False

    def test_code_generation_editor(self):
        qt, ue = _parse_query_type("CODE_GENERATION EDITOR")
        assert qt == "CODE_GENERATION"
        assert ue is True

    def test_conversational_no_editor(self):
        qt, ue = _parse_query_type("CONVERSATIONAL NO_EDITOR")
        assert qt == "CONVERSATIONAL"
        assert ue is False

    def test_single_token_defaults_no_editor(self):
        """If model returns only one token, use_editor defaults to False."""
        qt, ue = _parse_query_type("RETRIEVAL")
        assert qt == "RETRIEVAL"
        assert ue is False

    def test_empty_defaults_retrieval_no_editor(self):
        qt, ue = _parse_query_type("")
        assert qt == "RETRIEVAL"
        assert ue is False

    def test_case_insensitive(self):
        qt, ue = _parse_query_type("retrieval editor")
        assert qt == "RETRIEVAL"
        assert ue is True

    def test_code_shorthand(self):
        """'CODE' alone should map to CODE_GENERATION."""
        qt, ue = _parse_query_type("CODE NO_EDITOR")
        assert qt == "CODE_GENERATION"
        assert ue is False

    def test_extra_whitespace(self):
        qt, ue = _parse_query_type("  RETRIEVAL   NO_EDITOR  ")
        assert qt == "RETRIEVAL"
        assert ue is False

# ---------------------------------------------------------------------------
# Tests: _decode_b64
# ---------------------------------------------------------------------------

class TestDecodeB64:

    def test_valid_base64_spanish(self):
        text = "addVar(mensaje, \"Hola mundo\")\naddResult(mensaje)"
        encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
        assert _decode_b64(encoded) == text

    def test_valid_base64_english(self):
        text = "registerEndpoint(\"GET\", \"/hello\", [], \"public\", handler, \"\")"
        encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
        assert _decode_b64(encoded) == text

    def test_empty_string_returns_empty(self):
        assert _decode_b64("") == ""

    def test_none_equivalent_empty(self):
        assert _decode_b64(None) == ""

    def test_invalid_base64_returns_empty(self):
        assert _decode_b64("not_valid_base64!!!") == ""

    def test_unicode_content(self):
        text = "// función de validación\nif(token, \"SECRET\", \"=\")"
        encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
        assert _decode_b64(encoded) == text

# ---------------------------------------------------------------------------
# Tests: _parse_editor_context
# ---------------------------------------------------------------------------

class TestParseEditorContext:

    def _encode(self, text: str) -> str:
        return base64.b64encode(text.encode()).decode()

    def test_full_context_parsed(self):
        editor = self._encode("addVar(x, 10)")
        selected = self._encode("addResult(x)")
        extra = self._encode("/path/to/file.avap")
        user_json = json.dumps({
            "editor_content": editor,
            "selected_text": selected,
            "extra_context": extra,
            "user_info": {"dev_id": 1, "project_id": 2, "org_id": 3},
        })
        ec, st, ex, ui = _parse_editor_context(user_json)
        assert ec == editor
        assert st == selected
        assert ex == extra
        assert json.loads(ui) == {"dev_id": 1, "project_id": 2, "org_id": 3}

    def test_empty_user_returns_empty_tuple(self):
        ec, st, ex, ui = _parse_editor_context(None)
        assert ec == st == ex == ""

    def test_empty_string_returns_empty_tuple(self):
        ec, st, ex, ui = _parse_editor_context("")
        assert ec == st == ex == ""

    def test_plain_string_not_json_returns_empty(self):
        """Non-JSON user field — backward compat, no error raised."""
        ec, st, ex, ui = _parse_editor_context("plain string")
        assert ec == st == ex == ""

    def test_missing_fields_default_empty(self):
        user_json = json.dumps({"editor_content": "abc"})
        ec, st, ex, ui = _parse_editor_context(user_json)
        assert ec == "abc"
        assert st == ""
        assert ex == ""

    def test_user_info_missing_defaults_empty_object(self):
        user_json = json.dumps({"editor_content": "abc"})
        _, _, _, ui = _parse_editor_context(user_json)
        assert json.loads(ui) == {}

    def test_user_info_full_object(self):
        user_json = json.dumps({
            "editor_content": "",
            "selected_text": "",
            "extra_context": "",
            "user_info": {"dev_id": 42, "project_id": 7, "org_id": 99},
        })
        _, _, _, ui = _parse_editor_context(user_json)
        parsed = json.loads(ui)
        assert parsed["dev_id"] == 42
        assert parsed["project_id"] == 7
        assert parsed["org_id"] == 99

    def test_session_id_not_leaked_into_context(self):
        """session_id must NOT appear in editor context — it has its own field."""
        user_json = json.dumps({
            "editor_content": "",
            "selected_text": "",
            "extra_context": "",
            "user_info": {},
        })
        ec, st, ex, ui = _parse_editor_context(user_json)
        assert "session_id" not in ec
        assert "session_id" not in st

# ---------------------------------------------------------------------------
|
||||||
|
# Tests: _build_reformulate_query
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class TestBuildReformulateQuery:
|
||||||
|
|
||||||
|
def test_no_selected_text_returns_question(self):
|
||||||
|
q = "Que significa AVAP?"
|
||||||
|
assert _build_reformulate_query(q, "") == q
|
||||||
|
|
||||||
|
def test_selected_text_prepended_to_question(self):
|
||||||
|
q = "que hace esto?"
|
||||||
|
selected = "addVar(x, 10)\naddResult(x)"
|
||||||
|
result = _build_reformulate_query(q, selected)
|
||||||
|
assert result.startswith(selected)
|
||||||
|
assert q in result
|
||||||
|
|
||||||
|
def test_selected_text_anchor_format(self):
|
||||||
|
q = "fix this"
|
||||||
|
selected = "try()\n ormDirect(query, res)\nexception(e)\nend()"
|
||||||
|
result = _build_reformulate_query(q, selected)
|
||||||
|
assert "User question about the above:" in result
|
||||||
|
assert selected in result
|
||||||
|
assert q in result
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Tests: editor context injection logic
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class TestEditorContextInjection:
|
||||||
|
|
||||||
|
def test_no_injection_when_use_editor_false(self):
|
||||||
|
"""Editor content must NOT be injected when use_editor_context is False."""
|
||||||
|
injected = _build_generation_prompt_injects(
|
||||||
|
editor_content = "addVar(x, 10)",
|
||||||
|
selected_text = "addResult(x)",
|
||||||
|
use_editor = False,
|
||||||
|
)
|
||||||
|
assert injected is False
|
||||||
|
|
||||||
|
def test_injection_when_use_editor_true_and_content_present(self):
|
||||||
|
"""Editor content MUST be injected when use_editor_context is True."""
|
||||||
|
injected = _build_generation_prompt_injects(
|
||||||
|
editor_content = "addVar(x, 10)",
|
||||||
|
selected_text = "addResult(x)",
|
||||||
|
use_editor = True,
|
||||||
|
)
|
||||||
|
assert injected is True
|
||||||
|
|
||||||
|
def test_no_injection_when_content_empty_even_if_flag_true(self):
|
||||||
|
"""Empty fields must never be injected even if flag is True."""
|
||||||
|
injected = _build_generation_prompt_injects(
|
||||||
|
editor_content = "",
|
||||||
|
selected_text = "",
|
||||||
|
use_editor = True,
|
||||||
|
)
|
||||||
|
assert injected is False
|
||||||
|
|
||||||
|
def test_partial_injection_selected_only(self):
|
||||||
|
"""selected_text alone triggers injection when flag is True."""
|
||||||
|
injected = _build_generation_prompt_injects(
|
||||||
|
editor_content = "",
|
||||||
|
selected_text = "addResult(x)",
|
||||||
|
use_editor = True,
|
||||||
|
)
|
||||||
|
assert injected is True
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Tests: classifier routing — EDITOR signal
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
class TestClassifierEditorSignal:
|
||||||
|
"""
|
||||||
|
These tests validate that the two-token output format is correctly parsed
|
||||||
|
for all combinations the classifier can produce.
|
||||||
|
"""
|
||||||
|
|
||||||
|
VALID_OUTPUTS = [
|
||||||
|
("RETRIEVAL NO_EDITOR", "RETRIEVAL", False),
|
||||||
|
("RETRIEVAL EDITOR", "RETRIEVAL", True),
|
||||||
|
("CODE_GENERATION NO_EDITOR", "CODE_GENERATION", False),
|
||||||
|
("CODE_GENERATION EDITOR", "CODE_GENERATION", True),
|
||||||
|
("CONVERSATIONAL NO_EDITOR", "CONVERSATIONAL", False),
|
||||||
|
("CONVERSATIONAL EDITOR", "CONVERSATIONAL", True),
|
||||||
|
]
|
||||||
|
|
||||||
|
@pytest.mark.parametrize("raw,expected_qt,expected_ue", VALID_OUTPUTS)
|
||||||
|
def test_valid_two_token_output(self, raw, expected_qt, expected_ue):
|
||||||
|
qt, ue = _parse_query_type(raw)
|
||||||
|
assert qt == expected_qt
|
||||||
|
assert ue == expected_ue
|
||||||
|
|
||||||
|
def test_editor_flag_false_for_general_avap_question(self):
|
||||||
|
"""'Que significa AVAP?' -> RETRIEVAL NO_EDITOR."""
|
||||||
|
qt, ue = _parse_query_type("RETRIEVAL NO_EDITOR")
|
||||||
|
assert ue is False
|
||||||
|
|
||||||
|
def test_editor_flag_true_for_explicit_editor_reference(self):
|
||||||
|
"""'que hace este codigo?' with selected_text -> RETRIEVAL EDITOR."""
|
||||||
|
qt, ue = _parse_query_type("RETRIEVAL EDITOR")
|
||||||
|
assert ue is True
|
||||||
|
|
||||||
|
def test_editor_flag_false_for_code_generation_without_reference(self):
|
||||||
|
"""'dame un API de hello world' -> CODE_GENERATION NO_EDITOR."""
|
||||||
|
qt, ue = _parse_query_type("CODE_GENERATION NO_EDITOR")
|
||||||
|
assert ue is False
|
||||||
|
|
@ -0,0 +1,448 @@
NOTICE
======

Brunix Assistance Engine
Copyright (c) 2026 101OBEX Corp. All rights reserved.

This product includes software developed by third parties under open source
licenses. The following is a list of the open source components used in this
product, along with their respective licenses and copyright notices.

-------------------------------------------------------------------------------

RUNTIME DEPENDENCIES (Docker/requirements.txt)
----------------------------------------------

aiohttp (3.13.3)
  License: Apache 2.0
  Copyright: aio-libs contributors
  https://github.com/aio-libs/aiohttp

annotated-types (0.7.0)
  License: MIT
  Copyright: Adrian Garcia Badaracco, Samuel Colvin, Zac Hatfield-Dodds
  https://github.com/annotated-types/annotated-types

anyio (4.12.1)
  License: MIT
  Copyright: Alex Grönholm
  https://github.com/agronholm/anyio

attrs (25.4.0)
  License: MIT
  Copyright: Hynek Schlawack
  https://github.com/python-attrs/attrs

boto3 (1.42.58)
  License: Apache 2.0
  Copyright: Amazon Web Services
  https://github.com/boto/boto3

botocore (1.42.58)
  License: Apache 2.0
  Copyright: Amazon Web Services
  https://github.com/boto/botocore

certifi
  License: MPL 2.0
  Copyright: Kenneth Reitz
  https://github.com/certifi/python-certifi

charset-normalizer (3.4.4)
  License: MIT
  Copyright: Ahmed TAHRI
  https://github.com/Ousret/charset_normalizer

chonkie (1.5.6)
  License: MIT
  Copyright: Bhavnick Minhas
  https://github.com/chonkie-ai/chonkie

click (8.3.1)
  License: BSD 3-Clause
  Copyright: Armin Ronacher
  https://github.com/pallets/click

dataclasses-json (0.6.7)
  License: MIT
  Copyright: Lídia Contreras, Radek Nohejl
  https://github.com/lidatong/dataclasses-json

elastic-transport (8.17.1)
  License: Apache 2.0
  Copyright: Elasticsearch B.V.
  https://github.com/elastic/elastic-transport-python

elasticsearch (8.19.3)
  License: Apache 2.0
  Copyright: Elasticsearch B.V.
  https://github.com/elastic/elasticsearch-py

fastapi (0.111+)
  License: MIT
  Copyright: Sebastián Ramírez
  https://github.com/fastapi/fastapi

filelock (3.24.3)
  License: Unlicense / Public Domain
  https://github.com/tox-dev/filelock

grpcio (1.78.1)
  License: Apache 2.0
  Copyright: The gRPC Authors
  https://github.com/grpc/grpc

grpcio-reflection (1.78.1)
  License: Apache 2.0
  Copyright: The gRPC Authors
  https://github.com/grpc/grpc

grpcio-tools (1.78.1)
  License: Apache 2.0
  Copyright: The gRPC Authors
  https://github.com/grpc/grpc

httpcore (1.0.9)
  License: BSD 3-Clause
  Copyright: Tom Christie
  https://github.com/encode/httpcore

httpx (0.28.1)
  License: BSD 3-Clause
  Copyright: Tom Christie
  https://github.com/encode/httpx

huggingface-hub (0.36.2)
  License: Apache 2.0
  Copyright: HuggingFace Inc.
  https://github.com/huggingface/huggingface_hub

jinja2 (3.1.6)
  License: BSD 3-Clause
  Copyright: Armin Ronacher
  https://github.com/pallets/jinja

joblib (1.5.3)
  License: BSD 3-Clause
  Copyright: Gael Varoquaux
  https://github.com/joblib/joblib

jsonpatch (1.33)
  License: BSD 3-Clause
  Copyright: Stefan Kögl
  https://github.com/stefankoegl/python-json-patch

langchain (1.2.10)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-anthropic
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-aws (1.3.1)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-community (0.4.1)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-core (1.2.15)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-elasticsearch (1.0.0)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-huggingface (1.2.0)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langchain-ollama (1.0.1)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langchain

langgraph (1.0.9)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langgraph

langsmith (0.7.6)
  License: MIT
  Copyright: LangChain, Inc.
  https://github.com/langchain-ai/langsmith

loguru (0.7.3)
  License: MIT
  Copyright: Delgan
  https://github.com/Delgan/loguru

model2vec (0.7.0)
  License: MIT
  Copyright: MinishLab
  https://github.com/MinishLab/model2vec

nltk (3.9.3)
  License: Apache 2.0
  Copyright: NLTK Project
  https://github.com/nltk/nltk

numpy (2.4.2)
  License: BSD 3-Clause
  Copyright: NumPy Developers
  https://github.com/numpy/numpy

ollama (0.6.1)
  License: MIT
  Copyright: Ollama
  https://github.com/ollama/ollama-python

orjson (3.11.7)
  License: Apache 2.0 / MIT
  Copyright: ijl
  https://github.com/ijl/orjson

packaging (24.2)
  License: Apache 2.0 / BSD 2-Clause
  Copyright: PyPA
  https://github.com/pypa/packaging

pandas (3.0.1)
  License: BSD 3-Clause
  Copyright: The Pandas Development Team
  https://github.com/pandas-dev/pandas

protobuf (6.33.5)
  License: BSD 3-Clause
  Copyright: Google LLC
  https://github.com/protocolbuffers/protobuf

pydantic (2.12.5)
  License: MIT
  Copyright: Samuel Colvin
  https://github.com/pydantic/pydantic

pydantic-settings (2.13.1)
  License: MIT
  Copyright: Samuel Colvin
  https://github.com/pydantic/pydantic-settings

pygments (2.19.2)
  License: BSD 2-Clause
  Copyright: Georg Brandl
  https://github.com/pygments/pygments

python-dateutil (2.9.0)
  License: Apache 2.0 / BSD 3-Clause
  Copyright: Gustavo Niemeyer
  https://github.com/dateutil/dateutil

python-dotenv (1.2.1)
  License: BSD 3-Clause
  Copyright: Saurabh Kumar
  https://github.com/theskumar/python-dotenv

pyyaml (6.0.3)
  License: MIT
  Copyright: Kirill Simonov
  https://github.com/yaml/pyyaml

ragas (0.4.3+)
  License: Apache 2.0
  Copyright: Exploding Gradients
  https://github.com/explodinggradients/ragas

rapidfuzz (3.14.3)
  License: MIT
  Copyright: Max Bachmann
  https://github.com/rapidfuzz/RapidFuzz

regex (2026.2.19)
  License: Apache 2.0
  Copyright: Matthew Barnett
  https://github.com/mrabarnett/mrab-regex

requests (2.32.5)
  License: Apache 2.0
  Copyright: Kenneth Reitz
  https://github.com/psf/requests

rich (14.3.3)
  License: MIT
  Copyright: Will McGugan
  https://github.com/Textualize/rich

s3transfer (0.16.0)
  License: Apache 2.0
  Copyright: Amazon Web Services
  https://github.com/boto/s3transfer

safetensors (0.7.0)
  License: Apache 2.0
  Copyright: HuggingFace Inc.
  https://github.com/huggingface/safetensors

setuptools (82.0.0)
  License: MIT
  Copyright: Jason R. Coombs
  https://github.com/pypa/setuptools

six (1.17.0)
  License: MIT
  Copyright: Benjamin Peterson
  https://github.com/benjaminp/six

sqlalchemy (2.0.46)
  License: MIT
  Copyright: SQLAlchemy authors
  https://github.com/sqlalchemy/sqlalchemy

tenacity (9.1.4)
  License: Apache 2.0
  Copyright: Julien Danjou
  https://github.com/jd/tenacity

tokenizers (0.22.2)
  License: Apache 2.0
  Copyright: HuggingFace Inc.
  https://github.com/huggingface/tokenizers

tqdm (4.67.3)
  License: MIT / MPL 2.0
  Copyright: Casper da Costa-Luis
  https://github.com/tqdm/tqdm

typing-extensions (4.15.0)
  License: PSF 2.0
  Copyright: Python Software Foundation
  https://github.com/python/typing_extensions

urllib3 (2.6.3)
  License: MIT
  Copyright: Andrey Petrov
  https://github.com/urllib3/urllib3

uvicorn (0.29+)
  License: BSD 3-Clause
  Copyright: Tom Christie
  https://github.com/encode/uvicorn

xxhash (3.6.0)
  License: BSD 2-Clause
  Copyright: Yue Du
  https://github.com/ifduyue/python-xxhash

yarl (1.22.0)
  License: Apache 2.0
  Copyright: aio-libs contributors
  https://github.com/aio-libs/yarl

zstandard (0.25.0)
  License: BSD 3-Clause
  Copyright: Gregory Szorc
  https://github.com/indygreg/python-zstandard

-------------------------------------------------------------------------------

DEVELOPMENT DEPENDENCIES (pyproject.toml — dev group)
-----------------------------------------------------
These dependencies are used only during development and research.
They are not included in the production Docker image.

beir (2.2.0+)
  License: Apache 2.0
  Copyright: Nandan Thakur
  https://github.com/beir-cellar/beir

datasets
  License: Apache 2.0
  Copyright: HuggingFace Inc.
  https://github.com/huggingface/datasets

jupyter
  License: BSD 3-Clause
  Copyright: Project Jupyter Contributors
  https://github.com/jupyter/jupyter

langfuse (<3)
  License: MIT
  Copyright: Langfuse GmbH
  https://github.com/langfuse/langfuse

litellm (1.82.0+)
  License: MIT
  Copyright: BerriAI
  https://github.com/BerriAI/litellm

mteb (2.8.8+)
  License: Apache 2.0
  Copyright: MTEB Authors
  https://github.com/embeddings-benchmark/mteb

polars (1.38.1+)
  License: MIT
  Copyright: Ritchie Vink
  https://github.com/pola-rs/polars

ruff (0.15.1+)
  License: MIT
  Copyright: Astral Software
  https://github.com/astral-sh/ruff

tree-sitter-language-pack (0.13.0+)
  License: MIT
  Copyright: Various
  https://github.com/Goldziher/tree-sitter-language-pack

-------------------------------------------------------------------------------

EXTERNAL SERVICES (not bundled — accessed at runtime via API or network)
------------------------------------------------------------------------

Ollama
  License: MIT
  Copyright: Ollama, Inc.
  https://github.com/ollama/ollama
  Note: Used as local LLM and embedding inference server.
        Not bundled in this repository.

Elasticsearch (8.x)
  License: SSPL / Elastic License 2.0
  Copyright: Elasticsearch B.V.
  https://github.com/elastic/elasticsearch
  Note: Used as vector database and full-text search engine.
        Not bundled in this repository. Deployed separately on Devaron Cluster.

Anthropic Claude API
  Copyright: Anthropic, PBC.
  https://www.anthropic.com
  Note: Used as evaluation judge in the EvaluateRAG pipeline.
        Accessed via API key. Not bundled in this repository.

Langfuse
  License: MIT (self-hosted)
  Copyright: Langfuse GmbH
  https://github.com/langfuse/langfuse
  Note: Used for LLM observability and tracing.
        Deployed separately on Devaron Cluster.

-------------------------------------------------------------------------------

DISCLAIMER

The licenses listed above are provided for informational purposes only.
101OBEX Corp makes no representations or warranties regarding the accuracy
of this list. Users of this software are responsible for ensuring compliance
with the applicable license terms of all third-party components.

For questions regarding licensing, contact: https://www.101obex.com
README.md
@ -63,6 +63,8 @@ graph TD
│   │   └── utils/
│   │       ├── emb_factory.py      # Provider-agnostic embedding model factory
│   │       └── llm_factory.py      # Provider-agnostic LLM factory
│   ├── tests/
│   │   └── test_prd_0002.py        # Unit tests — editor context, classifier, proxy parsing
│   ├── Dockerfile                  # Multi-stage container build
│   ├── docker-compose.yaml         # Local dev orchestration
│   ├── entrypoint.sh               # Starts gRPC server + HTTP proxy in parallel
@ -75,17 +77,26 @@ graph TD
│   ├── API_REFERENCE.md            # Complete gRPC & HTTP API contract with examples
│   ├── RUNBOOK.md                  # Operational playbooks and incident response
│   ├── AVAP_CHUNKER_CONFIG.md      # avap_config.json reference — blocks, statements, semantic tags
│   ├── ADR/                        # Architecture Decision Records
│   │   ├── ADR-0001-grpc-primary-interface.md
│   │   ├── ADR-0002-two-phase-streaming.md
│   │   ├── ADR-0003-hybrid-retrieval-rrf.md
│   │   ├── ADR-0004-claude-eval-judge.md
│   │   └── ADR-0005-embedding-model-selection.md
│   └── product/                    # Product Requirements Documents
│       ├── PRD-0001-openai-compatible-proxy.md
│       └── PRD-0002-editor-context-injection.md
│   ├── avap_language_github_docs/  # AVAP language reference docs (GitHub source)
│   ├── developer.avapframework.com/ # AVAP developer portal docs
│   ├── LRM/
│   │   └── avap.md                 # AVAP Language Reference Manual (LRM)
│   └── samples/                    # AVAP code samples (.avap) used for ingestion
│
├── LICENSE                         # Proprietary license — 101OBEX Corp, Delaware
│
├── research/                       # Experiment results, benchmarks, datasets (MrHouston)
│   └── embeddings/                 # Embedding model benchmark results (BEIR)
│
├── ingestion/
│   └── chunks.json                 # Last export of ingested chunks (ES bulk output)
│
@ -396,7 +407,7 @@ Returns the full answer as a single message with `is_final: true`. Suitable for
```bash
grpcurl -plaintext \
  -d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```
@ -404,7 +415,7 @@ grpcurl -plaintext \
Expected response:
```json
{
  "text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```
@ -493,17 +504,33 @@ This enables integration with any tool that supports the OpenAI or Ollama API (c
| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| `GET` | `/health` | Health check — returns gRPC target and status |

**Non-streaming chat — general query:**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Que significa AVAP?"}],
    "stream": false,
    "session_id": "dev-001"
  }'
```

**Non-streaming chat — with editor context (VS Code extension):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "que hace este codigo?"}],
    "stream": false,
    "session_id": "dev-001",
    "user": "{\"editor_content\":\"\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
  }'
```

> **Editor context transport:** The `user` field carries editor context as a JSON string. `editor_content`, `selected_text`, and `extra_context` must be Base64-encoded. `user_info` is a JSON object with `dev_id`, `project_id`, and `org_id`. The engine only injects editor context into the response when the classifier detects the user is explicitly referring to their code. See [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md#6-openai-compatible-proxy) for full details.
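
To make the `user` payload concrete, here is a minimal client-side sketch of how an extension might assemble it. The helper name `build_user_field` is illustrative, not part of the codebase; it only encodes non-empty fields, matching the empty-string fields in the curl example above.

```python
import base64
import json

def build_user_field(selected_text, dev_id, project_id, org_id,
                     editor_content="", extra_context=""):
    """Serialize editor context into the OpenAI `user` field (hypothetical helper)."""
    def b64(s):
        # Base64-encode only non-empty fields; empty strings stay empty.
        return base64.b64encode(s.encode("utf-8")).decode("ascii") if s else ""
    return json.dumps({
        "editor_content": b64(editor_content),
        "selected_text": b64(selected_text),
        "extra_context": b64(extra_context),
        "user_info": {"dev_id": dev_id, "project_id": project_id, "org_id": org_id},
    })

user_field = build_user_field("addVar(x, 10)", dev_id=1, project_id=2, org_id=3)
```

The resulting string is passed verbatim as the `"user"` value in the request body, while `session_id` travels in its own top-level field.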

**Streaming chat (SSE):**
```bash
curl http://localhost:8000/v1/chat/completions \
```

@ -635,7 +662,9 @@ For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING
| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
| [docs/ADR/](./docs/ADR/) | Architecture Decision Records |
| [docs/product/](./docs/product/) | Product Requirements Documents |
| [research/](./research/) | Experiment results, benchmarks, and datasets |

---
changelog
@ -4,6 +4,76 @@ All notable changes to the **Brunix Assistance Engine** will be documented in th
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## [1.6.1] - 2026-03-20
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- FEATURE (PRD-0002): Extended `AgentRequest` in `brunix.proto` with four optional fields: `editor_content` (field 3), `selected_text` (field 4), `extra_context` (field 5), `user_info` (field 6) — enabling the VS Code extension to send active file content, selected code, free-form context, and client identity metadata alongside every query. Fields 3–5 are Base64-encoded; field 6 is a JSON string.
|
||||||
|
- FEATURE (PRD-0002): Extended `AgentState` with `editor_content`, `selected_text`, `extra_context`, `user_info`, and `use_editor_context` fields.
|
||||||
|
- FEATURE (PRD-0002): Extended classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens — query type and editor signal (`EDITOR` / `NO_EDITOR`). `use_editor_context` flag set in state based on classifier output.
|
||||||
|
- FEATURE (PRD-0002): Editor context injected into generation prompt only when `use_editor_context=True` — prevents the model from referencing editor code when the question is unrelated.
|
||||||
|
- FEATURE (PRD-0002): `openai_proxy.py` — parses the standard OpenAI `user` field as a JSON string to extract `editor_content`, `selected_text`, `extra_context`, and `user_info`. Non-Brunix clients that send `user` as a plain string or omit it are handled gracefully with no error.
|
||||||
|
- FEATURE (PRD-0002): `server.py` — Base64 decoding of `editor_content`, `selected_text`, and `extra_context` on request arrival. Malformed Base64 is silently treated as empty string.
|
||||||
|
- TESTS: Added `Docker/tests/test_prd_0002.py` — 40 unit tests covering `_parse_query_type`, `_decode_b64`, `_parse_editor_context`, `_build_reformulate_query`, editor context injection logic, and all valid classifier output combinations. Runs without external dependencies (no Elasticsearch, no Ollama, no gRPC server required).
|
||||||
|
- DOCS: Added `docs/product/PRD-0001-openai-compatible-proxy.md` — product requirements document for the OpenAI-compatible HTTP proxy.
|
||||||
|
- DOCS: Added `docs/product/PRD-0002-editor-context-injection.md` — product requirements document for editor context injection (updated to Implemented status with full technical design).
- DOCS: Added `docs/ADR/ADR-0005-embedding-model-selection.md` — comparative evaluation of BGE-M3 vs Qwen3-Embedding-0.6B. Status: Under Evaluation.
- DOCS: Added `LICENSE` — proprietary license, 101OBEX, Corp, Delaware.
- DOCS: Added `research/` directory structure for MrHouston experiment results and benchmarks.

### Changed

- FEATURE (PRD-0002): `session_id` in `openai_proxy.py` is now read exclusively from the dedicated `session_id` field — it no longer falls back to the `user` field. Breaking change for any client that was using `user` as a `session_id` fallback.
- ENGINE: `CLASSIFY_PROMPT_TEMPLATE` extended with `<editor_rule>` and an updated `<output_rule>` for the two-token output format.
- ENGINE: `REFORMULATE_PROMPT` extended with `<mode_rule>` and `<language_rule>` — the reformulator now receives `[MODE: X]` prepended to the query and applies command expansion only in `CODE_GENERATION` mode.
- ENGINE: `GENERATE_PROMPT` — added "Answer in the same language the user used" to `<output_format>`. Fixes responses defaulting to English for Spanish queries.
- ENGINE: `hybrid_search_native` in `graph.py` — the BM25 query now uses a `bool` query with a `should` boost for `doc_type: spec` and `doc_type: narrative` chunks, improving retrieval of definitional and explanatory content over raw code examples.
- DOCS: Updated `docs/API_REFERENCE.md` — full `AgentRequest` table with all 6 fields, Base64 encoding notes, editor context behaviour section, and updated proxy examples.
- DOCS: Updated `docs/ARCHITECTURE.md` — new §6 Editor Context Pipeline, updated §4 LangGraph Workflow with the two-token classifier, §4.6 reformulator now mode-aware and language-preserving, updated component inventory and request lifecycle diagrams.
- DOCS: Updated `README.md` — project structure with `Docker/tests/`, `docs/product/`, `docs/ADR/ADR-0005`, `research/`, `LICENSE`. HTTP proxy section updated with editor context curl examples. Documentation index updated.
- DOCS: Updated `CONTRIBUTING.md` — added Section 10 (PRDs) and Section 11 (Research & Experiments Policy), updated the PR checklist, ADR table extended with ADR-0005.
- DOCS: Updated `docs/AVAP_CHUNKER_CONFIG.md` to v2.0 — five new commands (else, end, endLoop, exception, return), naming fix (AddvariableToJSON), nine dual assignment patterns, four new semantic tags.
- GOVERNANCE: Updated `.github/CODEOWNERS` — added the `@BRUNIX-AI/engineering` and `@BRUNIX-AI/research` teams, with explicit rules for the proto, golden dataset, grammar config, ADRs and PRDs.

### Fixed

- ENGINE: Fixed retrieval returning wrong chunks for Spanish definition queries — the reformulator was translating Spanish queries to English, breaking BM25 lexical matching against Spanish LRM chunks. Root cause: missing language preservation rule in `REFORMULATE_PROMPT`.
- ENGINE: Fixed the reformulator applying CODE_GENERATION command expansion to RETRIEVAL queries — caused "Que significa AVAP?" to reformulate as "AVAP registerEndpoint addResult _status". Root cause: the reformulator had no awareness of query type. Fix: `[MODE: X]` prefix plus mode-aware rules.
- ENGINE: Fixed responses defaulting to English regardless of query language. Root cause: `GENERATE_PROMPT` had no language instruction (unlike `CODE_GENERATION_PROMPT`, which already had one).

---

## [1.6.0] - 2026-03-18

### Added

- ENGINE: Added `AskAgentStream` RPC — real token-by-token streaming directly from Ollama. Two-phase design: classify + reformulate + retrieve runs first via `build_prepare_graph`, then `llm.stream()` forwards tokens to the client as they arrive.
- ENGINE: Added `EvaluateRAG` RPC — RAGAS evaluation pipeline with Claude as judge. Runs faithfulness, answer_relevancy, context_recall and context_precision against a golden dataset and returns a global score with a verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT).
- ENGINE: Added `openai_proxy.py` — OpenAI- and Ollama-compatible HTTP API running on port 8000. Routes `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`. Endpoints: `POST /v1/chat/completions`, `POST /v1/completions`, `GET /v1/models`, `POST /api/chat`, `POST /api/generate`, `GET /api/tags`, `GET /health`.
- ENGINE: Added `entrypoint.sh` — starts the gRPC server and HTTP proxy as parallel processes with a mutual watchdog. If either crashes, the container stops cleanly.
- ENGINE: Added session memory — a `session_store` dict indexed by `session_id` accumulates the full conversation history per session. Each request loads and persists history.
- ENGINE: Added a query intent classifier — a LangGraph node that classifies every query as `RETRIEVAL`, `CODE_GENERATION` or `CONVERSATIONAL` and routes it to the appropriate subgraph.
- ENGINE: Added hybrid retrieval — replaced `ElasticsearchStore` (LangChain abstraction) with the native Elasticsearch client. Each query runs BM25 `multi_match` and kNN in parallel, fused with Reciprocal Rank Fusion (k=60). Returns the top-8 documents.
- ENGINE: Added `evaluate.py` — full RAGAS evaluation pipeline using the same hybrid retrieval as production, Claude as external judge, and the golden dataset in `Docker/src/golden_dataset.json`.
- PROTO: Added `AskAgentStream` and `EvaluateRAG` RPCs to `brunix.proto` with their message types (`EvalRequest`, `EvalResponse`, `QuestionDetail`).
- DOCS: Added `docs/ADR/ADR-0001-grpc-primary-interface.md`.
- DOCS: Added `docs/ADR/ADR-0002-two-phase-streaming.md`.
- DOCS: Added `docs/ADR/ADR-0003-hybrid-retrieval-rrf.md`.
- DOCS: Added `docs/ADR/ADR-0004-claude-eval-judge.md`.
- DOCS: Added `docs/samples/` — 30 representative `.avap` code samples covering all AVAP constructs.

### Changed

- ENGINE: Replaced `ElasticsearchStore` with the native Elasticsearch client — fixes a silent kNN failure caused by schema incompatibility between the Chonkie ingestion pipeline and the LangChain-managed index schema.
- ENGINE: Replaced the single `GENERATE_PROMPT` with five specialised prompts — `CLASSIFY_PROMPT`, `REFORMULATE_PROMPT`, `GENERATE_PROMPT`, `CODE_GENERATION_PROMPT`, `CONVERSATIONAL_PROMPT` — each optimised for its routing path.
- ENGINE: Extended `REFORMULATE_PROMPT` with explicit AVAP command mapping — intent-to-command expansion for API, database, HTTP, loop and error handling query types.
- ENGINE: Extended `AgentState` with the `query_type` and `session_id` fields required for conditional routing and session persistence.
- ENGINE: Fixed `session_id` being ignored — `graph.invoke` now passes `session_id` into the graph state.
- ENGINE: Fixed double `is_final: True` — `AskAgent` previously emitted two closing messages. Now emits exactly one.
- ENGINE: Fixed an embedding endpoint mismatch — the server now uses the same `/api/embed` endpoint and payload format as both ingestion pipelines, ensuring vectors are comparable at query time.
- DEPENDENCIES: `requirements.txt` updated — added `ragas`, `datasets`, `langchain-anthropic`, `fastapi`, `uvicorn`.

### Fixed

- ENGINE: Fixed retrieval returning zero results — `ElasticsearchStore` assumed a LangChain-managed schema incompatible with the Chonkie-generated index. Replaced with the native ES client querying the actual field names.
- ENGINE: Fixed context always being empty — a consequence of the retrieval bug above. The generation prompt received an empty `{context}` on every request and always returned the fallback string.

---

## [1.5.1] - 2026-03-18

### Added
@ -0,0 +1,185 @@
# ADR-0005: Embedding Model Selection — Comparative Evaluation of BGE-M3 vs Qwen3-Embedding-0.6B

**Date:** 2026-03-19
**Status:** Under Evaluation
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering

---

## Context

The AVAP RAG pipeline requires an embedding model capable of mapping a hybrid corpus into a vector space suitable for semantic retrieval. Understanding the exact composition of this corpus is a prerequisite for model selection.

### Corpus characterisation (empirically measured)

A chunk-level audit was performed on the full indexable corpus: the AVAP Language Reference Manual (`avap.md`) and 40 representative `.avap` code samples. Results (`test_chunks.jsonl`, 190 chunks):

| Metric | Value |
|---|---|
| Total chunks | 190 |
| Total tokens indexed | 11,498 |
| Minimum chunk size | 1 token |
| Maximum chunk size | 833 tokens |
| Mean chunk size | 60.5 tokens |
| Median chunk size | 29 tokens |
| p90 | 117 tokens |
| p95 | 204 tokens |
| p99 | 511 tokens |
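The figures above can be regenerated from the audit file. A minimal sketch, assuming each line of `test_chunks.jsonl` carries a `token_count` field — the field name is an assumption, adjust to the actual audit schema:

```python
import json
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile over an ascending-sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def chunk_stats(path="test_chunks.jsonl"):
    # "token_count" is an assumed field name, not confirmed by the audit file.
    with open(path, encoding="utf-8") as f:
        counts = sorted(json.loads(line)["token_count"] for line in f)
    return {
        "total_chunks": len(counts),
        "total_tokens": sum(counts),
        "min": counts[0],
        "max": counts[-1],
        "mean": round(sum(counts) / len(counts), 1),
        "median": percentile(counts, 50),
        "p90": percentile(counts, 90),
        "p95": percentile(counts, 95),
        "p99": percentile(counts, 99),
    }
```

Nearest-rank is one of several percentile conventions; interpolating methods give slightly different p90/p95/p99 values on small samples.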

**Corpus composition by type:**

| Type | Count | Description |
|---|---|---|
| Narrative (Spanish prose) | 79 | LRM explanations, concept descriptions |
| Code chunks | 83 | AVAP `.avap` sample files |
| BNF formal grammar | 9 | Formal language specification in English |
| Code examples | 14 | Inline examples within LRM |
| Function signatures | 2 | Extracted function headers |

**Linguistic composition:** 55% of chunks originate from the LRM (`avap.md`), written in Spanish with embedded English DSL identifiers. 45% are `.avap` code files containing English command names (`addVar`, `addResult`, `registerEndpoint`, `ormDirect`) with Spanish-language string literals and variable names (`"Hola"`, `datos_cliente`, `mi_json_final`, `contraseña`, `fecha`). 18.9% of chunks (36 out of 190) contain both Spanish content and English DSL commands within the same chunk — intra-chunk multilingual mixing.

Representative examples of intra-chunk multilingual mixing:

```
// Narrative chunk (Spanish prose + English DSL terms):
"AVAP (Advanced Virtual API Programming) es un DSL (Domain-Specific Language)
Turing Completo, diseñado para la orquestación segura de microservicios e I/O."

// Code chunk (English commands + Spanish identifiers and literals):
addParam("lang", l)
if(l, "es", "=")
addVar(msg, "Hola")
end()
addResult(msg)

// BNF chunk (formal English grammar):
<program> ::= ( <line> | <block_comment> )*
<statement> ::= <assignment> | <method_call_stmt> | <io_command> | ...
```

### Why the initial model was eliminated

The initial model provided was **Qwen2.5-1.5B**. Empirical evaluation by MrHouston Engineering (full results in `research/embeddings/`) demonstrated it is unsuitable for dense retrieval. Qwen2.5-1.5B generates embeddings via the **Last Token** method: the final token of the sequence is assumed to encode all preceding context. For AVAP code chunks, the last token is always a syntactic closer — `end()`, `}`, `endLoop()` — with zero semantic content. The resulting embeddings are effectively identical across functionally distinct chunks.
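This failure mode is mechanical and can be illustrated without a model. A toy sketch with hypothetical per-token hidden states (pure Python, no real model involved): when every chunk ends in the same closer token, last-token pooling collapses distinct chunks to one vector, while mean pooling still separates them:

```python
import random

random.seed(0)


def vec(n=4):
    # Stand-in for a token's hidden state.
    return [random.gauss(0, 1) for _ in range(n)]


# Hypothetical hidden states for two functionally different chunks;
# both end with the state of the same syntactic closer, e.g. "end()".
closer = vec()
chunk_a = [vec() for _ in range(5)] + [closer]
chunk_b = [vec() for _ in range(5)] + [closer]

# Last Token pooling: the embedding is just the final token's state.
last_a, last_b = chunk_a[-1], chunk_b[-1]


# Mean pooling: average over all token states.
def mean_pool(chunk):
    return [sum(col) / len(chunk) for col in zip(*chunk)]


mean_a, mean_b = mean_pool(chunk_a), mean_pool(chunk_b)

print(last_a == last_b)  # True  — distinct chunks collapse to one vector
print(mean_a == mean_b)  # False — content still differentiates the chunks
```

The real effect in a transformer is softer than this caricature — the final hidden state does attend to earlier tokens — but the benchmark numbers below show the collapse is near-total in practice.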

Benchmark confirmation (BEIR evaluation, three datasets):

**CodeXGLUE** (code retrieval from GitHub repositories):

| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
|---|---|---|---|---|
| 1 | 0.00031 | 0.00031 | **0.9497** | **0.9497** |
| 5 | 0.00086 | 0.00151 | **0.9716** | **0.9876** |
| 10 | 0.00118 | 0.00250 | **0.9734** | **0.9929** |

**CoSQA** (natural language queries over code — closest proxy to AVAP retrieval):

| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
|---|---|---|---|---|
| 1 | 0.00000 | 0.00000 | **0.1740** | **0.1740** |
| 10 | 0.00000 | 0.00000 | **0.3909** | **0.6700** |
| 100 | 0.00210 | 0.01000 | **0.4510** | **0.9520** |

**SciFact** (scientific prose — out-of-domain control):

| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
|---|---|---|---|---|
| 1 | 0.02333 | 0.02083 | **0.5633** | **0.5299** |
| 10 | 0.04619 | 0.07417 | **0.6855** | **0.8161** |
| 100 | 0.07768 | 0.23144 | **0.7129** | **0.9400** |

Qwen2.5-1.5B is eliminated. **Qwen3-Embedding-0.6B is the validated baseline.**
### Why a comparative evaluation is required before adopting Qwen3

Qwen3-Embedding-0.6B's benchmark results were obtained on English-only datasets. They eliminate Qwen2.5-1.5B decisively but do not characterise Qwen3's behaviour on the multilingual mixed corpus that AVAP represents. A second candidate — **BGE-M3** — presents theoretical advantages for this specific corpus that cannot be assessed without empirical comparison.

The index rebuild required to adopt any model is destructive and must be done once. Given that the embedding model directly determines the quality of all RAG retrieval in production, adopting a model without a direct comparison between the two viable candidates would not meet the due diligence required for a decision of this impact.

---

## Decision

Conduct a **head-to-head comparative evaluation** of BGE-M3 and Qwen3-Embedding-0.6B under identical conditions before adopting either as the production embedding model.

The model that demonstrates superior performance under the evaluation criteria defined below will be adopted. This ADR moves to Accepted upon completion of that evaluation, with the selected model documented as the outcome.

---

## Candidate Analysis

### Qwen3-Embedding-0.6B

**Strengths:**

- Already benchmarked on CodeXGLUE, CoSQA and SciFact — strong results documented
- 32,768 token context window — exceeds corpus requirements by a large margin
- Same model family as the generation model (Qwen) — shared tokenizer vocabulary
- Lowest integration risk — already validated in the pipeline

**Limitations:**

- Benchmarks are English-only — multilingual performance on the AVAP corpus is unvalidated
- Not a dedicated multilingual model — training distribution weighted towards English and Chinese
- No native sparse retrieval support

**Corpus fit assessment:** The maximum chunk in the AVAP corpus is 833 tokens — well within both candidates' limits. Qwen3's 32,768 token context window provides no practical advantage over BGE-M3's 8,192 tokens for this corpus. Context window is not a differentiating criterion.

### BGE-M3

**Strengths:**

- Explicit multilingual contrastive training across 100+ languages including programming languages — a direct architectural fit for the intra-chunk Spanish/English/DSL mixing observed in the corpus
- Supports dense, sparse and multi-vector (ColBERT) retrieval from a single model inference — a future path to consolidating the current BM25+kNN dual-system architecture (ADR-0003)
- Higher MTEB retrieval score than Qwen3-Embedding-0.6B in the programming domain

**Limitations:**

- Not yet benchmarked on CodeXGLUE, CoSQA or SciFact — no empirical results for this corpus
- 8,192 token context window — sufficient for the current corpus (max chunk: 833 tokens, 10.2% utilisation) but lower headroom for future corpus growth
- Requires tokenizer alignment: `HF_EMB_MODEL_NAME` must be updated to `BAAI/bge-m3` alongside `OLLAMA_EMB_MODEL_NAME` to keep chunk token counting consistent

**Corpus fit assessment:** The intra-chunk multilingual mixing (18.9% of chunks) and the Spanish prose component (79 narrative chunks) are the corpus characteristics most likely to differentiate BGE-M3 from Qwen3. The BEIR and EvaluateRAG evaluations will determine whether this theoretical advantage translates to a measurable retrieval improvement.

### VRAM

Both candidates require approximately 1.13 GiB at FP16 (BGE-M3: 567M parameters; Qwen3: 596M parameters). Combined with a quantized generation model and KV cache, total VRAM remains within the 4 GiB hardware constraint for both. VRAM is not a selection criterion.

### Embedding dimension

Both candidates output 1024-dimensional vectors. The Elasticsearch index mapping (`int8_hnsw`, `dims: 1024`, cosine similarity) is identical for both candidates. No mapping changes are required between them.

---

## Evaluation Protocol

Both models are evaluated under identical conditions. Results must be documented in `research/embeddings/` before this ADR is closed.

**Step 1 — BEIR benchmarks**

Run CodeXGLUE, CoSQA and SciFact with **BGE-M3** using the same BEIR evaluation scripts and configuration used for Qwen3-Embedding-0.6B. Qwen3-Embedding-0.6B results already exist in `research/embeddings/` and serve as the baseline. Report NDCG@k, MAP@k, Recall@k and Precision@k at k = 1, 3, 5, 10, 100.

**Step 2 — EvaluateRAG on AVAP corpus**

Rebuild the Elasticsearch index twice — once with each model — and run `EvaluateRAG` against the production AVAP golden dataset for both. Report RAGAS scores: faithfulness, answer_relevancy, context_recall, context_precision, and the global score with verdict.

**Selection criterion**

EvaluateRAG is the primary decision signal. It directly measures retrieval quality on the actual AVAP production corpus — including its intra-chunk multilingual mixing (18.9% of chunks) and domain-specific DSL syntax — and is therefore more representative than any external benchmark. The model with the higher global EvaluateRAG score is adopted.

BEIR results are the secondary signal. The primary BEIR metric is NDCG@10. Among the three datasets, **CoSQA is the most representative proxy** for the AVAP retrieval use case — it pairs natural language queries with code snippets, mirroring the Spanish prose query / AVAP DSL code retrieval pattern. CoSQA results are weighted accordingly in the comparison.

All margin comparisons use **absolute percentage points** in NDCG@10 (e.g., 0.39 vs 0.41 is a 2 absolute percentage point difference, not a 5.1% relative difference).

**Tiebreaker**

If the EvaluateRAG global scores are within 5 absolute percentage points of each other, the BEIR results determine the outcome under the following conditions:

- BGE-M3 must exceed Qwen3-Embedding-0.6B by more than 2 absolute percentage points on mean NDCG@10 across all three BEIR datasets, AND
- BGE-M3 must not underperform Qwen3-Embedding-0.6B by more than 2 absolute percentage points on CoSQA NDCG@10 specifically.

If these conditions are not both met — that is, if the EvaluateRAG scores are within 5 points and BGE-M3 does not clear both BEIR thresholds — Qwen3-Embedding-0.6B is adopted. It carries lower integration risk, its benchmarks are already documented, and it is the validated baseline for the system.
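The full selection rule can be stated as a small decision function. A sketch of the protocol above — the function name and signature are illustrative, not part of any tool in the repository; all scores are in absolute percentage points (0–100):

```python
def select_model(evalrag_bge, evalrag_qwen,
                 mean_ndcg10_bge, mean_ndcg10_qwen,
                 cosqa_ndcg10_bge, cosqa_ndcg10_qwen):
    """Apply the ADR-0005 selection rule. All scores in absolute points (0-100)."""
    # Primary signal: EvaluateRAG global score on the production AVAP corpus.
    if abs(evalrag_bge - evalrag_qwen) > 5:
        return "BGE-M3" if evalrag_bge > evalrag_qwen else "Qwen3-Embedding-0.6B"
    # Tiebreaker: BGE-M3 must win mean BEIR NDCG@10 by more than 2 points
    # AND must not lose CoSQA NDCG@10 by more than 2 points.
    wins_mean = mean_ndcg10_bge - mean_ndcg10_qwen > 2
    holds_cosqa = cosqa_ndcg10_qwen - cosqa_ndcg10_bge <= 2
    if wins_mean and holds_cosqa:
        return "BGE-M3"
    # Default: lower integration risk, validated baseline.
    return "Qwen3-Embedding-0.6B"
```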

---

## Consequences

- **Index rebuild required** regardless of which model is adopted. Vectors from Qwen2.5-1.5B are incompatible with either candidate. The existing index must be deleted before re-ingestion.
- **Two index rebuilds required for the evaluation.** One per candidate for the EvaluateRAG step. Given the current corpus size (190 chunks, 11,498 tokens), rebuild time is not a meaningful constraint.
- **Tokenizer alignment for BGE-M3.** If BGE-M3 is selected, both `OLLAMA_EMB_MODEL_NAME` and `HF_EMB_MODEL_NAME` must be updated. Updating only `OLLAMA_EMB_MODEL_NAME` causes the chunker to estimate token counts using the wrong vocabulary — a silent bug that produces inconsistent chunk sizes without raising any error.
- **Future model changes.** Any future replacement of the embedding model must follow the same evaluation protocol — BEIR benchmarks on the same three datasets plus EvaluateRAG — before an ADR update is accepted. Results must be documented in `research/embeddings/`.
@ -45,16 +45,7 @@ Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentRe

**Use case:** Clients that do not support streaming or need a single atomic response.

**Request:** See [`AgentRequest`](#agentrequest) in §3.

**Response stream:**
@ -70,7 +61,7 @@ message AgentRequest {

**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.

**Use case:** Interactive clients (chat UIs, VS Code extension) that need progressive rendering.

**Request:** Same `AgentRequest` as `AskAgent`.
@ -152,10 +143,40 @@ message QuestionDetail {

### `AgentRequest`

```protobuf
message AgentRequest {
  string query = 1;
  string session_id = 2;
  string editor_content = 3;
  string selected_text = 4;
  string extra_context = 5;
  string user_info = 6;
}
```

| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| `query` | `string` | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. |
| `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| `selected_text` | `string` | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": <int>, "project_id": <int>, "org_id": <int>}`. Available in graph state for future routing or personalisation — not yet consumed by the graph. |

**Editor context behaviour:**

Fields 3–6 are all optional. If none are provided, the assistant behaves exactly as without them — full backward compatibility. When `editor_content` or `selected_text` are provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.

**Base64 encoding:**

`editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them as UTF-8. Malformed Base64 is silently treated as an empty string — no error is raised.

```python
import base64

encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
```
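
The decode side can be mirrored client-side for testing. A sketch of the documented server behaviour — malformed Base64 becomes an empty string; the helper name is illustrative, not the actual server code:

```python
import base64


def decode_editor_field(value: str) -> str:
    """Decode a Base64 editor-context field the way the server documents it:
    malformed input silently becomes an empty string, never an error."""
    if not value:
        return ""
    try:
        return base64.b64decode(value).decode("utf-8")
    except Exception:
        return ""


print(decode_editor_field("aG9sYQ=="))     # hola
print(decode_editor_field("not base64!"))  # (empty string)
```

Because failures are silent, a client that sends malformed Base64 gets a response with no editor context and no warning — worth checking first when editor-aware answers seem to ignore the open file.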

---

### `AgentResponse`

@ -165,6 +186,8 @@ message QuestionDetail {

| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |

---

### `EvalRequest`

| Field | Type | Required | Default | Description |

@ -173,6 +196,8 @@ message QuestionDetail {

| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |

---

### `EvalResponse`

See full definition in [§2.3](#23-evaluaterag).
@ -211,11 +236,11 @@ grpcurl -plaintext localhost:50052 list
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```

### `AskAgent` — basic query

```bash
grpcurl -plaintext \
  -d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
  localhost:50052 \
  brunix.AssistanceEngine/AskAgent
```
@ -223,12 +248,47 @@ grpcurl -plaintext \

Expected response:

```json
{
  "text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
  "avap_code": "AVAP-2026",
  "is_final": true
}
```

### `AskAgent` — with editor context

```python
import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc


def encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("utf-8")


channel = grpc.insecure_channel("localhost:50052")
stub = brunix_pb2_grpc.AssistanceEngineStub(channel)

editor_code = """
try()
ormDirect("UPDATE users SET active=1", res)
exception(e)
addVar(_status, 500)
addResult("Error")
end()
"""

request = brunix_pb2.AgentRequest(
    query="why is this not catching the error?",
    session_id="dev-001",
    editor_content=encode(editor_code),
    selected_text=encode(editor_code),  # same block selected
    extra_context=encode("file: handler.avap"),
    user_info=json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)

for response in stub.AskAgent(request):
    if response.is_final:
        print(response.text)
```
### `AskAgentStream` — token streaming

```bash
@ -250,7 +310,6 @@ Expected response (truncated):
### `EvaluateRAG` — run evaluation

```bash
grpcurl -plaintext \
  -d '{"category": "core_syntax", "limit": 10}' \
  localhost:50052 \
@ -264,7 +323,7 @@ Expected response:
  "questions_evaluated": 10,
  "elapsed_seconds": 142.3,
  "judge_model": "claude-sonnet-4-20250514",
  "index": "avap-knowledge-v1",
  "faithfulness": 0.8421,
  "answer_relevancy": 0.7913,
  "context_recall": 0.7234,
@ -275,7 +334,7 @@ Expected response:
}
```

### Multi-turn conversation

```bash
# Turn 1
@ -283,7 +342,7 @@ grpcurl -plaintext \
grpcurl -plaintext \
  -d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream

# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
  -d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
  localhost:50052 brunix.AssistanceEngine/AskAgentStream
@ -303,37 +362,90 @@ python -m grpc_tools.protoc \

## 6. OpenAI-Compatible Proxy

The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — `continue.dev`, LiteLLM, Open WebUI, or any custom client.

**Base URL:** `http://localhost:8000`

### Available endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions. Routes to `AskAgent` or `AskAgentStream`. |
| `POST` | `/v1/completions` | OpenAI Completions (legacy). |
| `GET` | `/v1/models` | Lists available models. Returns `brunix`. |
| `POST` | `/api/chat` | Ollama chat format (NDJSON streaming). |
| `POST` | `/api/generate` | Ollama generate format (NDJSON streaming). |
| `GET` | `/api/tags` | Ollama model list. |
| `GET` | `/health` | Health check. Returns `{"status": "ok"}`. |
### `POST /v1/chat/completions`

**Routing:** `stream: false` → `AskAgent` (single response). `stream: true` → `AskAgentStream` (SSE token stream).

**Request body:**

```json
{
  "model": "brunix",
  "messages": [
    {"role": "user", "content": "Que significa AVAP?"}
  ],
  "stream": false,
  "session_id": "uuid-per-conversation",
  "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```
**The `user` field (editor context transport):**

The standard OpenAI `user` field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit `user` or set it to a plain string — both are handled gracefully.

| Key in `user` JSON | Encoding | Description |
|---|---|---|
| `editor_content` | Base64 | Full content of the active editor file |
| `selected_text` | Base64 | Currently selected text in the editor |
| `extra_context` | Base64 | Free-form additional context |
| `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` |

**Important:** `session_id` must be sent as a top-level field — never inside the `user` JSON. The proxy reads `session_id` exclusively from the dedicated field.
**Example — general query (no editor context):**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "Que significa AVAP?"}],
    "stream": false,
    "session_id": "test-001"
  }'
```
**Example — query with editor context (VS Code extension):**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "que hace este codigo?"}],
    "stream": true,
    "session_id": "test-001",
    "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
  }'
```
**Example — empty editor context fields:**

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "brunix",
    "messages": [{"role": "user", "content": "como funciona addVar?"}],
    "stream": false,
    "session_id": "test-002",
    "user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
  }'
```
---

# Brunix Assistance Engine — Architecture Reference

> **Audience:** Engineers contributing to this repository, architects reviewing the system design, and operators responsible for its deployment.
> **Last updated:** 2026-03-20
> **Version:** 1.6.x

---

3. [Request Lifecycle](#3-request-lifecycle)
4. [LangGraph Workflow](#4-langgraph-workflow)
5. [RAG Pipeline — Hybrid Search](#5-rag-pipeline--hybrid-search)
6. [Editor Context Pipeline](#6-editor-context-pipeline)
7. [Streaming Architecture (AskAgentStream)](#7-streaming-architecture-askagentstream)
8. [Evaluation Pipeline (EvaluateRAG)](#8-evaluation-pipeline-evaluaterag)
9. [Data Ingestion Pipeline](#9-data-ingestion-pipeline)
10. [Infrastructure Layout](#10-infrastructure-layout)
11. [Session State & Conversation Memory](#11-session-state--conversation-memory)
12. [Observability Stack](#12-observability-stack)
13. [Security Boundaries](#13-security-boundaries)
14. [Known Limitations & Future Work](#14-known-limitations--future-work)

---

- **Hybrid RAG** (BM25 + kNN with RRF fusion) over an Elasticsearch vector index
- **Ollama** as the local LLM and embedding backend
- **RAGAS + Claude** as the automated evaluation judge
- **Editor context injection** — the VS Code extension can send active file content and selected code alongside each query; the engine decides whether to use it based on the user's intent

A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI/Uvicorn, enabling integration with tools that expect the OpenAI API format.
```
┌─────────────────────────────────────────────────────────────┐
│                      External Clients                       │
│  grpcurl / App SDK          │   OpenAI-compatible client    │
│  VS Code extension          │   (continue.dev, LiteLLM)     │
└────────────┬────────────────┴──────────────┬────────────────┘
             │ gRPC :50052                   │ HTTP :8000
             ▼                               ▼
```
| Component | File / Service | Responsibility |
|---|---|---|
| **gRPC Server** | `Docker/src/server.py` | Entry point. Implements the `AssistanceEngine` servicer. Initializes LLM, embeddings, ES client, and both graphs. Decodes Base64 editor context fields from incoming requests. |
| **Full Graph** | `Docker/src/graph.py` → `build_graph()` | Complete workflow: classify → reformulate → retrieve → generate. Used by `AskAgent` and `EvaluateRAG`. |
| **Prepare Graph** | `Docker/src/graph.py` → `build_prepare_graph()` | Partial workflow: classify → reformulate → retrieve. Does **not** call the LLM for generation. Used by `AskAgentStream` to enable manual token streaming. |
| **Message Builder** | `Docker/src/graph.py` → `build_final_messages()` | Reconstructs the final prompt list from prepared state for `llm.stream()`. Injects editor context when `use_editor_context` is `True`. |
| **Prompt Library** | `Docker/src/prompts.py` | Centralized definitions for `CLASSIFY`, `REFORMULATE`, `GENERATE`, `CODE_GENERATION`, and `CONVERSATIONAL` prompts. |
| **Agent State** | `Docker/src/state.py` | `AgentState` TypedDict shared across all graph nodes. Includes editor context fields and `use_editor_context` flag. |
| **Evaluation Suite** | `Docker/src/evaluate.py` | RAGAS-based pipeline. Uses the production retriever + Ollama LLM for generation, and Claude as the impartial judge. |
| **OpenAI Proxy** | `Docker/src/openai_proxy.py` | FastAPI application that wraps `AskAgent` / `AskAgentStream` under OpenAI- and Ollama-compatible endpoints. Parses editor context from the `user` field. |
| **LLM Factory** | `Docker/src/utils/llm_factory.py` | Provider-agnostic factory for chat models (Ollama, AWS Bedrock). |
| **Embedding Factory** | `Docker/src/utils/emb_factory.py` | Provider-agnostic factory for embedding models (Ollama, HuggingFace). |
| **Ingestion Pipeline** | `scripts/pipelines/flows/elasticsearch_ingestion.py` | Chunks and ingests AVAP documents into Elasticsearch with embeddings. |
| **AVAP Chunker** | `scripts/pipelines/ingestion/avap_chunker.py` | Semantic chunker for `.avap` source files using `avap_config.json` as grammar. |
| **Unit Tests** | `Docker/tests/test_prd_0002.py` | 40 unit tests covering editor context parsing, Base64 decoding, classifier output, reformulate anchor, and injection logic. |

---
### 3.1 `AskAgent` (non-streaming)

```
Client → gRPC AgentRequest{query, session_id, editor_content*, selected_text*, extra_context*, user_info*}
  │          (* Base64-encoded; user_info is JSON string)
  │
  ├─ Decode Base64 fields (editor_content, selected_text, extra_context)
  ├─ Load conversation history from session_store[session_id]
  ├─ Build initial_state = {messages, session_id, editor_content, selected_text, extra_context, user_info}
  │
  └─ graph.invoke(initial_state)
       ├─ classify → query_type ∈ {RETRIEVAL, CODE_GENERATION, CONVERSATIONAL}
       │             use_editor_context ∈ {True, False}
       ├─ reformulate → reformulated_query
       │                (anchored to selected_text if use_editor_context=True)
       ├─ retrieve → context (top-8 hybrid RRF chunks)
       └─ generate → final AIMessage
                     (editor context injected only if use_editor_context=True)
  │
  ├─ Persist updated history to session_store[session_id]
  └─ yield AgentResponse{text, avap_code="AVAP-2026", is_final=True}
```
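The load/persist steps above can be pictured with a minimal in-memory store. This is an illustrative sketch under the assumption of a plain keyed dict; the production `session_store` may differ:

```python
from collections import defaultdict

# session_id → list of {"role": ..., "content": ...} message dicts
session_store: dict[str, list[dict]] = defaultdict(list)

def load_history(session_id: str) -> list[dict]:
    """Return prior turns for this session; a new session yields an empty list."""
    return list(session_store[session_id])

def persist_turn(session_id: str, user_msg: str, ai_msg: str) -> None:
    """Append the completed user/assistant exchange to the session history."""
    session_store[session_id].append({"role": "user", "content": user_msg})
    session_store[session_id].append({"role": "assistant", "content": ai_msg})
```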
### 3.2 `AskAgentStream` (token streaming)

```
Client → gRPC AgentRequest{query, session_id, editor_content*, selected_text*, extra_context*, user_info*}
  │
  ├─ Decode Base64 fields
  ├─ Load history from session_store[session_id]
  ├─ Build initial_state
  │
  ├─ prepare_graph.invoke(initial_state)     ← Phase 1: no LLM generation
  │    ├─ classify → query_type + use_editor_context
  │    ├─ reformulate
  │    └─ retrieve (or skip_retrieve if CONVERSATIONAL)
  │
  ├─ build_final_messages(prepared_state)    ← Reconstruct prompt with editor context if flagged
  │
  └─ for chunk in llm.stream(final_messages):
       └─ yield AgentResponse{text=token, is_final=False}
  │
  └─ yield AgentResponse{text="", is_final=True}
```
### 3.3 HTTP Proxy → gRPC

```
Client → POST /v1/chat/completions {messages, stream, session_id, user}
  │
  ├─ Extract query from last user message in messages[]
  ├─ Read session_id from dedicated field (NOT from user)
  ├─ Parse user field as JSON → {editor_content, selected_text, extra_context, user_info}
  │
  ├─ stream=false → _invoke_blocking() → AskAgent gRPC call
  └─ stream=true  → _iter_stream() → AskAgentStream gRPC call → SSE token stream
```
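The tolerant parse of the `user` field can be sketched as follows. This is illustrative only; the actual logic in `openai_proxy.py` may differ in detail:

```python
import json

EMPTY_CONTEXT = {"editor_content": "", "selected_text": "",
                 "extra_context": "", "user_info": {}}

def parse_user_field(user) -> dict:
    """Extract editor context from the OpenAI `user` field, tolerating non-Brunix clients."""
    if not isinstance(user, str):
        return dict(EMPTY_CONTEXT)        # field omitted or not a string
    try:
        payload = json.loads(user)
    except json.JSONDecodeError:
        return dict(EMPTY_CONTEXT)        # plain-string user id → no editor context
    if not isinstance(payload, dict):
        return dict(EMPTY_CONTEXT)
    return {key: payload.get(key, default) for key, default in EMPTY_CONTEXT.items()}
```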

### 3.4 `EvaluateRAG`

```
Client → gRPC EvalRequest{category?, limit?, index?}
  │
  │    ├─ retrieve_context (hybrid BM25+kNN, same as production)
  │    └─ generate_answer (Ollama LLM + GENERATE_PROMPT)
  ├─ Build RAGAS Dataset
  ├─ Run RAGAS metrics with Claude as judge
  └─ Compute global_score + verdict
  │
  └─ return EvalResponse{scores, global_score, verdict, details[]}
```
## 4. LangGraph Workflow

### 4.1 Agent State

```python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # conversation history
    session_id: str
    query_type: str            # RETRIEVAL | CODE_GENERATION | CONVERSATIONAL
    reformulated_query: str
    context: str               # formatted RAG context string
    editor_content: str        # decoded from Base64
    selected_text: str         # decoded from Base64
    extra_context: str         # decoded from Base64
    user_info: str             # JSON string: {"dev_id", "project_id", "org_id"}
    use_editor_context: bool   # set by classifier — True only if query explicitly refers to editor
```

### 4.2 Full Graph (`build_graph`)
```
                  ┌─────────────┐
                  │  classify   │ ← sees: query + history + selected_text (if present)
                  │             │   outputs: query_type + use_editor_context
                  └──────┬──────┘
                         │
        ┌────────────────┼────────────────────┐
        ▼                ▼                    ▼
┌──────────────┐                 ┌────────────────────────┐
│ reformulate  │                 │ respond_conversational │
│              │                 └───────────┬────────────┘
│ if use_editor│                             │
│ anchor query │                             │
│ to selected  │                             │
└──────┬───────┘                             │
       ▼                                     │
┌──────────────┐                             │
│   retrieve   │                             │
└──────┬───────┘                             │
       │                                     │
       ▼                ▼                    │
  ┌──────────┐  ┌───────────────┐            │
  │ generate │  │ generate_code │            │
  │          │  │               │            │
  │ injects  │  │ injects editor│            │
  │ editor   │  │ context only  │            │
  │ context  │  │ if flag=True  │            │
  │ if flag  │  └───────┬───────┘            │
  └────┬─────┘          │                    │
       │                │                    │
       └────────────────┴────────────────────┘
                        │
                       END
```
### 4.3 Prepare Graph (`build_prepare_graph`)

Identical routing for classify, but generation nodes are replaced by `END`. The `CONVERSATIONAL` branch uses `skip_retrieve` (returns empty context without querying Elasticsearch). The `use_editor_context` flag is set here and carried forward into `build_final_messages`.
### 4.4 Classifier — Two-Token Output

The classifier outputs exactly two tokens separated by a space:

```
<query_type> <editor_signal>

Examples:
RETRIEVAL NO_EDITOR
CODE_GENERATION EDITOR
CONVERSATIONAL NO_EDITOR
```

`EDITOR` is set only when the user message explicitly refers to editor code using expressions like "this code", "este codigo", "fix this", "que hace esto", "explain this", etc. General AVAP questions, code generation requests, and conversational follow-ups always return `NO_EDITOR`.
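Splitting the two tokens back apart is mechanical. A sketch follows; the fallback for malformed LLM output is an assumption, not confirmed engine behavior:

```python
VALID_TYPES = {"RETRIEVAL", "CODE_GENERATION", "CONVERSATIONAL"}

def parse_classifier_output(raw: str) -> tuple[str, bool]:
    """Split '<query_type> <editor_signal>' into (query_type, use_editor_context)."""
    parts = raw.strip().split()
    # Assumed fallback: anything unrecognized degrades to CONVERSATIONAL.
    query_type = parts[0] if parts and parts[0] in VALID_TYPES else "CONVERSATIONAL"
    use_editor_context = len(parts) > 1 and parts[1] == "EDITOR"
    return query_type, use_editor_context
```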
### 4.5 Query Type Routing

| `query_type` | Triggers retrieve? | Generation prompt | Editor context injected? |
|---|---|---|---|
| `RETRIEVAL` | Yes | `GENERATE_PROMPT` | Only if `use_editor_context=True` |
| `CODE_GENERATION` | Yes | `CODE_GENERATION_PROMPT` | Only if `use_editor_context=True` |
| `CONVERSATIONAL` | No | `CONVERSATIONAL_PROMPT` | Never |

### 4.6 Reformulator — Mode-Aware & Language-Preserving

The reformulator receives `[MODE: <query_type>]` prepended to the query:

- **MODE RETRIEVAL:** Compresses the query into compact keywords. Does NOT expand with AVAP commands. Preserves original language — Spanish queries stay in Spanish, English queries stay in English.
- **MODE CODE_GENERATION:** Applies the AVAP command expansion mapping (registerEndpoint, addParam, ormAccessSelect, etc.).
- **MODE CONVERSATIONAL:** Returns the query as-is.

Language preservation is critical for BM25 retrieval — the AVAP LRM is written in Spanish, so a Spanish query must reach the retriever in Spanish for lexical matching to work correctly.

---
The retrieval system (`hybrid_search_native`) fuses BM25 lexical search and kNN dense vector search using **Reciprocal Rank Fusion (RRF)**.

```
User query (reformulated, language-preserved)
  │
  ├─ embeddings.embed_query(query) → query_vector [1024-dim]
  │
  ├─ ES bool query:
  │    ├─ must: multi_match (BM25) on [content^2, text^2]
  │    └─ should: boost spec/narrative doc_types (2.0x / 1.5x)
  │         └─ top-k BM25 hits
  │
  └─ ES knn on field [embedding], num_candidates = k×5
       └─ top-k kNN hits

  └─ Top-8 documents → format_context() → context string
```
**RRF constant:** `60` (standard value; prevents high-rank documents from dominating while still rewarding consensus between both retrieval modes).

**doc_type boost:** `spec` and `narrative` chunks receive a score boost in the BM25 query to prioritize definitional and explanatory content over raw code examples when the query is about meaning or documentation.
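The fusion step itself is compact. A sketch of RRF over the two ranked lists, with document ids standing in for ES hits:

```python
def rrf_fuse(bm25_ids: list[str], knn_ids: list[str],
             k: int = 60, top_n: int = 8) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked high by both retrieval modes accumulate the largest scores.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```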
**Chunk metadata** attached to each retrieved document:

| Field | Description |
|---|---|
| `chunk_id` | Unique identifier within the index |
| `source_file` | Origin document filename |
| `doc_type` | `spec`, `code`, `code_example`, `bnf` |
| `block_type` | AVAP block type: `narrative`, `function`, `if`, `startLoop`, `try`, etc. |
| `section` | Document section/chapter heading |

Documents of type `code`, `code_example`, `bnf`, or block type `function / if / startLoop / try` are tagged as `[AVAP CODE]` in the formatted context, signaling the LLM to treat them as executable syntax rather than prose.

---
## 6. Editor Context Pipeline

The editor context pipeline (PRD-0002) allows the VS Code extension to send the user's active editor state alongside every query. The engine uses this context only when the user explicitly refers to their code.

### Transport

Editor context travels differently depending on the client protocol:

**Via gRPC directly (`AgentRequest` fields 3–6):**

- `editor_content` (field 3) — Base64-encoded full file content
- `selected_text` (field 4) — Base64-encoded selected text
- `extra_context` (field 5) — Base64-encoded free-form context
- `user_info` (field 6) — JSON string `{"dev_id":…,"project_id":…,"org_id":…}`

**Via HTTP proxy (OpenAI `/v1/chat/completions`):**

- Transported in the standard `user` field as a JSON string
- Same four keys, same encodings
- The proxy parses, extracts, and forwards to gRPC
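Server-side decoding has to tolerate empty and malformed fields. A sketch; treating malformed input as absent is an assumption about `server.py`, not documented behavior:

```python
import base64

def decode_b64_field(value: str) -> str:
    """Decode a Base64-encoded request field; empty or malformed input yields ''."""
    if not value:
        return ""
    try:
        return base64.b64decode(value).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return ""  # assumed: malformed context is dropped rather than failing the request
```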
### Pipeline

```
AgentRequest arrives
  │
  ├─ server.py: Base64 decode editor_content, selected_text, extra_context
  ├─ user_info passed as-is (JSON string)
  │
  └─ initial_state populated with all four fields
       │
       ▼
classify node:
  ├─ If selected_text present → injected into classify prompt as <editor_selection>
  ├─ LLM outputs: RETRIEVAL EDITOR or RETRIEVAL NO_EDITOR (etc.)
  └─ use_editor_context = True if second token == EDITOR
       │
       ▼
reformulate node:
  ├─ If use_editor_context=True AND selected_text present:
  │    anchor = selected_text + "\n\nUser question: " + query
  │    → LLM reformulates using selected code as primary signal
  └─ Else: reformulate query as normal
       │
       ▼
retrieve node: (unchanged — uses reformulated_query)
       │
       ▼
generate / generate_code node:
  ├─ If use_editor_context=True:
  │    prompt = <selected_code> + <editor_file> + <extra_context> + RAG_prompt
  │    Priority: selected_text > editor_content > RAG context > extra_context
  └─ Else: standard RAG prompt — no editor content injected
```
### Intent detection examples

| User message | `use_editor_context` | Reason |
|---|---|---|
| "Que significa AVAP?" | `False` | General definition question |
| "dame un API de hello world" | `False` | Code generation, no editor reference |
| "que hace este codigo?" | `True` | Explicit reference to "this code" |
| "fix this" | `True` | Explicit reference to current selection |
| "como mejoro esto?" | `True` | Explicit reference to current context |
| "how does addVar work?" | `False` | Documentation question, no editor reference |

---
## 7. Streaming Architecture (AskAgentStream)

The two-phase streaming design is critical to understand:

LangGraph's `stream()` method yields full state snapshots per node, not individual tokens. To achieve true per-token streaming to the gRPC client, the generation step is deliberately extracted from the graph and called directly via `llm.stream()`.

**Phase 1 — Deterministic preparation (graph-managed):**

Classification, query reformulation, and retrieval run through `prepare_graph.invoke()`. This phase runs synchronously and produces the complete context before any token is emitted to the client. Editor context classification also happens here — `use_editor_context` is set in the prepared state.

**Phase 2 — Token streaming (manual):**

`build_final_messages()` reconstructs the exact prompt, injecting editor context if `use_editor_context` is `True`. `llm.stream(final_messages)` yields one `AIMessageChunk` per token from Ollama. Each token is immediately forwarded as `AgentResponse{text=token, is_final=False}`. After the stream ends, the full assembled text is persisted to `session_store`.

**Backpressure:** gRPC streaming is flow-controlled by the client. If the client stops reading, the Ollama token stream will block at the `yield` point.
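The two phases reduce to a small generator. A sketch with the collaborators stubbed out as parameters; the names mirror the text above, but this is not the production servicer:

```python
from typing import Callable, Iterator

def ask_agent_stream(state: dict,
                     prepare: Callable[[dict], dict],
                     build_final_messages: Callable[[dict], list],
                     llm_stream: Callable[[list], Iterator[str]]) -> Iterator[dict]:
    """Phase 1: synchronous preparation. Phase 2: manual per-token streaming."""
    prepared = prepare(state)                  # classify / reformulate / retrieve
    messages = build_final_messages(prepared)  # reconstruct the generation prompt
    for token in llm_stream(messages):
        yield {"text": token, "is_final": False}
    yield {"text": "", "is_final": True}       # end-of-stream marker
```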
---

## 8. Evaluation Pipeline (EvaluateRAG)

The evaluation suite implements an **offline RAG evaluation** pattern using RAGAS metrics.
### Golden dataset
|
### Golden dataset
|
||||||
|
|
||||||
Located at `Docker/src/golden_dataset.json`. Each entry follows this schema:
|
Located at `Docker/src/golden_dataset.json`. Each entry:
|
||||||
|
|
||||||
```json
{
@ -299,9 +435,11 @@ Located at `Docker/src/golden_dataset.json`. Each entry follows this schema:
}
```

> **Note:** The golden dataset does not include editor-context queries. EvaluateRAG measures the RAG pipeline in isolation. A separate editor-context golden dataset is planned as future work once the VS Code extension is validated.
---

## 9. Data Ingestion Pipeline

Documents flow into the Elasticsearch index through two paths:
@ -317,31 +455,29 @@ scripts/pipelines/flows/elasticsearch_ingestion.py
│
├─ Load markdown files
├─ Chunk using scripts/pipelines/tasks/chunk.py
│    (semantic chunking via Chonkie library)
├─ Generate embeddings via scripts/pipelines/tasks/embeddings.py
│    (Ollama or HuggingFace embedding model)
└─ Bulk index into Elasticsearch
     index: avap-docs-* (configurable via ELASTICSEARCH_INDEX)
     mapping: {content, embedding, source_file, doc_type, section, ...}
```
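The final bulk-index step can be sketched as an action/document generator in the shape the Elasticsearch bulk API expects. This is a sketch: the field names follow the mapping above, but the chunk dictionary keys (`text`, `vector`, `source`) and the index name are illustrative assumptions.

```python
def to_bulk_actions(chunks, index="avap-docs-dev"):
    """Yield alternating action and document dicts in Elasticsearch bulk-API order."""
    for c in chunks:
        yield {"index": {"_index": index}}          # action/metadata line
        yield {                                      # document line, matching the mapping
            "content": c["text"],
            "embedding": c["vector"],
            "source_file": c["source"],
        }

actions = list(to_bulk_actions(
    [{"text": "addVar(...)", "vector": [0.1, 0.2], "source": "avap.md"}]
))
```

A helper like `elasticsearch.helpers.bulk` can then consume the generator directly.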

### Path B — AVAP native code chunker

```
docs/samples/*.avap
│
▼
scripts/pipelines/ingestion/avap_chunker.py
│ (grammar: scripts/pipelines/ingestion/avap_config.json v2.0)
│
├─ Lexer strips comments and string contents
├─ Block detection (function, if, startLoop, try)
├─ Statement classification (30 types + catch-all)
├─ Semantic tag assignment (18 boolean tags)
└─ Output: JSONL chunks → avap_ingestor.py → Elasticsearch
```
---

## 10. Infrastructure Layout

### Devaron Cluster (Vultr Kubernetes)
@ -352,22 +488,6 @@ scripts/pipelines/flows/generate_mbap.py
| Observability DB | `brunix-postgres` | `5432` | PostgreSQL for Langfuse |
| Langfuse UI | — | `80` | `http://45.77.119.180` |

### Port map summary

| Port | Protocol | Service | Scope |
@ -381,7 +501,7 @@ kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 \

---

## 11. Session State & Conversation Memory

Conversation history is managed via an in-process dictionary:
@ -395,69 +515,63 @@ session_store: dict[str, list] = defaultdict(list)

- **In-memory only.** History is lost on container restart.
- **No TTL or eviction.** Sessions grow unbounded for the lifetime of the process.
- **Thread safety:** Python's GIL provides basic safety for the `ThreadPoolExecutor(max_workers=10)` gRPC server, but concurrent writes to the same `session_id` from two simultaneous requests are not explicitly protected.
- **History window:** `format_history_for_classify()` uses only the last 6 messages for query classification, keeping the classify prompt short and deterministic.

> **Future work:** Replace `session_store` with a Redis-backed persistent store to survive restarts and support horizontal scaling.
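A minimal sketch of the missing per-session write protection. The lock and the helper names (`append_turn`, `classify_window`) are assumptions; the current code writes to the defaultdict directly.

```python
import threading
from collections import defaultdict

session_store: dict[str, list] = defaultdict(list)
_store_lock = threading.Lock()  # not in the current code; guards concurrent same-session writes

def append_turn(session_id: str, role: str, content: str) -> None:
    with _store_lock:
        session_store[session_id].append({"role": role, "content": content})

def classify_window(session_id: str, n: int = 6) -> list:
    """Return the last n messages, mirroring format_history_for_classify()'s window."""
    with _store_lock:
        return session_store[session_id][-n:]

# Toy usage: write 8 turns, read the 6-message classify window
for i in range(8):
    append_turn("s1", "user", f"msg {i}")
window = classify_window("s1")
```

A single process-wide lock is enough at this scale; per-session locks would only matter under much higher concurrency.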
---

## 12. Observability Stack

### Langfuse tracing
The server integrates Langfuse for end-to-end LLM tracing. Every `AskAgent` / `AskAgentStream` request creates a trace that captures the input query and session ID; each LangGraph node execution (classify, reformulate, retrieve, generate); LLM token counts, latency, and cost; and the final response.

**Access:** `http://45.77.119.180` — requires a project API key configured via `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
### Logging

Structured logging via Python's `logging` module, configured at `INFO` level. Log format:

```
[MODULE] context_info — key=value key=value
```

Key log markers:
| Marker | Module | Meaning |
|---|---|---|
| `[ESEARCH]` | `server.py` | Elasticsearch connection status |
| `[classify]` | `graph.py` | Query type + `use_editor_context` flag + raw LLM output |
| `[reformulate]` | `graph.py` | Reformulated query string + whether `selected_text` was used as anchor |
| `[hybrid]` | `graph.py` | BM25 / kNN hit counts and RRF result count |
| `[retrieve]` | `graph.py` | Number of docs retrieved and context length |
| `[generate]` | `graph.py` | Response character count |
| `[AskAgent]` | `server.py` | Editor and selected-text flags, query preview |
| `[AskAgentStream]` | `server.py` | Token count and total chars per stream |
| `[eval]` | `evaluate.py` | Per-question retrieval and generation status |
| `[base64]` | `server.py` | Warning when a Base64 field fails to decode |
---

## 13. Security Boundaries
| Boundary | Current state | Risk |
|---|---|---|
| gRPC transport | **Insecure** (`add_insecure_port`) | Network interception possible. Acceptable in dev/tunnel setup; requires mTLS for production. |
| Elasticsearch auth | Optional (user/pass or API key via env vars) | Index is accessible without auth if `ELASTICSEARCH_USER` and `ELASTICSEARCH_API_KEY` are unset. |
| Editor context | Transmitted in plaintext (Base64 is encoding, not encryption) | File contents visible to anyone intercepting gRPC traffic. Requires TLS for production. |
| Container user | Non-root (`python:3.11-slim` default) | Low risk. Do not override with `root`. |
| Secrets in env | Via `.env` / `docker-compose` env injection | Never commit real values. See [CONTRIBUTING.md](../CONTRIBUTING.md#6-environment-variables-policy). |
| Session store | In-memory, no auth | Any caller with access to the gRPC port can read/write any session by guessing its ID. |
| Kubeconfig | `./kubernetes/kubeconfig.yaml` (local only) | Grants cluster access. Never commit. Listed in `.gitignore`. |
| `user_info` | JSON string, no validation | `dev_id`, `project_id`, `org_id` are not authenticated — passed as metadata only. |
---

## 14. Known Limitations & Future Work
| Area | Limitation | Proposed solution |
|---|---|---|
| Session persistence | In-memory, lost on restart | Redis-backed `session_store` |
| Horizontal scaling | `session_store` is per-process | Sticky sessions or external session store |
| gRPC security | Insecure port | Add TLS + optional mTLS |
| Editor context security | Base64 is not encryption | TLS required before sending real file contents |
| `user_info` auth | Not validated or authenticated | JWT or API key validation on `user_info` fields |
| Elasticsearch auth | Not enforced if vars unset | Make auth required; fail fast on startup |
| Context window | Full history passed to generate; no truncation | Sliding window or summarization for long sessions |
| Evaluation | Golden dataset has no editor-context queries | Build a dedicated editor-context golden dataset after VS Code validation |
| Rate limiting | None on gRPC server | Add interceptor-based rate limiter |
| Health check | No gRPC health protocol | Implement `grpc.health.v1` |
@ -0,0 +1,3 @@
nivel = 5
es_admin = nivel >= 10
addResult(es_admin)

@ -0,0 +1,4 @@
subtotal = 150.50
iva = subtotal * 0.21
total = subtotal + iva
addResult(total)

@ -0,0 +1,6 @@
startLoop(i,1,10)
item = "item_%s" % i
AddvariableToJSON(item,'valor_generado',mi_json)
endLoop()
addResult(mi_json)

@ -0,0 +1,7 @@
registros = ['1','2','3']
getListLen(registros, total)
contador = 0
startLoop(idx, 0, 2)
actual = registros[int(idx)]
endLoop()
addResult(actual)

@ -0,0 +1,2 @@
getDateTime("", 86400, "UTC", expira)
addResult(expira)

@ -0,0 +1,2 @@
addParam("client_id", id_interno)
addResult(id_interno)

@ -0,0 +1,3 @@
addParam(emails,emails)
getQueryParamList(lista_correos)
addResult(lista_correos)

@ -0,0 +1,6 @@
addParam("lang", l)
addParam("lang2", l2)
if(l, "es", "=")
addVar(msg, "Hola")
end()
addResult(msg)

@ -0,0 +1,3 @@
nombre = "Sistema"
log = "Evento registrado por: %s" % nombre
addResult(log)

@ -0,0 +1,4 @@
datos_cliente = "datos"
addVar(clave, "cliente_vip")
AddvariableToJSON(clave, datos_cliente, mi_json_final)
addResult(mi_json_final)

@ -0,0 +1,3 @@
addParam("data_list", mi_lista)
getListLen(mi_lista, cantidad)
addResult(cantidad)

@ -0,0 +1,2 @@
stampToDatetime(1708726162, "%d/%m/%Y", 0, fecha_human)
addResult(fecha_human)

@ -0,0 +1,7 @@
addParam(sal_par,saldo)
if(saldo, 0, ">")
permitir = True
else()
permitir = False
end()
addResult(permitir)

@ -0,0 +1,6 @@
addParam(userrype, user_type)
addParam(sells, compras)
if(None, None, " user_type == 'VIP' or compras > 100")
addVar(descuento, 0.20)
end()
addResult(descuento)

@ -0,0 +1,2 @@
getDateTime("%Y-%m-%d %H:%M:%S", 0, "Europe/Madrid", sql_date)
addResult(sql_date)

@ -0,0 +1,6 @@
function suma(a, b){
total = a + b
return(total)
}
resultado = suma(10, 20)
addResult(resultado)

@ -0,0 +1,9 @@
function es_valido(token){
response = False
if(token, "SECRET", "=")
response = True
end()
return(response)
}
autorizado = es_valido("SECRET")
addResult(autorizado)

@ -0,0 +1,2 @@
randomString("[A-Z]\d", 32, token_seguridad)
addResult(token_seguridad)

@ -0,0 +1,2 @@
encodeSHA256("payload_data", checksum)
addResult(checksum)

@ -0,0 +1,2 @@
addVar(mensaje, "Hola mundo desde AVAP")
addResult(mensaje)

@ -0,0 +1,6 @@
addParam(password,pass_nueva)
pass_antigua = "password"
if(pass_nueva, pass_antigua, "!=")
addVar(cambio, "Contraseña actualizada")
end()
addResult(cambio)

@ -0,0 +1,2 @@
replace("REF_1234_OLD","OLD", "NEW", ref_actualizada)
addResult(ref_actualizada)

@ -0,0 +1,5 @@
try()
ormDirect("UPDATE table_inexistente SET a=1", res)
exception(e)
addVar(_status,500)
addResult("Error de base de datos")

@ -0,0 +1,2 @@
getDateTime("", 0, "UTC", ahora)
addResult(ahora)

@ -0,0 +1,6 @@
ormCheckTable(tabla_pruebas,resultado_comprobacion)
if(resultado_comprobacion,False,'==')
ormCreateTable("username,age",'VARCHAR,INTEGER',tabla_pruebas,resultado_creacion)
end()
addResult(resultado_comprobacion)
addResult(resultado_creacion)

@ -0,0 +1,14 @@
addParam("page", p)
addParam("size", s)
registros = ["u1", "u2", "u3", "u4", "u5", "u6"]
offset = int(p) * int(s)
limite = offset + int(s)
contador = 0
addResult(offset)
addResult(limite)
startLoop(i, 2, limite)
actual = registros[int(i)]
titulo = "reg_%s" % i
AddvariableToJSON(titulo, actual, pagina_json)
endLoop()
addResult(pagina_json)

@ -0,0 +1,3 @@
addVar(base, 1000)
addVar(copia, $base)
addResult(copia)

@ -0,0 +1,9 @@
addParam("password_base", password_base)
replace(password_base, "a", "@", temp1)
replace(temp1, "e", "3", temp2)
replace(temp2, "o", "0", temp3)
replace(temp3, "i", "!", modified_password)
randomString("[a-zA-Z0-9]", 4, suffix)
addVar(final_password, modified_password)
final_password = final_password + suffix
addResult(final_password)

@ -0,0 +1,4 @@
addVar(code, 200)
addVar(status, "Success")
addResult(code)
addResult(status)

@ -0,0 +1,8 @@
encontrado = False
startLoop(i, 1, 10)
if(i, 5, "==")
encontrado = True
i = 11
end()
endLoop()
addResult(encontrado)

@ -0,0 +1,5 @@
addParam("password", password)
encodeSHA256(password, hashed_password)
randomString("[a-zA-Z0-9]", 32, secure_token)
addResult(hashed_password)
addResult(secure_token)

@ -0,0 +1,5 @@
try()
RequestGet("https://api.test.com/data", 0, 0, respuesta)
exception(e)
addVar(error_trace, "Fallo de conexión: %s" % e)
addResult(error_trace)

@ -0,0 +1,6 @@
addParam("api_key", key)
if(key, None, "==")
addVar(_status, 403)
addVar(error, "Acceso denegado: falta API KEY")
addResult(error)
end()

@ -0,0 +1,2 @@
stub(addResult(error), 5) => {}
assert(addResult(error), 5): {}

@ -0,0 +1,5 @@
addParam("rol", r)
if(r, ["admin", "editor", "root"], "in")
acceso = True
end()
addResult(acceso)
@ -0,0 +1,89 @@
# PRD-0001: OpenAI-Compatible HTTP Proxy

**Date:** 2026-03-18
**Status:** Implemented
**Requested by:** Rafael Ruiz (CTO)
**Implemented in:** PR #58
**Related ADR:** ADR-0001 (gRPC as primary interface)

---

## Problem

The Brunix Assistance Engine exposes a gRPC interface as its primary API. gRPC is the right choice for performance and type safety in server-to-server communication, but it creates a significant adoption barrier for two categories of consumers:

**Existing OpenAI integrations.** Any tool or client already configured to call the OpenAI API — VS Code extensions using `continue.dev`, LiteLLM routers, Open WebUI instances, internal tooling at 101OBEX, Corp — requires code changes to switch to gRPC. The switching cost is non-trivial and creates friction that slows adoption.

**Model replacement use case.** The core strategic value of the Brunix RAG is that it can replace direct OpenAI API consumption with a locally hosted, domain-specific assistant that has no per-token cost and no data privacy concerns. This value proposition is only actionable if the replacement is transparent — i.e., the client does not need to change in order to consume the Brunix RAG instead of OpenAI.

Without a compatibility layer, the Brunix engine cannot serve as a drop-in replacement for OpenAI models. Every potential adopter faces an integration project instead of a configuration change.

---

## Solution

Implement an HTTP server running alongside the gRPC server that exposes:

- The OpenAI Chat Completions API (`/v1/chat/completions`) — both streaming and non-streaming
- The OpenAI Completions API (`/v1/completions`) — legacy support
- The OpenAI Models API (`/v1/models`) — for compatibility with clients that enumerate available models
- The Ollama Chat API (`/api/chat`) — NDJSON streaming format
- The Ollama Generate API (`/api/generate`) — for Ollama-native clients
- The Ollama Tags API (`/api/tags`) — for clients that list available models
- A health endpoint (`/health`)

The proxy bridges HTTP → gRPC internally: `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`. The gRPC interface remains the primary interface and is not modified.

Any client that currently points to `https://api.openai.com` can be reconfigured to point to `http://localhost:8000` (or the server's address) with `model: brunix` and will work without any other change.

---

## Scope

**In scope:**
- OpenAI-compatible endpoints as listed above
- Ollama-compatible endpoints as listed above
- Routing `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`
- Session ID propagation via the `session_id` extension field in `ChatCompletionRequest`
- Health endpoint

**Out of scope:**
- OpenAI function calling / tool use
- OpenAI embeddings API (`/v1/embeddings`)
- OpenAI fine-tuning or moderation APIs
- Authentication / API key validation (handled at infrastructure level)
- Multi-turn conversation reconstruction from the message array (the proxy extracts only the last user message as the query)

---

## Technical implementation

**Stack:** FastAPI + uvicorn, running on port 8000 inside the same container as the gRPC server.

**Concurrency:** An asyncio event loop bridges FastAPI's async context with the synchronous gRPC calls via a dedicated `ThreadPoolExecutor` (configurable via `PROXY_THREAD_WORKERS`, default 20). This prevents blocking gRPC calls from stalling the async HTTP server.

**Streaming:** An `asyncio.Queue` connects the gRPC token stream (produced in a thread) with the FastAPI `StreamingResponse` (consumed in the async event loop). Tokens are forwarded as SSE events (OpenAI format) or NDJSON (Ollama format) as they arrive from `AskAgentStream`.
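The thread-to-async bridge described above can be sketched as follows. This is a simplified shape, not the actual `openai_proxy.py`; the gRPC stream is faked with a plain iterator and the sentinel/helper names are assumptions.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the gRPC stream

async def bridge_stream(grpc_stream):
    """Pump a blocking token iterator (run in a thread) into an async generator."""
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def producer():
        # Blocking gRPC iteration happens in a worker thread; hand each
        # token back to the event loop thread-safely.
        for token in grpc_stream:
            loop.call_soon_threadsafe(queue.put_nowait, token)
        loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    loop.run_in_executor(None, producer)
    while (token := await queue.get()) is not _DONE:
        yield token

async def main():
    return [t async for t in bridge_stream(iter(["He", "llo"]))]

tokens = asyncio.run(main())
```

In the real proxy each yielded token would be wrapped as an SSE or NDJSON line before being handed to `StreamingResponse`.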
**Entry point:** `entrypoint.sh` starts both the gRPC server and the HTTP proxy as parallel processes. If either crashes, the other is terminated — the container fails cleanly rather than entering a partially active state.

**Environment variables:**

| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC server address |
| `PROXY_MODEL_ID` | `brunix` | Model name returned by `/v1/models` and `/api/tags` |
| `PROXY_THREAD_WORKERS` | `20` | ThreadPoolExecutor size for gRPC calls |

---

## Validation

**Functional:** Any OpenAI-compatible client (continue.dev, LiteLLM, Open WebUI) can be pointed at `http://localhost:8000` with `model: brunix` and successfully send queries to the Brunix RAG without code changes.

**Strategic:** The VS Code extension and any 101OBEX, Corp internal tooling currently consuming OpenAI can switch to the Brunix RAG by changing one endpoint URL and one model name. No other changes are required.

---

## Impact on existing interfaces

The gRPC interface (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) is unchanged. Existing gRPC clients are not affected. The proxy is additive — it does not replace the gRPC interface; it complements it.
@ -0,0 +1,199 @@
# PRD-0002: Editor Context Injection for VS Code Extension

**Date:** 2026-03-19
**Status:** Implemented
**Requested by:** Rafael Ruiz (CTO)
**Purpose:** Validate the VS Code extension with real users
**Related ADR:** ADR-0001 (gRPC interface), ADR-0002 (two-phase streaming)

---

## Problem

The Brunix Assistance Engine previously received only two inputs from the client: a `query` (the user's question) and a `session_id` (for conversation continuity). It had no awareness of what the user was looking at in their editor when they asked the question.

This created a fundamental limitation for a coding assistant: a user asking "how do I handle the error here?" or "what does this function return?" could not be answered correctly without knowing what "here" and "this function" referred to. The assistant was forced to treat every question as a general AVAP documentation query, even when the user's intent was clearly anchored to specific code in their editor.

For the VS Code extension validation, the CEO needed to demonstrate that the assistant behaves as a genuine coding assistant — one that understands the user's current context — not just a documentation search tool.

---

## Solution

The gRPC contract has been extended to allow the VS Code extension to send four optional context fields alongside every query. These fields are transported in the standard OpenAI `user` field as a JSON string when using the HTTP proxy, and as dedicated proto fields when calling gRPC directly.

**Transport format via HTTP proxy (`/v1/chat/completions`):**

```json
{
  "model": "brunix",
  "messages": [{"role": "user", "content": "what does this code do?"}],
  "stream": true,
  "session_id": "uuid",
  "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"<base64>\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```

**Fields:**

- **`editor_content`** (base64) — full content of the active file open in the editor. Gives the assistant awareness of the complete code the user is working on.
- **`selected_text`** (base64) — text currently selected in the editor, if any. The most precise signal of user intent — if the user has selected a block of code before asking a question, that block is almost certainly what the question is about.
- **`extra_context`** (base64) — free-form additional context (e.g., file path, language identifier, cursor position, open diagnostic errors). Extensible without requiring proto changes.
- **`user_info`** (JSON object) — client identity metadata: `dev_id`, `project_id`, `org_id`. Not base64 — sent as a JSON object nested within the `user` JSON string.

All four fields are optional. If none are provided, the assistant behaves exactly as it does today — full backward compatibility.
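On the client side, assembling the `user` field might look like this. A sketch only: `build_user_field` and `_b64` are hypothetical helpers, not part of the extension's published code.

```python
import base64
import json

def _b64(s: str) -> str:
    """Base64-encode a UTF-8 string; empty input stays empty."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii") if s else ""

def build_user_field(editor_content="", selected_text="", extra_context="", user_info=None) -> str:
    """Serialise the four optional context fields into the OpenAI `user` string."""
    return json.dumps({
        "editor_content": _b64(editor_content),
        "selected_text": _b64(selected_text),
        "extra_context": _b64(extra_context),
        "user_info": user_info or {},
    })

user = build_user_field(selected_text="addVar(x, 1)", user_info={"dev_id": 1})
```

The resulting string is dropped into the standard request body unchanged, so no OpenAI client library needs modification.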

---

## User experience

**Scenario 1 — Question about selected code:**
The user selects a `try() / exception() / end()` block in their editor and asks "why is this not catching my error?". The assistant detects via the classifier that the question refers explicitly to the selected code, injects `selected_text` into the generation prompt, and answers specifically about that block — not about error handling in general.

**Scenario 2 — Question about the open file:**
The user has a full AVAP function open and asks "what HTTP status codes can this return?". The classifier detects that the question refers to editor content, injects `editor_content` into the generation prompt, and reasons about the `_status` assignments in the function.

**Scenario 3 — General question (unchanged behaviour):**
The user asks "how does addVar work?" without selecting anything or referring to the editor. The classifier sets `use_editor_context: False`. The assistant behaves exactly as before — a retrieval-augmented response from the AVAP knowledge base, with no editor content injected.

---

## Scope

**In scope:**
- Add `editor_content`, `selected_text`, `extra_context`, `user_info` fields to `AgentRequest` in `brunix.proto`
- Decode base64 fields (`editor_content`, `selected_text`, `extra_context`) in `server.py` before propagating to graph state
- Parse `user_info` as an opaque JSON string — available in state for future use, not yet consumed by the graph
- Parse the `user` field in `openai_proxy.py` as a JSON object containing all four context fields
- Propagate all fields through the server into the graph state (`AgentState`)
- Extend the classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens: query type and editor context signal (`EDITOR` / `NO_EDITOR`)
- Set `use_editor_context: bool` in `AgentState` based on classifier output
- Use `selected_text` as the primary anchor for query reformulation only when `use_editor_context` is `True`
- Inject `selected_text` and `editor_content` into the generation prompt only when `use_editor_context` is `True`
- Fix reformulator language — queries must be rewritten in the original language, never translated

**Out of scope:**
- Changes to `EvaluateRAG` — the golden dataset does not include editor-context queries; this feature does not affect embedding or retrieval evaluation
- Consuming `user_info` fields (`dev_id`, `project_id`, `org_id`) in the graph — available in state for future routing or personalisation
- Evaluation of the feature's impact via EvaluateRAG — a dedicated golden dataset with editor-context queries is required for that measurement; it is future work
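Parsing the classifier's two-token output might look like this. The `EDITOR` / `NO_EDITOR` tokens come from the scope above, but the query-type labels and the fallback behaviour are assumptions; the real label set lives in `CLASSIFY_PROMPT_TEMPLATE`.

```python
def parse_classifier_output(raw: str) -> tuple[str, bool]:
    """Split an LLM reply of the form 'QUERY_TYPE EDITOR|NO_EDITOR' into state fields."""
    tokens = raw.strip().upper().split()
    query_type = tokens[0] if tokens else "CONVERSATIONAL"   # hypothetical fallback label
    use_editor_context = len(tokens) > 1 and tokens[1] == "EDITOR"
    return query_type, use_editor_context

result = parse_classifier_output("code EDITOR")
```

Normalising case and defaulting to no editor context keeps a malformed LLM reply from crashing the graph.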
|
---
|
||||||
|
|
||||||
|
## Technical design
|
||||||
|
|
||||||
|
### Proto changes (`brunix.proto`)
|
||||||
|
|
||||||
|
```protobuf
message AgentRequest {
  string query          = 1; // unchanged
  string session_id     = 2; // unchanged
  string editor_content = 3; // base64-encoded full editor file content
  string selected_text  = 4; // base64-encoded currently selected text
  string extra_context  = 5; // base64-encoded free-form additional context
  string user_info      = 6; // JSON string: {"dev_id":…,"project_id":…,"org_id":…}
}
```

Fields 1 and 2 are unchanged. Fields 3–6 are optional — absent fields default to the empty string in proto3. All existing clients remain compatible without modification.

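As an illustration of the client side of this contract, the sketch below prepares the new fields the way a caller would before setting them on the generated `AgentRequest` stub. The plain dict stands in for the protobuf message; the example values are hypothetical.

```python
import base64
import json

def encode_field(text: str) -> str:
    """Base64-encode a context field for transport, per the new contract."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

# Stand-in for an AgentRequest message — an actual client would set these
# same values on the generated protobuf object.
request = {
    "query": "What does this function do?",
    "session_id": "sess-123",
    "editor_content": encode_field("function hello()\n    addResult(greeting)\n}"),
    "selected_text": encode_field("addResult(greeting)"),
    "extra_context": "",  # optional — proto3 defaults absent strings to ""
    "user_info": json.dumps({"dev_id": "d1", "project_id": "p1", "org_id": "o1"}),
}
```

Note that `user_info` travels as plain JSON, not base64 — matching the server, which decodes only the three editor fields.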
### AgentState changes (`state.py`)

```python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    # Core fields
    messages: Annotated[list, add_messages]
    session_id: str
    query_type: str
    reformulated_query: str
    context: str

    # Editor context fields (PRD-0002)
    editor_content: str  # decoded from base64
    selected_text: str   # decoded from base64
    extra_context: str   # decoded from base64
    user_info: str       # JSON string — {"dev_id":…,"project_id":…,"org_id":…}

    # Set by classifier — True only when the user explicitly refers to editor code
    use_editor_context: bool
```

### Server changes (`server.py`)

Base64 decoding is applied to `editor_content`, `selected_text` and `extra_context` before propagation; `user_info` is passed as-is (a plain JSON string). Helper function:

```python
def _decode_b64(value: str) -> str:
    try:
        return base64.b64decode(value).decode("utf-8") if value else ""
    except Exception:
        logger.warning("[base64] decode failed")
        return ""
```

### Proxy changes (`openai_proxy.py`)

The `user` field is parsed as a JSON object. `_parse_editor_context` extracts all four fields:

```python
import json
from typing import Optional


def _parse_editor_context(user: Optional[str]) -> tuple[str, str, str, str]:
    if not user:
        return "", "", "", ""
    try:
        ctx = json.loads(user)
        if isinstance(ctx, dict):
            return (
                ctx.get("editor_content", "") or "",
                ctx.get("selected_text", "") or "",
                ctx.get("extra_context", "") or "",
                json.dumps(ctx.get("user_info", {})),
            )
    except (json.JSONDecodeError, TypeError):
        pass
    return "", "", "", ""
```

`session_id` is now read exclusively from the dedicated `session_id` field — it no longer falls back to `user`.

### Classifier changes (`prompts.py` + `graph.py`)

`CLASSIFY_PROMPT_TEMPLATE` now outputs two tokens separated by a space:

- First token: `RETRIEVAL`, `CODE_GENERATION`, or `CONVERSATIONAL`
- Second token: `EDITOR` or `NO_EDITOR`

`EDITOR` is set only when the user message explicitly refers to the editor code or selected text, using expressions like "this code", "este codigo", "fix this", "que hace esto", "explain this", etc.

`_parse_query_type` returns `tuple[str, bool]`. Both `classify` nodes (in `build_graph` and `build_prepare_graph`) set `use_editor_context` in the state.

### Reformulator changes (`prompts.py` + `graph.py`)

Two fixes were applied:

**Mode-aware reformulation:** The reformulator receives `[MODE: X]` prepended to the query. In `RETRIEVAL` mode it compresses the query without expanding AVAP commands. In `CODE_GENERATION` mode it applies the command mapping. In `CONVERSATIONAL` mode it returns the query as-is.

**Language preservation:** The reformulator never translates. Queries in Spanish are rewritten in Spanish; queries in English are rewritten in English. This fix was required because BM25 retrieval is lexical — a Spanish chunk ("AVAP es un DSL...") cannot be found by an English query ("AVAP stand for").

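The mode prefix can be sketched as below. The helper name `build_reformulator_input` is hypothetical — the actual wiring lives in `graph.py` — but it shows the intended shape: `CONVERSATIONAL` queries bypass reformulation, everything else gets the `[MODE: X]` tag the prompt template keys on.

```python
def build_reformulator_input(query: str, query_type: str) -> str:
    if query_type == "CONVERSATIONAL":
        return query  # passthrough — no reformulation needed
    # RETRIEVAL and CODE_GENERATION get the mode tag the template switches on.
    return f"[MODE: {query_type}] {query}"
```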
### Generator changes (`graph.py`)

`_build_generation_prompt` injects `editor_content` and `selected_text` into the prompt only when `use_editor_context` is `True`. Priority hierarchy when injected:

1. `selected_text` — highest priority, most specific signal
2. `editor_content` — file-level context
3. RAG-retrieved chunks — knowledge base context
4. `extra_context` — free-form additional context

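The priority-ordered assembly can be sketched as follows. `build_prompt` is a hypothetical stand-in for the real `_build_generation_prompt`, and the section headings are illustrative — only the gating on `use_editor_context` and the ordering come from the design above:

```python
def build_prompt(state: dict) -> str:
    sections: list[str] = []
    # Editor signals are injected only on an explicit EDITOR classification.
    if state.get("use_editor_context"):
        if state.get("selected_text"):
            sections.append(f"## Selected code\n{state['selected_text']}")
        if state.get("editor_content"):
            sections.append(f"## Current file\n{state['editor_content']}")
    # RAG-retrieved chunks and free-form context follow, in priority order.
    if state.get("context"):
        sections.append(f"## Documentation\n{state['context']}")
    if state.get("extra_context"):
        sections.append(f"## Additional context\n{state['extra_context']}")
    return "\n\n".join(sections)
```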
---

## Validation

**Acceptance criteria:**

- A query explicitly referring to selected code (`selected_text` non-empty, classifier returns `EDITOR`) produces a response grounded in that specific code.
- A general query (`use_editor_context: False`) produces a response identical in quality to the pre-PRD-0002 system — no editor content injected, no regression.
- A query in Spanish retrieves Spanish chunks correctly — the reformulator preserves the language.
- Existing gRPC clients that do not send the new fields work without modification.
- The `user` field in the HTTP proxy can be a plain string or absent — no error is raised.

**Future measurement:**

Once the extension is validated and the embedding model is selected (ADR-0005), a dedicated golden dataset of editor-context queries should be built and added to `EvaluateRAG` to measure the quantitative impact of this feature.

---

## Impact on parallel workstreams

**Embedding evaluation (ADR-0005 / MrHouston):** No impact. The BEIR benchmarks and EvaluateRAG runs for embedding model selection use the existing golden dataset, which contains no editor-context queries. The two workstreams are independent.

**RAG architecture evolution:** This feature is additive. It does not change the retrieval infrastructure, the Elasticsearch index, or the embedding pipeline. It extends the graph with additional input signals that improve response quality for editor-anchored queries.

@@ -41,3 +41,5 @@ dev = [
    "selenium>=4.41.0",
    "tree-sitter-language-pack>=0.13.0",
]

[tool.pytest.ini_options]
testpaths = ["Docker/tests"]

@@ -0,0 +1,99 @@
{
  "language": "avap",
  "version": "2.0",
  "file_extensions": [".avap"],

  "lexer": {
    "string_delimiters": ["\"", "'"],
    "escape_char": "\\",
    "comment_line": ["///", "//"],
    "comment_block": { "open": "/*", "close": "*/" },
    "line_oriented": true
  },

  "blocks": [
    {
      "name": "function",
      "doc_type": "code",
      "opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
      "closer_pattern": "^\\s*\\}\\s*$",
      "extract_signature": true,
      "signature_template": "function {group1}({group2})"
    },
    {
      "name": "if",
      "doc_type": "code",
      "opener_pattern": "^\\s*if\\s*\\(",
      "closer_pattern": "^\\s*end\\s*\\(\\s*\\)",
      "note": "Closer is end(). The else() marker is an inline separator within the if block, not a block opener."
    },
    {
      "name": "startLoop",
      "doc_type": "code",
      "opener_pattern": "^\\s*startLoop\\s*\\(",
      "closer_pattern": "^\\s*endLoop\\s*\\(\\s*\\)"
    },
    {
      "name": "try",
      "doc_type": "code",
      "opener_pattern": "^\\s*try\\s*\\(\\s*\\)",
      "closer_pattern": "^\\s*end\\s*\\(\\s*\\)",
      "note": "try() closes with end(), same as if(). The exception() command is a statement within the try block."
    }
  ],

  "statements": [
    { "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
    { "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
    { "name": "addResult", "pattern": "^\\s*addResult\\s*\\(" },
    { "name": "addParam", "pattern": "^\\s*addParam\\s*\\(" },
    { "name": "getQueryParamList", "pattern": "^\\s*getQueryParamList\\s*\\(" },
    { "name": "getListLen", "pattern": "^\\s*getListLen\\s*\\(" },
    { "name": "itemFromList", "pattern": "^\\s*itemFromList\\s*\\(" },
    { "name": "variableToList", "pattern": "^\\s*variableToList\\s*\\(" },
    { "name": "variableFromJSON", "pattern": "^\\s*variableFromJSON\\s*\\(" },
    { "name": "addVariableToJSON", "pattern": "^\\s*AddvariableToJSON\\s*\\(|^\\s*AddVariableToJSON\\s*\\(", "note": "init.sql uses AddvariableToJSON (lowercase v). Both casings are accepted." },
    { "name": "RequestGet", "pattern": "^\\s*\\w+\\s*=\\s*RequestGet\\s*\\(|^\\s*RequestGet\\s*\\(" },
    { "name": "RequestPost", "pattern": "^\\s*\\w+\\s*=\\s*RequestPost\\s*\\(|^\\s*RequestPost\\s*\\(" },
    { "name": "ormDirect", "pattern": "^\\s*\\w+\\s*=\\s*ormDirect\\s*\\(|^\\s*ormDirect\\s*\\(" },
    { "name": "orm_command", "pattern": "^\\s*(ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
    { "name": "exception", "pattern": "^\\s*exception\\s*\\(|^\\s*\\w+\\s*=\\s*exception\\s*\\(", "note": "exception() appears inside try blocks as an error capture statement. Can be used as: exception(var) or var = exception(...)" },
    { "name": "else", "pattern": "^\\s*else\\s*\\(\\s*\\)", "note": "else() is a flow separator marker inside if() blocks. Not a block opener — the parser handles branching at the if() level." },
    { "name": "end", "pattern": "^\\s*end\\s*\\(\\s*\\)", "note": "end() closes if() and try() blocks. Handled by the block closer_pattern of those blocks. Listed here as a fallback for standalone end() statements." },
    { "name": "endLoop", "pattern": "^\\s*endLoop\\s*\\(\\s*\\)", "note": "endLoop() closes startLoop() blocks. Listed here as a fallback for standalone endLoop() statements." },
    { "name": "encodeSHA256", "pattern": "^\\s*\\w+\\s*=\\s*encodeSHA256\\s*\\(|^\\s*encodeSHA256\\s*\\(" },
    { "name": "encodeMD5", "pattern": "^\\s*\\w+\\s*=\\s*encodeMD5\\s*\\(|^\\s*encodeMD5\\s*\\(" },
    { "name": "randomString", "pattern": "^\\s*randomString\\s*\\(" },
    { "name": "replace", "pattern": "^\\s*replace\\s*\\(" },
    { "name": "getRegex", "pattern": "^\\s*\\w+\\s*=\\s*getRegex\\s*\\(|^\\s*getRegex\\s*\\(" },
    { "name": "getDateTime", "pattern": "^\\s*\\w+\\s*=\\s*getDateTime\\s*\\(|^\\s*getDateTime\\s*\\(" },
    { "name": "getTimeStamp", "pattern": "^\\s*\\w+\\s*=\\s*getTimeStamp\\s*\\(|^\\s*getTimeStamp\\s*\\(" },
    { "name": "stampToDatetime", "pattern": "^\\s*\\w+\\s*=\\s*stampToDatetime\\s*\\(|^\\s*stampToDatetime\\s*\\(" },
    { "name": "async_command", "pattern": "^\\s*\\w+\\s*=\\s*go\\s+\\w+\\s*\\(|^\\s*gather\\s*\\(" },
    { "name": "connector", "pattern": "^\\s*\\w+\\s*=\\s*avapConnector\\s*\\(" },
    { "name": "return", "pattern": "^\\s*return\\s+\\S" },
    { "name": "modularity", "pattern": "^\\s*(import|include)\\s+" },
    { "name": "assignment", "pattern": "^\\s*\\w+\\s*=\\s*" }
  ],

  "semantic_tags": [
    { "tag": "uses_orm", "pattern": "\\b(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
    { "tag": "uses_http", "pattern": "\\b(RequestPost|RequestGet)\\s*\\(" },
    { "tag": "uses_connector", "pattern": "\\bavapConnector\\s*\\(" },
    { "tag": "uses_async", "pattern": "\\bgo\\s+\\w+\\s*\\(|\\bgather\\s*\\(" },
    { "tag": "uses_crypto", "pattern": "\\b(encodeSHA256|encodeMD5)\\s*\\(" },
    { "tag": "uses_auth", "pattern": "\\b(addParam|_status)\\b" },
    { "tag": "uses_error_handling", "pattern": "\\btry\\s*\\(\\s*\\)" },
    { "tag": "uses_exception", "pattern": "\\bexception\\s*\\(" },
    { "tag": "uses_loop", "pattern": "\\bstartLoop\\s*\\(" },
    { "tag": "uses_conditional", "pattern": "\\bif\\s*\\(" },
    { "tag": "uses_json", "pattern": "\\b(variableFromJSON|AddvariableToJSON|AddVariableToJSON)\\s*\\(" },
    { "tag": "uses_list", "pattern": "\\b(variableToList|itemFromList|getListLen)\\s*\\(" },
    { "tag": "uses_regex", "pattern": "\\bgetRegex\\s*\\(" },
    { "tag": "uses_datetime", "pattern": "\\b(getDateTime|getTimeStamp|stampToDatetime)\\s*\\(" },
    { "tag": "uses_string_ops", "pattern": "\\b(randomString|replace|encodeSHA256|encodeMD5)\\s*\\(" },
    { "tag": "uses_return", "pattern": "^\\s*return\\s+\\S" },
    { "tag": "returns_result", "pattern": "\\baddResult\\s*\\(" },
    { "tag": "registers_endpoint", "pattern": "\\bregisterEndpoint\\s*\\(" }
  ]
}