
PRD-0002: Editor Context Injection for VS Code Extension

Date: 2026-03-19
Status: Implemented
Requested by: Rafael Ruiz (CTO)
Purpose: Validate the VS Code extension with real users
Related ADRs: ADR-0001 (gRPC interface), ADR-0002 (two-phase streaming)


Problem

The Brunix Assistance Engine previously received only two inputs from the client: a query (the user's question) and a session_id (for conversation continuity). It had no awareness of what the user was looking at in their editor when they asked the question.

This created a fundamental limitation for a coding assistant: the user asking "how do I handle the error here?" or "what does this function return?" could not be answered correctly without knowing what "here" and "this function" referred to. The assistant was forced to treat every question as a general AVAP documentation query, even when the user's intent was clearly anchored to specific code in their editor.

For the VS Code extension validation, the CTO needed to demonstrate that the assistant behaves as a genuine coding assistant — one that understands the user's current context — not just a documentation search tool.


Solution

The gRPC contract has been extended to allow the VS Code extension to send four optional context fields alongside every query. These fields are transported in the standard OpenAI user field as a JSON string when using the HTTP proxy, and as dedicated proto fields when calling gRPC directly.

Transport format via HTTP proxy (/v1/chat/completions):

{
  "model": "brunix",
  "messages": [{"role": "user", "content": "que hace este código?"}],
  "stream": true,
  "session_id": "uuid",
  "user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"<base64>\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}

Fields:

  • editor_content (base64) — full content of the active file open in the editor. Gives the assistant awareness of the complete code the user is working on.
  • selected_text (base64) — text currently selected in the editor, if any. The most precise signal of user intent — if the user has selected a block of code before asking a question, that block is almost certainly what the question is about.
  • extra_context (base64) — free-form additional context (e.g., file path, language identifier, cursor position, open diagnostic errors). Extensible without requiring proto changes.
  • user_info (JSON object) — client identity metadata: dev_id, project_id, org_id. Not base64 — sent as a JSON object nested within the user JSON string.

All four fields are optional. If none are provided, the assistant behaves exactly as it does today — full backward compatibility.


User experience

Scenario 1 — Question about selected code: The user selects a try() / exception() / end() block in their editor and asks "why is this not catching my error?". The assistant detects via the classifier that the question refers explicitly to the selected code, injects selected_text into the generation prompt, and answers specifically about that block — not about error handling in general.

Scenario 2 — Question about the open file: The user has a full AVAP function open and asks "what HTTP status codes can this return?". The classifier detects the question refers to editor content, injects editor_content into the generation prompt, and reasons about the _status assignments in the function.

Scenario 3 — General question (unchanged behaviour): The user asks "how does addVar work?" without selecting anything or referring to the editor. The classifier sets use_editor_context: False. The assistant behaves exactly as before — retrieval-augmented response from the AVAP knowledge base, no editor content injected.


Scope

In scope:

  • Add editor_content, selected_text, extra_context, user_info fields to AgentRequest in brunix.proto
  • Decode base64 fields (editor_content, selected_text, extra_context) in server.py before propagating to graph state
  • Parse user_info as opaque JSON string — available in state for future use, not yet consumed by the graph
  • Parse the user field in openai_proxy.py as a JSON object containing all four context fields
  • Propagate all fields through the server into the graph state (AgentState)
  • Extend the classifier (CLASSIFY_PROMPT_TEMPLATE) to output two tokens: query type and editor context signal (EDITOR / NO_EDITOR)
  • Set use_editor_context: bool in AgentState based on classifier output
  • Use selected_text as the primary anchor for query reformulation only when use_editor_context is True
  • Inject selected_text and editor_content into the generation prompt only when use_editor_context is True
  • Fix reformulator language — queries must be rewritten in the original language, never translated

Out of scope:

  • Changes to EvaluateRAG — the golden dataset does not include editor-context queries; this feature does not affect embedding or retrieval evaluation
  • Consuming user_info fields (dev_id, project_id, org_id) in the graph — available in state for future routing or personalisation
  • Evaluation of the feature impact via EvaluateRAG — a dedicated golden dataset with editor-context queries is required for that measurement; it is future work

Technical design

Proto changes (brunix.proto)

message AgentRequest {
  string query          = 1;  // unchanged
  string session_id     = 2;  // unchanged
  string editor_content = 3;  // base64-encoded full editor file content
  string selected_text  = 4;  // base64-encoded currently selected text
  string extra_context  = 5;  // base64-encoded free-form additional context
  string user_info      = 6;  // JSON string: {"dev_id":…,"project_id":…,"org_id":…}
}

Fields 1 and 2 are unchanged. Fields 3-6 are optional; absent fields default to the empty string in proto3. All existing clients remain compatible without modification.
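The backward-compatibility claim follows directly from proto3 defaulting. A sketch with a plain dataclass standing in for the generated brunix_pb2.AgentRequest class (the stand-in is illustrative; the real class comes from protoc output):

```python
from dataclasses import dataclass

# Stand-in for the generated AgentRequest message. In proto3, unset string
# fields are indistinguishable from fields set to "", which is what keeps
# pre-PRD-0002 clients working unmodified.
@dataclass
class AgentRequest:
    query: str = ""           # field 1, unchanged
    session_id: str = ""      # field 2, unchanged
    editor_content: str = ""  # field 3
    selected_text: str = ""   # field 4
    extra_context: str = ""   # field 5
    user_info: str = ""       # field 6

# A legacy client that only sets fields 1 and 2 still produces a valid
# request; the new fields simply arrive empty.
legacy = AgentRequest(query="how does addVar work?", session_id="uuid")
```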

AgentState changes (state.py)

class AgentState(TypedDict):
    # Core fields
    messages:           Annotated[list, add_messages]
    session_id:         str
    query_type:         str
    reformulated_query: str
    context:            str
    # Editor context fields (PRD-0002)
    editor_content:     str   # decoded from base64
    selected_text:      str   # decoded from base64
    extra_context:      str   # decoded from base64
    user_info:          str   # JSON string — {"dev_id":…,"project_id":…,"org_id":…}
    # Set by classifier — True only when user explicitly refers to editor code
    use_editor_context: bool

Server changes (server.py)

Base64 decoding applied to editor_content, selected_text and extra_context before propagation. user_info passed as-is (plain JSON string). Helper function:

import base64
import logging

logger = logging.getLogger(__name__)

def _decode_b64(value: str) -> str:
    try:
        return base64.b64decode(value).decode("utf-8") if value else ""
    except Exception as exc:
        logger.warning("[base64] decode failed: %s", exc)
        return ""
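The decoded values then populate the editor-context slice of AgentState. A sketch of that mapping, assuming a request object with the four context attributes (context_from_request is a hypothetical name; the helper is repeated here so the snippet stands alone):

```python
import base64
import logging

logger = logging.getLogger(__name__)

def _decode_b64(value: str) -> str:
    # Same helper as above: empty input or undecodable base64 yields "".
    try:
        return base64.b64decode(value).decode("utf-8") if value else ""
    except Exception as exc:
        logger.warning("[base64] decode failed: %s", exc)
        return ""

def context_from_request(req) -> dict:
    # req is any object carrying the four AgentRequest context attributes.
    return {
        "editor_content": _decode_b64(req.editor_content),
        "selected_text": _decode_b64(req.selected_text),
        "extra_context": _decode_b64(req.extra_context),
        "user_info": req.user_info or "",  # plain JSON string, no decoding
    }
```

A malformed base64 field degrades to an empty string rather than failing the request, matching the helper's error handling.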

Proxy changes (openai_proxy.py)

The user field is parsed as a JSON object. _parse_editor_context extracts all four fields:

def _parse_editor_context(user: Optional[str]) -> tuple[str, str, str, str]:
    if not user:
        return "", "", "", ""
    try:
        ctx = json.loads(user)
        if isinstance(ctx, dict):
            return (
                ctx.get("editor_content", "") or "",
                ctx.get("selected_text",  "") or "",
                ctx.get("extra_context",  "") or "",
                json.dumps(ctx.get("user_info", {})),
            )
    except (json.JSONDecodeError, TypeError):
        pass
    return "", "", "", ""

session_id is now read exclusively from the dedicated session_id field — no longer falls back to user.

Classifier changes (prompts.py + graph.py)

CLASSIFY_PROMPT_TEMPLATE now outputs two tokens separated by a space:

  • First token: RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL
  • Second token: EDITOR or NO_EDITOR

EDITOR is set only when the user message explicitly refers to the editor code or selected text using expressions like "this code", "este codigo", "fix this", "que hace esto", "explain this", etc.

_parse_query_type returns tuple[str, bool]. Both classify nodes (in build_graph and build_prepare_graph) set use_editor_context in the state.

Reformulator changes (prompts.py + graph.py)

Two fixes applied:

Mode-aware reformulation: The reformulator receives [MODE: X] prepended to the query. In RETRIEVAL mode it compresses the query without expanding AVAP commands. In CODE_GENERATION mode it applies the command mapping. In CONVERSATIONAL mode it returns the query as-is.

Language preservation: The reformulator never translates. Queries in Spanish are rewritten in Spanish. Queries in English are rewritten in English. This fix was required because the BM25 retrieval is lexical — a Spanish chunk ("AVAP es un DSL...") cannot be found by an English query ("AVAP stand for").
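The mode prefix can be sketched as follows (the exact "[MODE: X]" format mirrors the description above; the helper name is illustrative):

```python
def build_reformulator_input(query: str, query_type: str) -> str:
    # The reformulator prompt sees the routing mode ahead of the raw query,
    # letting one prompt template behave differently per mode.
    return f"[MODE: {query_type}] {query}"
```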

Generator changes (graph.py)

_build_generation_prompt injects editor_content and selected_text into the prompt only when use_editor_context is True. Priority hierarchy when injected:

  1. selected_text — highest priority, most specific signal
  2. editor_content — file-level context
  3. RAG-retrieved chunks — knowledge base context
  4. extra_context — free-form additional context

Validation

Acceptance criteria:

  • A query explicitly referring to selected code (selected_text non-empty, classifier returns EDITOR) produces a response grounded in that specific code.
  • A general query (use_editor_context: False) produces a response identical in quality to the pre-PRD-0002 system — no editor content injected, no regression.
  • A query in Spanish retrieves Spanish chunks correctly — the reformulator preserves the language.
  • Existing gRPC clients that do not send the new fields work without modification.
  • The user field in the HTTP proxy can be a plain string or absent — no error raised.

Future measurement: Once the extension is validated and the embedding model is selected (ADR-0005), a dedicated golden dataset of editor-context queries should be built and added to EvaluateRAG to measure the quantitative impact of this feature.


Impact on parallel workstreams

Embedding evaluation (ADR-0005 / MrHouston): No impact. The BEIR benchmarks and EvaluateRAG runs for embedding model selection use the existing golden dataset, which contains no editor-context queries. The two workstreams are independent.

RAG architecture evolution: This feature is additive. It does not change the retrieval infrastructure, the Elasticsearch index, or the embedding pipeline. It extends the graph with additional input signals that improve response quality for editor-anchored queries.