ADR0008 Finished and ADR0009 finalized

This commit is contained in:
rafa-ruiz 2026-04-13 20:37:43 -07:00
parent 6118a77078
commit 8db95e50f4
2 changed files with 60 additions and 11 deletions


@ -3,6 +3,26 @@
All notable changes to the **Brunix Assistance Engine** will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
---
## [1.7.1] - 2026-04-13
### Added
- ENGINE: `_run_generate_tests(user_request, generated_code, llm)` — module-level helper in `graph.py` that encapsulates the full test generation logic (LLM call, thinking stripping, JSON extraction, 15s timeout). Both the `generate_tests` LangGraph node and `AskAgentStream` delegate to it.
- ENGINE: `AskAgentStream` — test generation now runs before `_call_parser` in the streaming state machine. After a code block is buffered, `_run_generate_tests` is called to produce `test_inputs`/`test_list`; both are passed to `_call_parser` (first attempt and retry). `AskAgentStream` now has full execution + assertion validation parity with `AskAgent`.
- ENGINE: `_call_parser` — assertion KeyError guard. When the parser returns `{'success': False, 'error': "'variable_name'"}` (Python `KeyError` from evaluating a non-existent variable in an assertion), the engine drops the offending assertion and retries instead of invoking the LLM code-fix path. Prevents correct code from being incorrectly rewritten due to a test generation defect.
### Changed
- ENGINE: `generate_tests` LangGraph node refactored to delegate to `_run_generate_tests` (no behaviour change).
- ENGINE: `TEST_GENERATION_PROMPT` — strengthened `<avap_variables_rule>`: explicit rule that `addResult(x)` sends `x` to the HTTP response and does **not** create a variable; assertions must never reference names that only appear inside `addResult()`. Updated example to reinforce the correct pattern.
### Fixed
- ENGINE: `AskAgentStream` validation was execution-only with null inputs — `id_interno` resolved to `None` because no `test_inputs` were injected. Fixed by running `_run_generate_tests` before each `_call_parser` call in the streaming path.
- ENGINE: `generate_tests` was producing assertions on `str(result)` when code used `addResult(id_interno)` — `result` is not a variable, causing parser `KeyError` and incorrect LLM code-fix retry. Fixed by prompt improvement and the KeyError guard.
### Docs
- `docs/ADR/ADR-0009` updated: streaming state machine flow now shows `_run_generate_tests` step; difference table updated with test generation and assertion validation rows; `generate_tests` node section documents variable naming contract and KeyError guard; `AskAgentStream` graph section notes full parity.
---
## [1.7.0] - 2026-04-13
### Added


@ -1,7 +1,7 @@
# ADR-0009: Per-Type Response Validation Layer
**Date:** 2026-04-10
**Last updated:** 2026-04-13
**Last updated:** 2026-04-13 (rev 2)
**Status:** Implemented
**Deciders:** Rafael Ruiz (CTO)
**Related ADRs:** ADR-0007 (MSVL for RAG Evaluation), ADR-0008 (Adaptive Query Routing), ADR-0003 (Hybrid Retrieval RRF)
@ -16,7 +16,7 @@ ADR-0007 documents that 10–16% of `CODE_GENERATION` responses contain syntacti…
This problem exists in production today. A user receiving a `CODE_GENERATION` response has no indication that the generated code would fail on the PLATON kernel.
The AVAP Parser gRPC service — established in ADR-0007 as a hard dependency of the evaluation pipeline — is already available in the stack. It returns not just `VALID / INVALID` but a **complete line-by-line execution trace** on failure:
The AVAP Parser REST HTTP service (Tornado, port 8888) — established in ADR-0007 as a hard dependency of the evaluation pipeline — is already available in the stack. It returns not just `VALID / INVALID` but a **complete line-by-line execution trace** on failure:
```
Line 3: unknown command 'getSHA256' — expected known identifier
@ -113,14 +113,33 @@ generated_code: "addParam(\"client_id\", id_interno)\naddResult(id_interno)"
# Output
{
"test_inputs": {"client_id": "12345"},
"test_list": ["re.match(r'^\\d{5}$', str(id_interno))"]
"test_list": ["re.match(r'^\\d+$', str(id_interno))"]
}
```
`test_inputs` are injected as request variables when the parser executes the code. `test_list` items are regex assertions evaluated against the output variables after execution.
**Variable naming contract** (enforced by `TEST_GENERATION_PROMPT`):
- `addParam("request_param_name", avap_var)` → `test_inputs` key is `"request_param_name"`; assertion variable is `avap_var`.
- `addResult(x)` — adds `x` to the HTTP response body. **Does NOT create a variable.** Assertions must never reference a name that only appears inside `addResult()`.
- Only variables on the left side of `=`, the second arg of `addParam`, or the first arg of `addVar` are valid assertion targets.
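The contract above can be sketched as a small extractor. `valid_assertion_targets` and its regexes are hypothetical illustrations of the rule, not the engine's implementation:

```python
import re

# Hypothetical sketch of the variable naming contract: only assignment
# targets, the second arg of addParam, and the first arg of addVar are
# valid assertion targets. Names seen only inside addResult() are not
# variables. Regexes are illustrative, not the engine's actual code.
def valid_assertion_targets(avap_code: str) -> set:
    targets = set()
    for line in avap_code.splitlines():
        m = re.match(r"\s*([A-Za-z_]\w*)\s*=", line)  # left side of `=`
        if m:
            targets.add(m.group(1))
        m = re.search(r'addParam\(\s*"[^"]*"\s*,\s*([A-Za-z_]\w*)\s*\)', line)
        if m:
            targets.add(m.group(1))  # second arg of addParam
        m = re.search(r"addVar\(\s*([A-Za-z_]\w*)", line)
        if m:
            targets.add(m.group(1))  # first arg of addVar
    return targets

code = 'addParam("client_id", id_interno)\naddResult(id_interno)'
print(valid_assertion_targets(code))  # {'id_interno'}
```

Under this sketch, an assertion on any name outside the returned set would be a test generation defect of the kind the KeyError guard below the fold catches.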
If the LLM call fails or times out (15s hard limit), `generate_tests` returns empty `test_inputs` and `test_list` — validation continues in execution-only mode (no assertions). This keeps the node non-blocking.
The core generation logic is implemented in `_run_generate_tests(user_request, generated_code, llm)` — a module-level function in `graph.py`. Both the `generate_tests` LangGraph node and the `AskAgentStream` streaming state machine delegate to it.
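The non-blocking contract can be sketched as follows; `run_generate_tests` and `llm_call` are stand-in names, and only the 15s hard limit and the empty-fallback behaviour come from the text above:

```python
import concurrent.futures
import json

# Sketch of the non-blocking contract: run the LLM call under a hard
# timeout and fall back to empty tests on any failure. `llm_call` stands
# in for the real LLM client. Illustrative only, not the engine's code.
def run_generate_tests(user_request, generated_code, llm_call, timeout=15.0):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(llm_call, user_request, generated_code)
        data = json.loads(future.result(timeout=timeout))
        return data.get("test_inputs", {}), data.get("test_list", [])
    except Exception:
        # Timeout, malformed JSON, or LLM error: degrade to execution-only
        # validation instead of blocking the response.
        return {}, []
    finally:
        pool.shutdown(wait=False)  # don't wait on a hung LLM call
```

Returning `({}, [])` rather than raising is what keeps the node non-blocking: the caller proceeds to `_call_parser` with no assertions.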
#### Assertion KeyError guard in `_call_parser`
If the parser returns `{'success': False, 'error': "'variable_name'"}` — a Python `KeyError` pattern — it means an assertion referenced a variable that does not exist in the execution scope (a test generation defect, not a code defect). In this case `_call_parser`:
1. Logs the event as `[ptvl] assertion KeyError: variable '...' not in scope — dropping affected assertions`.
2. Filters out any assertion in `test_list` that references `str(bad_var)`.
3. Retries `_call_parser` with the filtered list.
4. If all assertions were filtered out, returns `(True, "")` — the code itself executed successfully.
This prevents the LLM fix path from being triggered on a test generation error, which would incorrectly modify correct code.
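The guard can be sketched as a wrapper around the parser call. `call_with_keyerror_guard` and `call_parser` are illustrative names; only the error shape, the filter-and-retry steps, and the `(True, "")` fallback come from the steps above:

```python
import re

# Sketch of the assertion KeyError guard. `call_parser` stands in for the
# real parser client and returns a dict like {'success': bool, 'error': str}.
def call_with_keyerror_guard(call_parser, code, test_inputs, test_list):
    result = call_parser(code, test_inputs, test_list)
    if not result.get("success"):
        # A KeyError from an assertion surfaces as error == "'var_name'"
        m = re.fullmatch(r"'([A-Za-z_]\w*)'", result.get("error", ""))
        if m:
            bad = m.group(1)
            # Drop every assertion that references the missing variable
            kept = [a for a in test_list if bad not in a]
            if not kept:
                # All assertions were defective; the code itself ran fine.
                return True, ""
            result = call_parser(code, test_inputs, kept)
    return result.get("success", False), result.get("error", "")
```

The key design point is that the retry re-invokes the parser, never the LLM: a missing assertion variable is evidence against the tests, not against the code.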
#### Trace-guided retry
The parser error is injected into the retry prompt as a structured correction context:
@ -207,7 +226,7 @@ Code is extracted from the LLM response by scanning for the first markdown code
#### Parser availability — circuit breaker
A single timeout or connection error does not mean the parser is down. A sustained outage does. Hammering an unavailable gRPC service on every `CODE_GENERATION` request adds latency to every user request with zero benefit.
A single timeout or connection error does not mean the parser is down. A sustained outage does. Hammering an unavailable REST service on every `CODE_GENERATION` request adds latency to every user request with zero benefit.
The engine implements a **circuit breaker** with three states:
@ -224,7 +243,7 @@ HALF-OPEN ──[probe fails]───────────► OPEN
| Cooldown before half-open | 30 seconds | `PARSER_CB_COOLDOWN` |
| Timeout per call | 2 seconds | `AVAP_PARSER_TIMEOUT` |
While the circuit is **OPEN**, `CODE_GENERATION` responses are returned immediately with `validation_status = PARSER_UNAVAILABLE` — no gRPC call is attempted. The cooldown prevents thundering-herd reconnection attempts.
While the circuit is **OPEN**, `CODE_GENERATION` responses are returned immediately with `validation_status = PARSER_UNAVAILABLE` — no HTTP call is attempted. The cooldown prevents thundering-herd reconnection attempts.
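The three states can be sketched with a minimal breaker class. The 30-second cooldown follows the config table above; the failure threshold of 3, the injectable clock, and all names are assumptions for illustration:

```python
import time

# Toy three-state circuit breaker matching the states described above.
# The 30 s cooldown mirrors PARSER_CB_COOLDOWN; the threshold of 3
# consecutive failures is an assumed value. Not the engine's actual class.
class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None while CLOSED

    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.cooldown:
            return "HALF-OPEN"  # cooldown elapsed: allow one probe call
        return "OPEN"

    def allow_call(self):
        return self.state() != "OPEN"

    def record_success(self):
        self.failures, self.opened_at = 0, None  # probe succeeded: close

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold or self.opened_at is not None:
            self.opened_at = self.clock()  # open, or re-open after failed probe
```

An injectable clock keeps the cooldown testable without sleeping, which also avoids the thundering-herd pattern the text warns about: while `allow_call()` is false, no network attempt is made at all.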
```
[ptvl] circuit OPEN — skipping parser, returning unvalidated
@ -291,8 +310,6 @@ Reformulate this query using broader terms or alternative phrasing.
---
### Decision 3 — AskAgentStream: streaming state machine for CODE_GENERATION
The `AskAgentStream` path streams tokens directly to the client. Post-generation validation (Decision 1) cannot run here because tokens are already yielded before the response is complete.
@ -315,7 +332,14 @@ STATE: CODE (buffering — nothing yielded to client)
→ detect closing ``` after first newline
│ closing ``` detected
_call_parser(complete_block)
_extract_avap_code(complete_block)
_run_generate_tests(query, avap_code, llm) ← same helper as build_graph path
[produces test_inputs + test_list, 15s timeout]
_call_parser(complete_block, test_inputs, test_list)
├── VALID ──────────────────────────────► yield code block → back to TEXT
@ -326,7 +350,7 @@ _call_parser(complete_block)
│ "Fix this AVAP code: {trace}"
│ │
│ ▼
│ _call_parser(fixed_block)
│ _call_parser(fixed_block, test_inputs, test_list) ← same tests reused
│ ├── VALID ──────────────────────► yield fixed block → back to TEXT
│ ├── INVALID ────────────────────► yield fixed block + INVALID_UNRESOLVED → back to TEXT
│ └── UNAVAILABLE ────────────────► yield fixed block + PARSER_UNAVAILABLE → back to TEXT
@ -337,7 +361,7 @@ _call_parser(complete_block)
#### Key properties
- Text before and after the code block streams to the client without delay.
- Only the code block itself introduces latency (one parser call, optionally one LLM fix call + one parser call).
- Only the code block itself introduces latency (one `_run_generate_tests` LLM call + one parser call, optionally one LLM fix call + one additional parser call).
- The fix call asks the LLM to correct **only the code block**, not the full response — the text already streamed to the client remains valid.
- If the response contains multiple code blocks, each block is processed independently in sequence. `validation_status` in the final `is_final=True` message reflects the last block validated.
- If the stream ends while still in CODE mode (malformed response without closing fence), the buffer is flushed as-is.
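The TEXT/CODE buffering above can be sketched as a generator. This assumes line-aligned chunks and uses `validate` as a stand-in for the `_run_generate_tests` + `_call_parser` sequence; it is illustrative, not the server code:

```python
# Sketch of the streaming state machine: text outside fenced blocks is
# yielded immediately; a code block is buffered until its closing fence,
# handed to `validate`, then yielded. A dangling block is flushed as-is.
def stream_with_validation(chunks, validate):
    mode, buf = "TEXT", []
    for chunk in chunks:
        for line in chunk.splitlines(keepends=True):
            if mode == "TEXT":
                if line.lstrip().startswith("```"):
                    mode, buf = "CODE", [line]  # start buffering
                else:
                    yield line  # text streams through without delay
            else:  # CODE: buffer until the closing fence
                buf.append(line)
                if line.lstrip().startswith("```"):
                    block = "".join(buf)
                    validate(block)  # real path may rewrite/annotate here
                    yield block
                    mode, buf = "TEXT", []
    if buf:  # stream ended mid-block (no closing fence): flush as-is
        yield "".join(buf)
```

Note how the latency property falls out of the structure: only the `yield block` after `validate` waits on the parser, while every TEXT-mode line is emitted the moment it arrives.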
@ -347,7 +371,10 @@ _call_parser(complete_block)
| Property | AskAgent | AskAgentStream |
|---|---|---|
| Validation point | Post-generation, pre-delivery | Inline, at code block boundary |
| Test generation | `generate_tests` LangGraph node | `_run_generate_tests()` called directly before `_call_parser` |
| Execution + assertion validation | `validate_code` node → `/api/v1/execute` with `test_inputs`/`test_list` | `_call_parser()` with `test_inputs`/`test_list` |
| Retry mechanism | Full `generate_code_retry` LangGraph node | Targeted LLM fix call (code block only) |
| Retry reuses same tests | Yes (state carries `test_inputs`/`test_list`) | Yes (`_ti`/`_tl` captured before first call) |
| Text streaming | N/A (non-streaming) | Uninterrupted |
| `validation_status` delivery | In `AgentResponse` (only response) | In final `AgentResponse` (`is_final=True`) |
@ -371,6 +398,8 @@ _call_parser(complete_block)
No new LangGraph nodes. The validation logic runs as a **streaming state machine** inline in `server.py:AskAgentStream`. See Decision 3.
Test generation and execution-level validation now have full parity with the `AskAgent` path: `_run_generate_tests` is called after each code block is buffered, and `_call_parser` receives the resulting `test_inputs`/`test_list`. Both paths share the same module-level `_run_generate_tests` implementation.
### Updated flow — `build_graph`
```mermaid
@ -442,7 +471,7 @@ message AgentResponse {
}
```
**Implementation status: complete (2026-04-12).** Field 4 added to `brunix.proto`. Populated in both `AskAgent` (from `final_state`) and `AskAgentStream` (in the `is_final=True` message at end of stream).
**Implementation status: complete (2026-04-13).** Field 4 added to `brunix.proto`. Populated in both `AskAgent` (from `final_state`) and `AskAgentStream` (in the `is_final=True` message at end of stream). Full test generation + assertion validation parity between both paths achieved 2026-04-13.
Clients that do not read `validation_status` are unaffected — the field defaults to empty string.