ADR0008 Finished and ADR0009 finalized
parent 6118a77078
commit 8db95e50f4

changelog | 20
@@ -3,6 +3,26 @@
All notable changes to the **Brunix Assistance Engine** will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [1.7.1] - 2026-04-13

### Added

- ENGINE: `_run_generate_tests(user_request, generated_code, llm)` — module-level helper in `graph.py` that encapsulates the full test generation logic (LLM call, thinking stripping, JSON extraction, 15s timeout). Both the `generate_tests` LangGraph node and `AskAgentStream` delegate to it.
- ENGINE: `AskAgentStream` — test generation now runs before `_call_parser` in the streaming state machine. After a code block is buffered, `_run_generate_tests` is called to produce `test_inputs`/`test_list`; both are passed to `_call_parser` (first attempt and retry). `AskAgentStream` now has full execution + assertion validation parity with `AskAgent`.
- ENGINE: `_call_parser` — assertion KeyError guard. When the parser returns `{'success': False, 'error': "'variable_name'"}` (a Python `KeyError` raised by evaluating a non-existent variable in an assertion), the engine drops the offending assertion and retries instead of invoking the LLM code-fix path. This prevents correct code from being incorrectly rewritten due to a test generation defect.

### Changed

- ENGINE: `generate_tests` LangGraph node refactored to delegate to `_run_generate_tests` (no behaviour change).
- ENGINE: `TEST_GENERATION_PROMPT` — strengthened `<avap_variables_rule>`: explicit rule that `addResult(x)` sends `x` to the HTTP response and does **not** create a variable; assertions must never reference names that only appear inside `addResult()`. Updated the example to reinforce the correct pattern.

### Fixed

- ENGINE: `AskAgentStream` validation was execution-only with null inputs — `id_interno` resolved to `None` because no `test_inputs` were injected. Fixed by running `_run_generate_tests` before each `_call_parser` call in the streaming path.
- ENGINE: `generate_tests` was producing assertions on `str(result)` when code used `addResult(id_interno)` — `result` is not a variable, causing a parser `KeyError` and an incorrect LLM code-fix retry. Fixed by the prompt improvement and the KeyError guard.

### Docs

- `docs/ADR/ADR-0009` updated: the streaming state machine flow now shows the `_run_generate_tests` step; the difference table gains test generation and assertion validation rows; the `generate_tests` node section documents the variable naming contract and KeyError guard; the `AskAgentStream` graph section notes full parity.

---

## [1.7.0] - 2026-04-13

### Added
@@ -1,7 +1,7 @@
# ADR-0009: Per-Type Response Validation Layer

**Date:** 2026-04-10
-**Last updated:** 2026-04-13
+**Last updated:** 2026-04-13 (rev 2)
**Status:** Implemented
**Deciders:** Rafael Ruiz (CTO)
**Related ADRs:** ADR-0007 (MSVL for RAG Evaluation), ADR-0008 (Adaptive Query Routing), ADR-0003 (Hybrid Retrieval RRF)
@@ -16,7 +16,7 @@ ADR-0007 documents that 10–16% of `CODE_GENERATION` responses contain syntacti

This problem exists in production today. A user receiving a `CODE_GENERATION` response has no indication that the generated code would fail on the PLATON kernel.

-The AVAP Parser gRPC service — established in ADR-0007 as a hard dependency of the evaluation pipeline — is already available in the stack. It returns not just `VALID / INVALID` but a **complete line-by-line execution trace** on failure:
+The AVAP Parser REST HTTP service (Tornado, port 8888) — established in ADR-0007 as a hard dependency of the evaluation pipeline — is already available in the stack. It returns not just `VALID / INVALID` but a **complete line-by-line execution trace** on failure:

```
Line 3: unknown command 'getSHA256' — expected known identifier
@@ -113,14 +113,33 @@ generated_code: "addParam(\"client_id\", id_interno)\naddResult(id_interno)"
# Output
{
  "test_inputs": {"client_id": "12345"},
-  "test_list": ["re.match(r'^\\d{5}$', str(id_interno))"]
+  "test_list": ["re.match(r'^\\d+$', str(id_interno))"]
}
```

`test_inputs` are injected as request variables when the parser executes the code. `test_list` items are regex assertions evaluated against the output variables after execution.
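As a minimal sketch of this evaluation step (the `evaluate_assertions` helper and the `scope` mapping are assumptions, not the parser's real internals), the regex assertions can be checked against the execution scope like this:

```python
import re

def evaluate_assertions(test_list, scope):
    """Sketch: evaluate regex assertions against output variables."""
    for expr in test_list:
        try:
            # e.g. expr == "re.match(r'^\\d+$', str(id_interno))"
            ok = bool(eval(expr, {"re": re}, dict(scope)))
        except (KeyError, NameError) as exc:
            # A name used only inside addResult() never enters the scope;
            # the real parser surfaces this case as
            # {'success': False, 'error': "'variable_name'"}.
            return {"success": False, "error": str(exc)}
        if not ok:
            return {"success": False, "error": f"assertion failed: {expr}"}
    return {"success": True, "error": ""}
```

`eval` on LLM-produced strings is only for illustration here; the real service runs these inside its own sandboxed execution environment.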
**Variable naming contract** (enforced by `TEST_GENERATION_PROMPT`):

- `addParam("request_param_name", avap_var)` — the `test_inputs` key is `"request_param_name"`; the assertion variable is `avap_var`.
- `addResult(x)` — adds `x` to the HTTP response body. **Does NOT create a variable.** Assertions must never reference a name that only appears inside `addResult()`.
- Only variables on the left side of `=`, the second arg of `addParam`, or the first arg of `addVar` are valid assertion targets.

If the LLM call fails or times out (15s hard limit), `generate_tests` returns empty `test_inputs` and `test_list` — validation continues execution-only (no assertions). This keeps the node non-blocking.
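The contract above can be sketched as a scope extractor — a hypothetical `assertion_targets` helper (not part of the engine, and assuming one AVAP statement per line) that collects the only names assertions may reference:

```python
import re

def assertion_targets(avap_code: str) -> set:
    """Sketch: names that are valid assertion targets under the contract."""
    targets = set()
    for line in avap_code.splitlines():
        line = line.strip()
        m = re.match(r'^(\w+)\s*=', line)                      # left side of '='
        if m:
            targets.add(m.group(1))
        m = re.match(r'^addParam\([^,]+,\s*(\w+)\s*\)', line)  # 2nd arg of addParam
        if m:
            targets.add(m.group(1))
        m = re.match(r'^addVar\(\s*(\w+)', line)               # 1st arg of addVar
        if m:
            targets.add(m.group(1))
        # addResult(...) deliberately contributes nothing: it creates no variable.
    return targets

code = 'addParam("client_id", id_interno)\naddResult(id_interno)'
print(assertion_targets(code))
```

For the example above, `id_interno` qualifies via `addParam`; had it appeared only inside `addResult()`, it would not be a valid target.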
The core generation logic is implemented in `_run_generate_tests(user_request, generated_code, llm)` — a module-level function in `graph.py`. Both the `generate_tests` LangGraph node and the `AskAgentStream` streaming state machine delegate to it.
#### Assertion KeyError guard in `_call_parser`

If the parser returns `{'success': False, 'error': "'variable_name'"}` — a Python `KeyError` pattern — it means an assertion referenced a variable that does not exist in the execution scope (a test generation defect, not a code defect). In this case `_call_parser`:

1. Logs the event as `[ptvl] assertion KeyError: variable '...' not in scope — dropping affected assertions`.
2. Filters out any assertion in `test_list` that references `str(bad_var)`.
3. Retries `_call_parser` with the filtered list.
4. If all assertions were filtered out, returns `(True, "")` — the code itself executed successfully.

This prevents the LLM fix path from being triggered on a test generation error, which would incorrectly modify correct code.
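Steps 1–4 can be sketched as a wrapper (hypothetical names throughout; `call_parser` stands in for the real `_call_parser` and returns an `(ok, error)` tuple; the quoted-bare-name check is an assumed detection heuristic):

```python
def keyerror_guard(call_parser, code, test_inputs, test_list):
    """Sketch of the assertion KeyError guard."""
    ok, err = call_parser(code, test_inputs, test_list)
    # str(KeyError('x')) == "'x'": a quoted bare name signals a missing
    # assertion variable, not a code failure.
    if not ok and len(err) > 2 and err[0] == err[-1] == "'":
        bad_var = err.strip("'")
        print(f"[ptvl] assertion KeyError: variable '{bad_var}' not in scope "
              "— dropping affected assertions")
        kept = [a for a in test_list if f"str({bad_var})" not in a]
        if not kept:
            return True, ""  # code executed fine; only the tests were wrong
        return call_parser(code, test_inputs, kept)  # retry with filtered list
    return ok, err
```

Note the guard never touches the generated code itself — only the test list — which is exactly what keeps the LLM fix path out of the loop.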
#### Trace-guided retry

The parser error is injected into the retry prompt as a structured correction context:
@@ -207,7 +226,7 @@ Code is extracted from the LLM response by scanning for the first markdown code

#### Parser availability — circuit breaker

-A single timeout or connection error does not mean the parser is down. A sustained outage does. Hammering an unavailable gRPC service on every `CODE_GENERATION` request adds latency to every user request with zero benefit.
+A single timeout or connection error does not mean the parser is down. A sustained outage does. Hammering an unavailable REST service on every `CODE_GENERATION` request adds latency to every user request with zero benefit.

The engine implements a **circuit breaker** with three states:
@@ -224,7 +243,7 @@ HALF-OPEN ──[probe fails]───────────► OPEN
| Cooldown before half-open | 30 seconds | `PARSER_CB_COOLDOWN` |
| Timeout per call | 2 seconds | `AVAP_PARSER_TIMEOUT` |

-While the circuit is **OPEN**, `CODE_GENERATION` responses are returned immediately with `validation_status = PARSER_UNAVAILABLE` — no gRPC call is attempted. The cooldown prevents thundering-herd reconnection attempts.
+While the circuit is **OPEN**, `CODE_GENERATION` responses are returned immediately with `validation_status = PARSER_UNAVAILABLE` — no HTTP call is attempted. The cooldown prevents thundering-herd reconnection attempts.

```
[ptvl] circuit OPEN — skipping parser, returning unvalidated
```
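The three states can be sketched as follows (a minimal sketch: the consecutive-failure threshold of 3 is an assumption — the table above documents only the 30s cooldown and 2s per-call timeout):

```python
import time

class CircuitBreaker:
    """Sketch of a CLOSED / OPEN / HALF-OPEN breaker around parser calls."""

    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures to OPEN
        self.cooldown = cooldown                    # seconds before HALF-OPEN probe
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "HALF-OPEN"  # let exactly one probe through
                return True
            return False                  # skip parser, return unvalidated
        return True

    def record_success(self):
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        if self.state == "HALF-OPEN":
            self.state = "OPEN"           # probe failed: back to OPEN
            self.opened_at = self.clock()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

Injecting `clock` makes the cooldown testable without sleeping, which is also convenient for the 30s `PARSER_CB_COOLDOWN` value.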
@@ -291,8 +310,6 @@ Reformulate this query using broader terms or alternative phrasing.

---

### Decision 3 — AskAgentStream: streaming state machine for CODE_GENERATION

The `AskAgentStream` path streams tokens directly to the client. Post-generation validation (Decision 1) cannot run here because tokens are already yielded before the response is complete.
@@ -315,7 +332,14 @@ STATE: CODE (buffering — nothing yielded to client)
→ detect closing ``` after first newline
    │ closing ``` detected
    ▼
-_call_parser(complete_block)
+_extract_avap_code(complete_block)
    │
    ▼
_run_generate_tests(query, avap_code, llm)   ← same helper as build_graph path
    [produces test_inputs + test_list, 15s timeout]
    │
    ▼
_call_parser(complete_block, test_inputs, test_list)
    │
    ├── VALID ──────────────────────────────► yield code block → back to TEXT
    │
@@ -326,7 +350,7 @@ _call_parser(complete_block)
│ "Fix this AVAP code: {trace}"
│    │
│    ▼
-│ _call_parser(fixed_block)
+│ _call_parser(fixed_block, test_inputs, test_list)   ← same tests reused
│    ├── VALID ──────────────────────► yield fixed block → back to TEXT
│    ├── INVALID ────────────────────► yield fixed block + INVALID_UNRESOLVED → back to TEXT
│    └── UNAVAILABLE ────────────────► yield fixed block + PARSER_UNAVAILABLE → back to TEXT
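The TEXT/CODE flow above can be sketched as a generator (a single-block sketch under stated assumptions: `tokens` is an iterable of text chunks, the three helpers are stand-ins for the real `_extract_avap_code`/`_run_generate_tests`/`_call_parser`, and the fix/retry branch is omitted):

```python
def stream_with_validation(tokens, extract_code, run_tests, call_parser, query, llm):
    """Sketch of the streaming state machine: buffer code blocks, validate, yield."""
    state, buf = "TEXT", ""
    for tok in tokens:
        if state == "TEXT":
            if "```" not in tok:
                yield tok                # text streams through without delay
                continue
            pre, _, rest = tok.partition("```")
            if pre:
                yield pre
            state, buf = "CODE", "```" + rest
        else:
            buf += tok                   # CODE: buffer silently
        if state == "CODE" and buf.count("```") >= 2:
            # Complete code block buffered — validate before yielding.
            avap = extract_code(buf)
            test_inputs, test_list = run_tests(query, avap, llm)
            ok, _trace = call_parser(buf, test_inputs, test_list)
            yield buf                    # fix/retry path omitted in this sketch
            state, buf = "TEXT", ""
    if buf:
        yield buf                        # stream ended mid-CODE: flush as-is
```

Only the code block introduces latency here; everything outside it is yielded the moment it arrives, matching the key property listed below.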
@@ -337,7 +361,7 @@ _call_parser(complete_block)
#### Key properties

- Text before and after the code block streams to the client without delay.
-- Only the code block itself introduces latency (one parser call, optionally one LLM fix call + one parser call).
+- Only the code block itself introduces latency (one `_run_generate_tests` LLM call + one parser call, optionally one LLM fix call + one additional parser call).
- The fix call asks the LLM to correct **only the code block**, not the full response — the text already streamed to the client remains valid.
- If the response contains multiple code blocks, each block is processed independently in sequence. `validation_status` in the final `is_final=True` message reflects the last block validated.
- If the stream ends while still in CODE mode (malformed response without closing fence), the buffer is flushed as-is.
@@ -347,7 +371,10 @@ _call_parser(complete_block)

| Property | AskAgent | AskAgentStream |
|---|---|---|
| Validation point | Post-generation, pre-delivery | Inline, at code block boundary |
| Test generation | `generate_tests` LangGraph node | `_run_generate_tests()` called directly before `_call_parser` |
| Execution + assertion validation | `validate_code` node → `/api/v1/execute` with `test_inputs`/`test_list` | `_call_parser()` with `test_inputs`/`test_list` |
| Retry mechanism | Full `generate_code_retry` LangGraph node | Targeted LLM fix call (code block only) |
| Retry reuses same tests | Yes (state carries `test_inputs`/`test_list`) | Yes (`_ti`/`_tl` captured before first call) |
| Text streaming | N/A (non-streaming) | Uninterrupted |
| `validation_status` delivery | In `AgentResponse` (only response) | In final `AgentResponse` (`is_final=True`) |
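A hedged sketch of what a `_call_parser`-style client might look like against the `/api/v1/execute` endpoint named in the table (only the endpoint path, port 8888, and the 2s `AVAP_PARSER_TIMEOUT` come from this ADR; the payload shape and response fields are assumptions):

```python
import json
from urllib import request as urlreq

def call_parser(code, test_inputs, test_list,
                base="http://localhost:8888", timeout=2.0):
    """Sketch: POST code + tests to the parser, return (ok, error)."""
    payload = json.dumps({
        "code": code,                # assumed field names
        "test_inputs": test_inputs,
        "test_list": test_list,
    }).encode("utf-8")
    req = urlreq.Request(base + "/api/v1/execute", data=payload,
                         headers={"Content-Type": "application/json"})
    try:
        with urlreq.urlopen(req, timeout=timeout) as resp:  # AVAP_PARSER_TIMEOUT
            body = json.load(resp)
    except OSError as exc:
        # Connection errors and timeouts feed the circuit breaker failure count.
        return None, f"parser unavailable: {exc}"
    if body.get("success"):
        return True, ""
    return False, body.get("error", "")  # execution trace or KeyError string
```

The `None` first element distinguishes "parser unreachable" from a genuine `INVALID`, mirroring the VALID / INVALID / UNAVAILABLE branches in the diagrams above.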
@@ -371,6 +398,8 @@ _call_parser(complete_block)

No new LangGraph nodes. The validation logic runs as a **streaming state machine** inline in `server.py:AskAgentStream`. See Decision 3.

Test generation and execution-level validation now have full parity with the `AskAgent` path: `_run_generate_tests` is called after each code block is buffered, and `_call_parser` receives the resulting `test_inputs`/`test_list`. Both paths share the same module-level `_run_generate_tests` implementation.

### Updated flow — `build_graph`

```mermaid
@@ -442,7 +471,7 @@ message AgentResponse {
}
```

-**Implementation status: complete (2026-04-12).** Field 4 added to `brunix.proto`. Populated in both `AskAgent` (from `final_state`) and `AskAgentStream` (in the `is_final=True` message at end of stream).
+**Implementation status: complete (2026-04-13).** Field 4 added to `brunix.proto`. Populated in both `AskAgent` (from `final_state`) and `AskAgentStream` (in the `is_final=True` message at end of stream). Full test generation + assertion validation parity between both paths achieved 2026-04-13.

Clients that do not read `validation_status` are unaffected — the field defaults to empty string.