# ADR-0003: Hybrid Retrieval (BM25 + kNN) with RRF Fusion

**Date:** 2026-03-05

**Status:** Accepted

**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering

---

## Context
The RAG pipeline needs a retrieval strategy for finding relevant AVAP documentation chunks from Elasticsearch. The knowledge base contains a mix of:

- **Prose documentation** (explanations of AVAP concepts, commands, parameters) — benefits from semantic (dense) retrieval
- **Code examples and BNF grammar** (exact syntax patterns, function signatures) — benefits from lexical (sparse) retrieval, where exact token matches are critical

A single retrieval strategy will underperform for one of these document types.
---

## Decision

Implement **hybrid retrieval** combining:

- **BM25** (Elasticsearch `multi_match` on `content^2` and `text^2` fields) for lexical relevance
- **kNN** (Elasticsearch `knn` on the `embedding` field) for semantic relevance
- **RRF (Reciprocal Rank Fusion)** with constant `k=60` to fuse rankings from both systems

The fused top-8 documents are passed to the generation node as context.
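The fusion step is small enough to sketch directly. The following is an illustrative implementation, not the project's actual code; the document IDs and variable names are made up. Each input ranking is a list of document IDs ordered best-first, and each document scores `1 / (k + rank)` per system, summed across systems:

```python
# Illustrative RRF fusion sketch (not the project's actual code).
# k = 60 is the fusion constant; the top 8 fused documents are kept.
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 8) -> list[str]:
    """Fuse best-first ranked lists: score(d) = sum over systems of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; truncate to the top_n passed to generation.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


bm25_ranking = ["doc-a", "doc-b", "doc-c"]  # lexical results, best-first
knn_ranking = ["doc-c", "doc-a", "doc-d"]   # semantic results, best-first
fused = rrf_fuse([bm25_ranking, knn_ranking])
```

A document ranked moderately well by *both* systems (here `doc-a`, `doc-c`) outscores one ranked highly by only one, which is the behavior hybrid retrieval is after.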
Query reformulation (`reformulate` node) runs before retrieval and rewrites the user query into keyword-optimized form to improve BM25 recall for AVAP-specific terminology.

---

## Rationale

### Why hybrid over pure semantic?
AVAP is a domain-specific language with precise, non-negotiable syntax. For queries like "how does `addVar` work", exact lexical matching on the function name `addVar` is more reliable than semantic similarity, which may confuse similar-sounding functions or return contextually related but syntactically different commands.

### Why hybrid over pure BM25?

Conversational queries ("explain how loops work in AVAP", "what's the difference between addVar and setVar") benefit from semantic search that captures meaning beyond exact keyword overlap.

### Why RRF over score normalization?

BM25 and kNN scores are on different scales and distributions. Normalizing them requires careful calibration per index. RRF operates on ranks — not scores — making it robust to distribution differences and requiring no per-deployment tuning. The `k=60` constant is the standard literature value.
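The rank-only property can be checked with a toy example (illustrative code, not project code; document IDs and scores are made up): any monotone rescaling of one system's raw scores — as would happen with different index statistics or a different embedding model — leaves the fused order untouched, because only the ranks enter the formula.

```python
# Toy check that RRF is invariant to score scale: rescaling BM25 scores by
# any positive factor changes nothing, since fusion sees only the ranks.
def rank_order(scored: dict[str, float]) -> list[str]:
    """Doc IDs best-first by raw score."""
    return sorted(scored, key=scored.get, reverse=True)


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25 = {"d1": 14.2, "d2": 9.7, "d3": 6.1}   # unbounded BM25 scores
knn = {"d3": 0.91, "d1": 0.88, "d4": 0.73}  # bounded cosine similarities

fused = rrf([rank_order(bm25), rank_order(knn)])
rescaled = rrf([rank_order({d: s * 100 for d, s in bm25.items()}),
                rank_order(knn)])
assert fused == rescaled  # identical order: only ranks matter
```

A score-normalization scheme (e.g. min–max) would shift under the same rescaling unless recalibrated, which is exactly the per-deployment tuning this decision avoids.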
### Retrieval parameters

| Parameter | Value | Rationale |
|---|---|---|
| `k` (top documents) | 8 | Balances context richness vs. context window length |
| `num_candidates` (kNN) | `k × 5 = 40` | Standard ES kNN oversampling ratio |
| BM25 fields | `content^2, text^2` | Boost content/text fields; `^2` emphasizes them over metadata |
| Fuzziness (BM25) | `AUTO` | Handles minor typos in AVAP function names |
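For reference, the two query bodies implied by this table might look like the following. This is a hypothetical sketch: the placeholder query text, vector, and embedding dimensionality are illustrative and not taken from this ADR; real values come from the `reformulate` node and the embedding model.

```python
# Hypothetical Elasticsearch request bodies matching the parameters above.
K = 8  # top documents per system

reformulated_query = "addVar variable declaration"  # placeholder query text
query_embedding = [0.0] * 384                       # placeholder vector

bm25_body = {
    "size": K,
    "query": {
        "multi_match": {
            "query": reformulated_query,
            "fields": ["content^2", "text^2"],  # boost content/text over metadata
            "fuzziness": "AUTO",                # tolerate minor typos
        }
    },
}

knn_body = {
    "size": K,
    "knn": {
        "field": "embedding",
        "query_vector": query_embedding,
        "k": K,
        "num_candidates": K * 5,  # = 40, standard oversampling ratio
    },
}
```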
---

## Consequences

- Retrieval requires two ES queries per request (BM25 + kNN). This is acceptable given the tunnel latency baseline already incurred.
- If either BM25 or kNN fails (e.g., the embedding model is unavailable), the system degrades gracefully: the failing component logs a warning and returns an empty list, and RRF fusion proceeds with the available rankings.
- Context length grows with `k`. At `k=8` with typical chunk sizes (~300 tokens each), context is ~2,400 tokens — within the `qwen2.5:1.5b` context window.
- Changing `k` directly affects both retrieval quality and generation latency. Any change must be evaluated with `EvaluateRAG` before merging.
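The graceful-degradation behavior in the second bullet can be sketched as follows (an illustrative wrapper; the function and logger names are assumptions, not the project's actual code):

```python
# Illustrative sketch: wrap each retriever so a failure yields an empty
# ranking and a warning instead of aborting the request; fusion then runs
# over whichever rankings remain.
import logging

logger = logging.getLogger("retrieval")


def safe_retrieve(name: str, retrieve_fn) -> list[str]:
    """Run one retriever; on any error, log a warning and return []."""
    try:
        return retrieve_fn()
    except Exception as exc:
        logger.warning("%s retrieval failed, continuing without it: %s", name, exc)
        return []


def knn_unavailable():
    raise RuntimeError("embedding model unavailable")  # simulated outage


rankings = [
    safe_retrieve("bm25", lambda: ["doc-a", "doc-b"]),
    safe_retrieve("knn", knn_unavailable),
]
available = [r for r in rankings if r]  # RRF proceeds with what remains
```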