# ADR-0003: Hybrid Retrieval (BM25 + kNN) with RRF Fusion

- Date: 2026-03-05
- Status: Accepted
- Deciders: Rafael Ruiz (CTO)
## Context
The RAG pipeline needs a retrieval strategy for finding relevant AVAP documentation chunks from Elasticsearch. The knowledge base contains a mix of:
- Prose documentation (explanations of AVAP concepts, commands, parameters) — benefits from semantic (dense) retrieval
- Code examples and BNF grammar (exact syntax patterns, function signatures) — benefits from lexical (sparse) retrieval, where exact token matches are critical
A single retrieval strategy will underperform for one of these document types.
## Decision

Implement hybrid retrieval combining:

- **BM25** (Elasticsearch `multi_match` on the `content^2` and `text^2` fields) for lexical relevance
- **kNN** (Elasticsearch `knn` on the `embedding` field) for semantic relevance
- **RRF** (Reciprocal Rank Fusion) with constant `k=60` to fuse the rankings from both systems
The fused top-8 documents are passed to the generation node as context.
Query reformulation (the `reformulate` node) runs before retrieval and rewrites the user query into a keyword-optimized form to improve BM25 recall for AVAP-specific terminology.
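The end-to-end flow above can be sketched as follows. This is an illustrative outline, not the pipeline's actual code: the function names are placeholders, and the two search functions are stubs standing in for the real Elasticsearch queries.

```python
def reformulate(query: str) -> str:
    # Stub: the real reformulate node rewrites the query (keyword-optimized).
    return query.lower()

def bm25_search(query: str) -> list[str]:
    return ["doc1", "doc2"]  # stub: ranked doc ids from the lexical query

def knn_search(query: str) -> list[str]:
    return ["doc2", "doc3"]  # stub: ranked doc ids from the dense query

def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 8) -> list[str]:
    # Reciprocal Rank Fusion: each list contributes 1/(k + rank) per doc.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def retrieve(user_query: str) -> list[str]:
    query = reformulate(user_query)  # runs before both retrievers
    return rrf_fuse([bm25_search(query), knn_search(query)])
```

With the stubs above, `doc2` wins because both retrievers rank it, matching the intent of rank fusion.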
## Rationale

### Why hybrid over pure semantic?
AVAP is a domain-specific language with precise, non-negotiable syntax. For queries like "how does addVar work", exact lexical matching on the function name addVar is more reliable than semantic similarity, which may confuse similar-sounding functions or return contextually related but syntactically different commands.
### Why hybrid over pure BM25?
Conversational queries ("explain how loops work in AVAP", "what's the difference between addVar and setVar") benefit from semantic search that captures meaning beyond exact keyword overlap.
### Why RRF over score normalization?
BM25 and kNN scores are on different scales and distributions. Normalizing them requires careful calibration per index. RRF operates on ranks — not scores — making it robust to distribution differences and requiring no per-deployment tuning. The k=60 constant is the standard literature value.
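The rank-only property is easy to see in code. The sketch below is an illustrative implementation of the standard RRF formula, score(d) = Σ 1/(k + rank(d)), summed over the systems that returned d; it is not the pipeline's actual code.

```python
def rrf_fuse(rankings, k=60, top_n=8):
    """Fuse ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank enters the score; BM25/kNN score scales are irrelevant.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "b" and "c" appear in both rankings, so they outrank "a" and "d",
# each of which appears in only one list:
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
```

Because only ranks are summed, swapping either retriever (or re-tuning its scoring) never requires recalibrating the fusion step.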
## Retrieval parameters

| Parameter | Value | Rationale |
|---|---|---|
| `k` (top documents) | 8 | Balances context richness vs. context window length |
| `num_candidates` (kNN) | `k × 5 = 40` | Standard ES kNN oversampling ratio |
| BM25 fields | `content^2`, `text^2` | Boost content/text fields; `^2` emphasizes them over metadata |
| Fuzziness (BM25) | `AUTO` | Handles minor typos in AVAP function names |
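The table's parameters map onto two Elasticsearch request bodies roughly as sketched below. The index name, example query text, and query vector (including its dimensionality) are illustrative assumptions, not values from this system.

```python
K = 8  # top documents per retriever (table above)

# Lexical request: multi_match over the boosted fields, with fuzzy matching.
bm25_body = {
    "size": K,
    "query": {
        "multi_match": {
            "query": "addVar syntax",          # example query, illustrative
            "fields": ["content^2", "text^2"],  # ^2 boost over metadata fields
            "fuzziness": "AUTO",                # tolerate minor typos
        }
    },
}

# Dense request: approximate kNN over the embedding field with oversampling.
knn_body = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 384,   # placeholder; real vector comes from the embedder
        "k": K,
        "num_candidates": K * 5,       # = 40, standard oversampling ratio
    }
}
```

The two bodies are issued as separate `_search` requests; their ranked results are then fused with RRF.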
## Consequences
- Retrieval requires two ES queries per request (BM25 + kNN). This is acceptable given the tunnel latency baseline already incurred.
- If either BM25 or kNN fails (e.g., embedding model unavailable), the system degrades gracefully: the failing component logs a warning and returns an empty list; RRF fusion proceeds with the available rankings.
- Context length grows with `k`. At `k=8` with typical chunk sizes (~300 tokens each), context is ~2400 tokens, within the `qwen2.5:1.5b` context window.
- Changing `k` has a direct impact on both retrieval quality and generation latency. Any change must be evaluated with `EvaluateRAG` before merging.
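The graceful-degradation behavior can be expressed as a small wrapper around each retriever call. This is a sketch of the described behavior, not the system's actual code; `safe_ranking` is a hypothetical helper name.

```python
import logging

logger = logging.getLogger("retrieval")

def safe_ranking(search, name):
    # Run one retriever; on failure, log a warning and return an empty
    # ranking so RRF fusion proceeds with whatever lists remain.
    try:
        return search()
    except Exception as exc:
        logger.warning("%s retrieval failed: %s", name, exc)
        return []
```

An empty list contributes no rank scores, so fusing `[bm25_ranking, []]` simply yields the BM25 ordering.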