# ADR-0003: Hybrid Retrieval (BM25 + kNN) with RRF Fusion

- Date: 2026-03-05
- Status: Accepted
- Deciders: Rafael Ruiz (CTO)
## Context
The RAG pipeline needs a retrieval strategy for finding relevant AVAP documentation chunks from Elasticsearch. The knowledge base contains a mix of:
- Prose documentation (explanations of AVAP concepts, commands, parameters) — benefits from semantic (dense) retrieval
- Code examples and BNF grammar (exact syntax patterns, function signatures) — benefits from lexical (sparse) retrieval, where exact token matches are critical
A single retrieval strategy will underperform for one of these document types.
## Decision

Implement hybrid retrieval combining:

- **BM25** (Elasticsearch `multi_match` on the `content^2` and `text^2` fields) for lexical relevance
- **kNN** (Elasticsearch `knn` on the `embedding` field) for semantic relevance
- **RRF** (Reciprocal Rank Fusion) with constant `k=60` to fuse the rankings from both systems
The fused top-8 documents are passed to the generation node as context.
Query reformulation (the `reformulate` node) runs before retrieval and rewrites the user query into a keyword-optimized form to improve BM25 recall for AVAP-specific terminology.
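The end-to-end flow above can be sketched as follows. This is an illustrative outline, not the pipeline's actual code: the function names are placeholders, and the two search functions are stubs standing in for the real Elasticsearch queries.

```python
def reformulate(query: str) -> str:
    # Stub: the real reformulate node rewrites the query (keyword-optimized).
    return query.lower()

def bm25_search(query: str) -> list[str]:
    return ["doc1", "doc2"]  # stub: ranked doc ids from the lexical query

def knn_search(query: str) -> list[str]:
    return ["doc2", "doc3"]  # stub: ranked doc ids from the dense query

def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 8) -> list[str]:
    # Reciprocal Rank Fusion: each list contributes 1/(k + rank) per doc.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def retrieve(user_query: str) -> list[str]:
    query = reformulate(user_query)  # runs before both retrievers
    return rrf_fuse([bm25_search(query), knn_search(query)])
```

With the stubs above, `doc2` wins because both retrievers rank it, matching the intent of rank fusion.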
## Rationale

### Why hybrid over pure semantic?
AVAP is a domain-specific language with precise, non-negotiable syntax. For queries like "how does addVar work", exact lexical matching on the function name addVar is more reliable than semantic similarity, which may confuse similar-sounding functions or return contextually related but syntactically different commands.
### Why hybrid over pure BM25?
Conversational queries ("explain how loops work in AVAP", "what's the difference between addVar and setVar") benefit from semantic search that captures meaning beyond exact keyword overlap.
### Why RRF over score normalization?
BM25 and kNN scores are on different scales and distributions. Normalizing them requires careful calibration per index. RRF operates on ranks — not scores — making it robust to distribution differences and requiring no per-deployment tuning. The k=60 constant is the standard literature value.
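The rank-only property is easy to see in code. The sketch below is an illustrative implementation of the standard RRF formula, score(d) = Σ 1/(k + rank(d)), summed over the systems that returned d; it is not the pipeline's actual code.

```python
def rrf_fuse(rankings, k=60, top_n=8):
    """Fuse ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank enters the score; BM25/kNN score scales are irrelevant.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "b" and "c" appear in both rankings, so they outrank "a" and "d",
# each of which appears in only one list:
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
```

Because only ranks are summed, swapping either retriever (or re-tuning its scoring) never requires recalibrating the fusion step.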
## Retrieval parameters

| Parameter | Value | Rationale |
|---|---|---|
| `k` (top documents) | 8 | Balances context richness vs. context window length |
| `num_candidates` (kNN) | `k × 5 = 40` | Standard ES kNN oversampling ratio |
| BM25 fields | `content^2`, `text^2` | Boost content/text fields; `^2` emphasizes them over metadata |
| Fuzziness (BM25) | `AUTO` | Handles minor typos in AVAP function names |
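The table's parameters map onto two Elasticsearch request bodies roughly as sketched below. The index name, example query text, and query vector (including its dimensionality) are illustrative assumptions, not values from this system.

```python
K = 8  # top documents per retriever (table above)

# Lexical request: multi_match over the boosted fields, with fuzzy matching.
bm25_body = {
    "size": K,
    "query": {
        "multi_match": {
            "query": "addVar syntax",          # example query, illustrative
            "fields": ["content^2", "text^2"],  # ^2 boost over metadata fields
            "fuzziness": "AUTO",                # tolerate minor typos
        }
    },
}

# Dense request: approximate kNN over the embedding field with oversampling.
knn_body = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 384,   # placeholder; real vector comes from the embedder
        "k": K,
        "num_candidates": K * 5,       # = 40, standard oversampling ratio
    }
}
```

The two bodies are issued as separate `_search` requests; their ranked results are then fused with RRF.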
## Consequences
- Retrieval requires two ES queries per request (BM25 + kNN). This is acceptable given the tunnel latency baseline already incurred.
- If either BM25 or kNN fails (e.g., embedding model unavailable), the system degrades gracefully: the failing component logs a warning and returns an empty list; RRF fusion proceeds with the available rankings.
- Context length grows with `k`. At `k=8` with typical chunk sizes (~300 tokens each), context is ~2400 tokens, within the `qwen2.5:1.5b` context window.
- Changing `k` has a direct impact on both retrieval quality and generation latency. Any change must be evaluated with `EvaluateRAG` before merging.
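The graceful-degradation behavior can be expressed as a small wrapper around each retriever call. This is a sketch of the described behavior, not the system's actual code; `safe_ranking` is a hypothetical helper name.

```python
import logging

logger = logging.getLogger("retrieval")

def safe_ranking(search, name):
    # Run one retriever; on failure, log a warning and return an empty
    # ranking so RRF fusion proceeds with whatever lists remain.
    try:
        return search()
    except Exception as exc:
        logger.warning("%s retrieval failed: %s", name, exc)
        return []
```

An empty list contributes no rank scores, so fusing `[bm25_ranking, []]` simply yields the BM25 ordering.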