ADR-0006: Code Indexing Improvements — Comparative Evaluation of Code Chunking Strategies

Date: 2026-03-24
Status: Proposed
Deciders: Rafael Ruiz (CTO), MrHouston Engineering


Context

Efficient code indexing is a critical component for enabling high-quality code search, retrieval-augmented generation (RAG), and semantic understanding in developer tooling. The main challenge lies in representing source code in a way that preserves its syntactic and semantic structure while remaining suitable for embedding-based retrieval systems.

In this context, we explored different strategies to improve the indexing of .avap code files, starting from a naïve approach and progressively moving toward more structured representations based on parsing techniques.

Alternatives

  • File-level chunking (baseline):

    Each .avap file is treated as a single chunk and indexed directly. This approach is simple and fast but ignores internal structure (functions, classes, blocks).

  • EBNF chunking as metadata:

    Each .avap file is still treated as a single chunk and indexed directly. However, using the AVAP EBNF grammar, we extract the AST structure and inject it into the chunk metadata.

  • Full EBNF chunking:

    Each .avap file is still treated as a single chunk and indexed directly. The difference from the previous two approaches is that the AST is indexed instead of the code.

  • Grammar definition chunking:

    Code is segmented using a language-specific configuration (avap_config.json) instead of one-file chunks. The chunker applies a lexer (comments/strings), identifies multi-line blocks (function, if, startLoop, try), classifies single-line statements (registerEndpoint, orm_command, http_command, etc.), and enriches every chunk with semantic tags (uses_orm, uses_http, uses_async, returns_result, among others).

    This strategy also extracts function signatures as dedicated lightweight chunks and propagates local context between nearby chunks (semantic overlap), improving retrieval precision for both API-level and implementation-level queries.
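As a rough illustration of the grammar-definition strategy, the sketch below splits AVAP source into block chunks and enriches each with semantic tags. It is a minimal, hypothetical approximation: the real chunker is driven by avap_config.json and a proper lexer, and the keyword lists and tag rules here are assumptions, not the actual configuration.

```python
# Hypothetical sketch of grammar-definition chunking. Block starters and
# statement-to-tag rules are illustrative; the real values live in
# avap_config.json.
BLOCK_STARTERS = ("function", "if", "startLoop", "try")
STATEMENT_TAGS = {
    "orm_command": "uses_orm",
    "http_command": "uses_http",
    "registerEndpoint": "registers_endpoint",
}


def chunk_avap(source: str) -> list[dict]:
    """Split AVAP source into block chunks, tagging each with semantic hints."""
    chunks, current, tags = [], [], set()

    def flush():
        if current:
            chunks.append({"code": "\n".join(current), "tags": sorted(tags)})
            current.clear()
            tags.clear()

    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(BLOCK_STARTERS):
            flush()  # a new multi-line block begins here
        for keyword, tag in STATEMENT_TAGS.items():
            if keyword in stripped:
                tags.add(tag)
        if "async" in stripped:
            tags.add("uses_async")
        current.append(line)
    flush()
    return chunks
```

A production version would also emit the signature-only chunks and the semantic overlap between neighbors described above; this sketch only shows the block segmentation and tag enrichment.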

Indexed docs

For each strategy, we created a separate Elasticsearch index with its own characteristics. The first three approaches produce 33 chunks (1 chunk per file), whereas the last approach produces 89 chunks.
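The per-strategy setup can be summarized as one mapping shared across four indices. The index names and mapping fields below are illustrative assumptions, not the experiment's actual configuration; against a live cluster each entry would be created with `es.indices.create(index=name, mappings=MAPPING)` from the official Python client.

```python
# Assumed mapping shared by all four strategy indices (illustrative only).
MAPPING = {
    "properties": {
        "code": {"type": "text"},      # chunk body (source code or AST text)
        "tags": {"type": "keyword"},   # semantic tags, grammar-definition only
    },
}

# Hypothetical index names, with the chunk counts reported above.
INDICES = {
    "avap-file-level": 33,            # 1 chunk per file
    "avap-ebnf-metadata": 33,
    "avap-ebnf-full": 33,
    "avap-grammar-definition": 89,    # block/statement/signature chunks
}
```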

How can we evaluate each strategy?

Evaluation Protocol:

  1. Golden Dataset

    • Generate a set of natural language queries paired with their ground-truth context (filename).
    • Each query should be answerable by examining one or more code samples.
    • Example: Query="How do you handle errors in AVAP?" → Context="try_catch_request.avap"
  2. Test Each Strategy

    • For each of the 4 chunking strategies, run the same set of queries against the respective Elasticsearch index.
    • Record the top-10 retrieved chunks for each query.
  3. Metrics

    • NDCG@10: Normalized discounted cumulative gain at rank 10 (measures ranking quality).
    • Recall@10: Fraction of relevant chunks retrieved in top 10.
    • MRR@10: Mean reciprocal rank (position of first relevant result).
  4. Relevance Judgment

    • A chunk is considered relevant if it contains code directly answering the query.
    • For file-level strategies: entire file is relevant or irrelevant.
    • For grammar-definition: specific block/statement chunks are relevant even if the full file is not.
  5. Acceptance Criteria

    • Grammar definition must achieve at least a 10% improvement in NDCG@10 over file-level baseline.
    • Recall@10 must not drop by more than 5 absolute percentage points vs file-level.
    • Index size increase must remain below 50% of baseline.

Decision

Rationale

Consequences