Compare commits

...

202 Commits

Author SHA1 Message Date
acano e035472b14 created chunks for new emb model (harrier) 2026-04-06 12:45:49 +02:00
pseco 56349184fb Add evaluation results for AVAP knowledge models and update evaluation notebook
- Created a new JSON file containing evaluation results for the AVAP knowledge models, including scores for faithfulness, answer relevancy, context recall, and context precision.
- Updated the evaluation notebook to use a new embedding model and fixed execution counts for code cells.
2026-04-06 11:55:33 +02:00
pseco 00a7cc727d Refactor code structure and remove redundant code blocks for improved readability and maintainability 2026-04-06 11:20:21 +02:00
pseco 26ffcc54d9 Refactor code structure for improved readability and maintainability 2026-03-31 13:57:15 +02:00
pseco 4f7367d2d4 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-31 11:21:21 +02:00
pseco 8f501d3e52 Refactor code structure for improved readability and maintainability 2026-03-31 11:16:03 +02:00
rafa-ruiz 1e9a6508f9 Golden dataset 2026-03-31 01:48:00 -07:00
rafa-ruiz aa138783f3 Golden dataset 2026-03-31 01:40:53 -07:00
rafa-ruiz 6ee8583894 update 2026-03-31 01:40:23 -07:00
pseco cd656b08a8 Update default dataset path in validate_synthetic_dataset.py to point to new output location 2026-03-30 10:04:28 +02:00
pseco 04fa15ff1e Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-27 14:14:05 +01:00
pseco 0cf2fc3aa7 Remove detailed print statements from fill rate analysis and retain only essential output 2026-03-27 14:14:00 +01:00
acano 8df0b59f65 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-27 14:13:17 +01:00
acano e4f76f3fab Add newline at the end of generate_mbap_v2.py for better file formatting 2026-03-27 14:10:58 +01:00
pseco 344230c2cf Refactor code structure for improved readability and maintainability 2026-03-27 14:09:18 +01:00
acano d074ce32cc Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-27 14:08:18 +01:00
acano bae58a7fed Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-27 14:04:31 +01:00
acano f747c140c8 Enhance generate_mbap_v2.py with new reward mechanism and GoldPool integration
- Added GoldPool class to manage a top-K pool of high-reward examples.
- Implemented compute_reward function to calculate composite rewards based on execution coverage, novelty, and test quality.
- Introduced call_api_reward function for API calls in the new reward mode.
- Updated main function to support new reward mode with adjustable weights for ECS, novelty, and test quality.
- Enhanced dataset saving functionality to include reward statistics.
- Refactored existing code for improved readability and consistency.
2026-03-27 14:04:21 +01:00
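The reward mechanism this commit describes can be sketched roughly as follows. This is a minimal illustration based only on the commit message, not the actual `generate_mbap_v2.py` implementation: the weights, the pool internals, and the example shape are all assumptions.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class _Entry:
    reward: float
    example: dict = field(compare=False)  # excluded from ordering

class GoldPool:
    """Keep the top-K highest-reward examples using a min-heap of size K."""
    def __init__(self, k: int = 10):
        self.k = k
        self._heap: list[_Entry] = []

    def add(self, example: dict, reward: float) -> None:
        heapq.heappush(self._heap, _Entry(reward, example))
        if len(self._heap) > self.k:
            heapq.heappop(self._heap)  # evict the lowest-reward entry

    def best(self) -> list[dict]:
        return [e.example for e in sorted(self._heap, reverse=True)]

def compute_reward(ecs: float, novelty: float, test_quality: float,
                   w_ecs: float = 0.5, w_nov: float = 0.3,
                   w_tq: float = 0.2) -> float:
    """Composite reward: weighted sum of execution coverage (ECS),
    novelty, and test quality. The weights here are placeholders."""
    return w_ecs * ecs + w_nov * novelty + w_tq * test_quality
```

The adjustable weights correspond to the "--reward mode with adjustable weights" the commit mentions; a CLI would expose `w_ecs`, `w_nov`, and `w_tq` as flags.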
pseco d2d223baea Add new JSON dataset for email validation API task and initialize validated dataset 2026-03-27 11:21:41 +01:00
Rafael Ruiz 3e47c15966
Merge pull request #63 from BRUNIX-AI/mrh-online-dev-partial
Add BEIR analysis notebooks and evaluation pipeline for embedding models
2026-03-26 09:33:54 -07:00
pseco 668f6d006b Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 17:18:49 +01:00
pseco febf955a62 Add new JSON output files for candidate F reward statistics and MBPP tasks
- Created `candidate_F_reward_10_coverage_stats.json` with coverage statistics including total cells, filled cells, fill rate, and node type frequency.
- Added `mbpp_avap.json` containing 14 tasks with descriptions, code implementations, test inputs, and expected test results for various endpoints and functionalities.
2026-03-26 17:18:45 +01:00
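The coverage statistics named in this commit (total cells, filled cells, fill rate, node-type frequency) amount to a small aggregation pass. A hedged sketch, with the cell representation invented for illustration:

```python
from collections import Counter

def coverage_stats(cells: list[dict]) -> dict:
    """Summarise coverage: counts of total and filled cells, the fill
    rate, and how often each node type occurs among filled cells.
    The {'node_type': ...} cell shape is an assumption for this sketch."""
    total = len(cells)
    filled = [c for c in cells if c.get("node_type") is not None]
    return {
        "total_cells": total,
        "filled_cells": len(filled),
        "fill_rate": len(filled) / total if total else 0.0,
        "node_type_frequency": dict(Counter(c["node_type"] for c in filled)),
    }
```

The resulting dict serialises directly to a JSON file like `candidate_F_reward_10_coverage_stats.json`.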
acano c6b57849cd Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 17:02:27 +01:00
acano b94f3382b3 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 17:00:34 +01:00
izapata 4deda83a8e Add BEIR analysis notebooks and evaluation pipeline for embedding models
- Created `n00 Beir Analysis_cosqa.ipynb` for analyzing CoSQA dataset with BEIR.
- Created `n00 first Analysis.ipynb` for initial analysis using Ragas and Ollama embeddings.
- Implemented `evaluate_embeddings_pipeline.py` to evaluate embedding models across CodexGlue, CoSQA, and SciFact benchmarks.
- Added adapters for Ollama and HuggingFace embeddings to ensure compatibility with BEIR.
- Included functions to load datasets and evaluate models with detailed metrics.
2026-03-26 16:53:20 +01:00
Rafael Ruiz a55d4bbf5e
Merge pull request #62 from BRUNIX-AI/mrh-online-dev-partial
Update Embedding model PDF and enhance documentation
2026-03-26 08:36:53 -07:00
rafa-ruiz fe43cd6fa9 scripts documentation 2026-03-26 07:51:01 -07:00
acano ba03aa3b92 Add ADR-0006: Code Indexing Improvements with evaluation strategies and alternatives 2026-03-26 15:41:12 +01:00
izapata 08c5aded35 fix(docs): improve formatting and readability in ADR-0005 for embedding model selection 2026-03-26 15:32:23 +01:00
acano 1f0d31b7b3 Delete obsolete Jupyter notebooks for BEIR analysis and first analysis, removing unused code and dependencies. 2026-03-26 15:20:44 +01:00
acano 591a839c2a Refactor code structure for improved readability and maintainability 2026-03-26 15:13:54 +01:00
acano 3d3237aef6 chore(changelog): update changelog for version 1.6.2 with embedding model selection PDF changes 2026-03-26 15:11:07 +01:00
izapata 76250a347b feat(docs): typo fix 2026-03-26 10:30:12 +01:00
izapata 64d487e20d chore: update changelog for version 1.6.2 and enhance README.md documentation 2026-03-26 10:25:34 +01:00
izapata e4a8e5b85d chore: update Embedding model selection PDF with new content 2026-03-26 10:19:24 +01:00
izapata 669d0b47a0 chore(embeddings): update Embedding model selection PDF 2026-03-26 10:15:49 +01:00
acano d50f33c707 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 09:37:57 +01:00
pseco 1ee5f21c7c Add BEIR analysis notebooks and evaluation pipeline for embedding models
- Created `n00 Beir Analysis_cosqa.ipynb` for analyzing CoSQA dataset with BEIR.
- Created `n00 first Analysis.ipynb` for initial analysis with embeddings.
- Implemented `evaluate_embeddings_pipeline.py` to evaluate embedding models across CodexGlue, CoSQA, and SciFact benchmarks.
- Added adapters for Ollama and HuggingFace embeddings to ensure compatibility with BEIR.
- Enhanced error handling and data normalization in embedding processes.
- Included functionality to load datasets from local cache or download if not present.
2026-03-26 09:37:37 +01:00
rafa-ruiz ccd9073a52 feat(dataset): add ADR-0006 and scaffold reward algorithm pipeline 2026-03-25 22:19:19 -07:00
pseco 0d2cdd2190 Refactor AVAP dataset generation prompts and add synthetic data generation notebook
- Introduced a new notebook for generating synthetic datasets for AVAP, including loading AVAP and MBPP data, and creating prompts for LLM interactions.
2026-03-25 17:07:00 +01:00
pseco d7f895804c Refactor code structure for improved readability and maintainability 2026-03-25 10:53:38 +01:00
pseco 71eb85cc89 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-25 10:50:13 +01:00
pseco 0b309bfa69 feat: add evaluation results for bge-m3 and qwen3-0.6B-emb models 2026-03-25 10:46:02 +01:00
acano b2e5d06d96 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-25 10:41:26 +01:00
acano 21bc6fc3f0 feat: add embedding evaluation results and task processing notebook 2026-03-25 10:40:49 +01:00
acano da483c51bb created code_indexing_improvements research 2026-03-25 10:37:53 +01:00
acano fe90548b8b added ast tree metadata 2026-03-25 10:36:18 +01:00
acano dc8230c872 feat: add ANTHROPIC_API_KEY and ANTHROPIC_MODEL to docker-compose environment 2026-03-25 10:30:00 +01:00
acano bd542bb14d Continued ADR-0005 and created ADR-0006 2026-03-25 10:27:41 +01:00
acano 1442a632c9 fixed avap examples (not coherent with official avap bnf rules) 2026-03-25 10:26:47 +01:00
pseco cbce3ae530 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-25 10:06:11 +01:00
pseco 9b1a0e54d5 updated pipeline to only download files when missing 2026-03-25 10:06:07 +01:00
acano 017a89322a feat: update dependencies in pyproject.toml and uv.lock 2026-03-25 10:01:17 +01:00
Rafael Ruiz f9b2b014bb
Merge pull request #59 from BRUNIX-AI/mrh-online-dev-partial
Added embeddings research
2026-03-24 06:38:59 -07:00
pseco 2a33f8eb06 bge-m3 and qwen3-emb comparison 2026-03-23 15:59:44 +01:00
pseco b574517340 working on ADR0005 2026-03-23 13:17:50 +01:00
pseco 185ea276b7 updated cosqa notebook 2026-03-23 10:36:10 +01:00
acano ec57e52dea Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-23 09:12:20 +01:00
rafa-ruiz 59c1748594 feat: editor context injection (PRD-0002) + repository governance 2026-03-20 19:43:48 -07:00
rafa-ruiz 2fbfad41df feat: editor context injection (PRD-0002) + repository governance 2026-03-20 19:25:29 -07:00
acano 14b279b8af Refactor code structure for improved readability and maintainability 2026-03-19 16:40:57 +01:00
acano 27cfbaf257 Refactor code structure for improved readability and maintainability 2026-03-19 16:37:42 +01:00
acano 16acdfb1f3 DOCS: Add research directory description in README 2026-03-19 16:30:44 +01:00
acano b6f9b386ef Add BEIR analysis notebooks for different datasets and models
- Created `n00 Beir Analysis.ipynb` for analyzing BEIR dataset with Ollama embeddings.
- Added `n00 Beir Analysis_cosqa.ipynb` for evaluating the CosQA dataset using similar methods.
- Introduced `n00 first Analysis.ipynb` for initial analysis with Ragas embeddings and semantic similarity evaluation.
- Implemented data loading, processing, and evaluation metrics for each notebook.
- Included functionality to save results to JSON files for further analysis.
2026-03-19 16:27:25 +01:00
acano dd3bde2ec9 Add BEIR analysis notebooks for different datasets and models
- Created `n00 Beir Analysis.ipynb` for analyzing BEIR dataset with Ollama embeddings.
- Added `n00 Beir Analysis_cosqa.ipynb` for evaluating the CosQA dataset using similar embedding techniques.
- Introduced `n00 first Analysis.ipynb` for initial analysis with Ragas embeddings and semantic similarity evaluation.
- Implemented data loading and processing for each notebook, including downloading datasets and saving results.
- Included evaluation metrics such as NDCG, MAP, Recall, and Precision for model performance assessment.
2026-03-19 16:24:34 +01:00
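For reference, the ranking metrics listed in this commit reduce to a few lines each under binary relevance. This is a self-contained sketch of the standard definitions, not BEIR's own implementation:

```python
import math

def ndcg_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the DCG
    of an ideal ranking with all relevant documents first."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def recall_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant)
    return hits / k if k else 0.0
```

In practice the notebooks delegate these to BEIR's `EvaluateRetrieval`; the sketch only shows what the reported numbers mean.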
acano 752bf9c7d9 Update Elasticsearch index version and modify imports in ingestion and translation scripts
- Changed Elasticsearch index from "avap-docs-test-v3" to "avap-docs-test-v4" in elasticsearch_ingestion.py.
- Removed unused import SystemMessage from langchain_core.messages in translate_mbpp.py.
- Added import for Lark in chunk.py to support new functionality.
2026-03-19 11:30:00 +01:00
acano 868a17523a Merge online into mrh-online-dev 2026-03-19 11:25:36 +01:00
Rafael Ruiz 3ca8fc450c
Merge pull request #58 from BRUNIX-AI/online-fork
Online fork
2026-03-18 19:13:30 -07:00
pseco 8878ca51e4 working on examples verification and testing on avap language server 2026-03-17 11:46:25 +01:00
pseco 80cdbcc38e Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-17 11:02:18 +01:00
pseco c7adab24a6 working on synthetic dataset 2026-03-17 11:02:06 +01:00
acano f343e0027b Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-16 13:21:38 +01:00
acano fadf813494 Update Elasticsearch index version and enhance document processing
- Changed Elasticsearch index from "avap-docs-test-v3" to "avap-docs-test-v4" in elasticsearch_ingestion.py.
- Added Lark parser for AVAP code processing in chunk.py.
- Enhanced metadata extraction for processed documents, including AST for AVAP files.
- Improved error handling for AVAP code parsing.
2026-03-16 13:21:25 +01:00
acano ed466b123d feat: Add llama-cpp-python and tenacity to dependencies 2026-03-16 13:19:09 +01:00
pseco 8501988619 working on bnf 2026-03-16 09:57:36 +01:00
acano ab1022d8b6 feat: Implement ElasticHandshakeWithMetadata to preserve chunk metadata in Elasticsearch 2026-03-13 11:02:32 +01:00
pseco 8aa12bd8eb Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-13 11:02:02 +01:00
pseco 8762eddef0 working on lark 2026-03-13 11:01:56 +01:00
acano e744d9f0cd feat: Load environment variables and add elasticsearch_index to Settings class 2026-03-12 17:36:28 +01:00
acano da63a4075f Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-12 16:09:05 +01:00
acano b163b7cddc fix: Correct spelling of 'synthetic' in changelog and refactor project_root method in Settings class 2026-03-12 16:08:55 +01:00
pseco acc00adfaa Add AVAP execution and testing scripts
- Implemented parser for executing AVAP files within a Docker container (parser v1.py).
- Created a script to send AVAP code to a local server and handle responses (parser v2.py).
- Introduced a mock MBAP test harness to validate AVAP code against expected outputs (mbap_tester.py).
- Added transformation logic to convert AVAP code into Python-like syntax for testing purposes.
- Enhanced error handling and output formatting in the testing harness.
2026-03-12 15:56:36 +01:00
pseco ba4a1f1efc Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-12 15:56:27 +01:00
pseco d518610fee Add AVAP execution and testing scripts
- Implemented `parser v1.py` to run AVAP files in a Docker container using subprocess.
- Created `parser v2.py` to send AVAP code to a local server and handle JSON responses.
- Introduced `mbap_tester.py` as a heuristic mock executor for testing AVAP code against predefined test cases.
- Added functions for transforming AVAP code to Python and executing it in a controlled environment.
- Included error handling and summary reporting for test results in `mbap_tester.py`.
2026-03-12 15:56:22 +01:00
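The mock harness described here can be sketched as a loop over predefined test cases with per-case error isolation and a summary report. A rough illustration in the spirit of `mbap_tester.py`; the real harness transforms AVAP source first, whereas here the callable is already Python:

```python
def run_test_cases(fn, cases):
    """Run `fn` on each (args, expected) pair, collect pass/fail results,
    and return a summary. A failing or raising case never aborts the run."""
    results = []
    for args, expected in cases:
        try:
            got = fn(*args)
            results.append({"args": args, "expected": expected,
                            "got": got, "passed": got == expected})
        except Exception as exc:  # isolate errors per test case
            results.append({"args": args, "expected": expected,
                            "got": repr(exc), "passed": False})
    passed = sum(r["passed"] for r in results)
    return {"passed": passed,
            "failed": len(results) - passed,
            "results": results}
```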
acano c101a8f8da refactor: Update Settings class to use Optional types and streamline path resolution 2026-03-12 15:54:38 +01:00
acano 70a191ac37 chore: Remove AVAP architectural documentation from samples 2026-03-12 13:20:08 +01:00
acano 3aca659b3c Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-12 13:14:40 +01:00
acano 79905c22f4 docs: Update project structure in README and remove ingestion folder from changelog 2026-03-12 12:30:18 +01:00
acano ed25f15542 feat: Enhance Elasticsearch ingestion process with metadata export
- Added output path parameter to elasticsearch_ingestion command for exporting processed documents.
- Implemented ElasticHandshakeWithMetadata class to preserve chunk metadata during ingestion.
- Updated process_documents function to include extra metadata for each chunk.
- Modified ingest_documents function to return Elasticsearch response for each chunk.
- Introduced export_documents function to save processed documents as JSON files.
2026-03-12 12:26:47 +01:00
acano 4a81ec00a2 Update README and changelog for environment variable documentation and version date correction 2026-03-12 10:37:44 +01:00
acano 9425db9b6c Refactor project structure: move prompts module to tasks directory and update references 2026-03-12 10:20:34 +01:00
acano 46a6344c45 Add docstrings to elasticsearch_ingestion and ingest_documents functions for improved documentation 2026-03-12 09:53:56 +01:00
acano 189e404d21 Refactor Elasticsearch ingestion and document processing functions for improved clarity and functionality 2026-03-12 09:50:30 +01:00
acano 9f3564ab2a chore: Remove Kubernetes configuration directory from project structure 2026-03-11 17:41:31 +01:00
acano 0421a315eb Set default value of delete_es_index to False in elasticsearch_ingestion function 2026-03-11 17:39:25 +01:00
acano 5f21544e0b Refactor Elasticsearch ingestion pipeline and add MBPP generation script
- Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch.
- Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM.
- Created `prompts.py` to define prompts for converting Python problems to AVAP.
- Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents.
- Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model.
- Updated dependencies in `uv.lock` to include new packages and versions.
2026-03-11 17:17:44 +01:00
acano 3caed4deb6 chore: Remove unused imports from config.py 2026-03-11 12:30:07 +01:00
acano 8ca81d98a8 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-11 12:29:52 +01:00
acano 5e5f14f5fa feat: Refactor path properties in Settings class to use proj_root 2026-03-11 12:29:38 +01:00
pseco d04c149e66 working on scratches bnf and parsing 2026-03-11 12:28:35 +01:00
pseco 3ac432567b BNF extraction pipeline from avap.md 2026-03-11 11:29:19 +01:00
pseco cd3922abbd modified config.py 2026-03-11 10:41:28 +01:00
acano 0ed7dfc653 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-11 09:57:14 +01:00
acano 2ad09cc77f feat: Update dependencies and enhance Elasticsearch ingestion pipeline
- Added new dependencies including chonkie and markdown-it-py to requirements.txt.
- Refactored the Elasticsearch ingestion script to read and concatenate documents from specified folders.
- Implemented semantic chunking for documents using the chonkie library.
- Removed the old elasticsearch_ingestion_from_docs.py script as its functionality has been integrated into the main ingestion pipeline.
- Updated README.md to reflect new project structure and environment variables.
- Added a new changelog entry for version 1.4.0 detailing recent changes and enhancements.
2026-03-11 09:50:51 +01:00
pseco f5b2df94d2 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-10 14:40:32 +01:00
pseco 4c56dc29c4 Add initial Jupyter notebook for document ingestion using Ollama embeddings
- Implemented code to utilize OllamaEmbeddings for embedding documents.
- Included example usage with sample text inputs.
- Demonstrated response handling from the Ollama LLM.
- Noted deprecation warning for the Ollama class in LangChain.
2026-03-10 14:40:27 +01:00
acano 745ce07805 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-10 14:36:17 +01:00
acano bf3c7f36d8 feat(chunk): enhance file reading and processing logic
- Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys.
- Added logic to handle concatenation of file contents and improved handling of file prefixes.
- Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`.
- Added `convert_chunks_to_document` function to convert chunked content into `Document` objects.
- Integrated logging for chunking process.
- Updated dependencies in `uv.lock` to include `chonkie` and other related packages.
2026-03-10 14:36:09 +01:00
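The `read_files` return shape and the chunking step this commit describes can be sketched as follows. A simplified stand-in: the real pipeline uses chonkie's `SemanticChunker`, while the sliding-window chunker below is only illustrative, and the prefix handling is an assumption.

```python
from pathlib import Path

def read_files(folder: str, prefix: str = "") -> list[dict]:
    """Return [{'content': ..., 'title': ...}] for each file whose name
    starts with `prefix`, mirroring the shape named in the commit."""
    docs = []
    for path in sorted(Path(folder).glob(f"{prefix}*")):
        if path.is_file():
            docs.append({"content": path.read_text(encoding="utf-8"),
                         "title": path.stem})
    return docs

def get_chunk_docs(content: str, chunk_size: int = 200,
                   overlap: int = 40) -> list[str]:
    """Naive overlapping sliding-window chunker standing in for the
    semantic chunker used by the actual pipeline."""
    step = chunk_size - overlap
    return [content[i:i + chunk_size]
            for i in range(0, max(len(content) - overlap, 1), step)]
```

Each chunk would then be wrapped in a `Document` with its source title as metadata, per the `convert_chunks_to_document` step.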
pseco a9bf84fa79 feat: Add synthetic dataset generation for AVAP using MBPP dataset
- Implemented a new script `translate_mbpp.py` to generate synthetic datasets using various LLM providers.
- Integrated the `get_prompt_mbpp` function in `prompts.py` to create prompts tailored for AVAP language conversion.
2026-03-09 17:43:07 +01:00
pseco f6bfba5561 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-09 15:04:23 +01:00
pseco 4afba7d89d working on scrappy 2026-03-09 15:00:07 +01:00
acano 6d856ba691 Add chunk.py for processing and replacing JavaScript references with Avap
- Implemented `replace_javascript_with_avap` function to handle text replacement.
- Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers.
- Added functionality to read files from a specified directory and process their contents.
2026-03-09 13:21:18 +01:00
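The replacement step named in this commit is essentially a case-insensitive substitution over the documentation text. A minimal sketch; the exact markers `chunk.py` rewrites are an assumption here:

```python
import re

def replace_javascript_with_avap(text: str) -> str:
    """Replace standalone 'javascript' markers with 'AVAP',
    case-insensitively, leaving the rest of the text untouched."""
    return re.sub(r"\bjavascript\b", "AVAP", text, flags=re.IGNORECASE)
```

Applied during ingestion, this retargets doc snippets (e.g. fenced code-block language tags) from JavaScript to AVAP before chunking.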
izapata c8da317dd8 chore: remove unused dependencies from pyproject.toml and uv.lock 2026-03-09 11:08:42 +01:00
pseco 423061f76d Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-09 09:35:53 +01:00
pseco 11e6ef71b1 working on agent pseco scratches 2026-03-09 09:35:49 +01:00
acano a434d34676 Updated docs 2026-03-06 11:38:06 +01:00
acano 6692990a38 docs: Update function definition syntax in AVAP™ documentation for clarity 2026-03-06 08:50:54 +01:00
acano d951868200 refactor: Simplify Elasticsearch ingestion by removing chunk management module and integrating document building directly 2026-03-05 16:23:27 +01:00
acano 31206e8fce refactor: Update project structure in README to enhance clarity and organization 2026-03-05 15:16:50 +01:00
acano 7a883846c9 chore: Add .vscode to .gitignore to exclude VSCode settings from version control 2026-03-05 12:14:24 +01:00
acano 97c5ea7ce5 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-05 12:13:31 +01:00
acano ec57c635f9 chore: Remove pull request template to streamline PR submissions 2026-03-05 12:07:14 +01:00
acano 83ca902a59 fix: Correct file path for chunk management module in changelog 2026-03-05 11:44:52 +01:00
acano 1166fc3bf4 feat: Add pull request template to standardize PR submissions 2026-03-05 11:40:38 +01:00
acano 51f42c52b3 refactor: Remove unused uuid import from chunks.py and update changelog for refactoring changes 2026-03-05 11:27:27 +01:00
pseco 34c13dceca chore: Update changelog for version 1.2.0 to include new factory modules, orchestration, and ingestion pipeline 2026-03-05 11:15:40 +01:00
pseco 010270bf22 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-05 11:00:35 +01:00
pseco 183c04829c Update changelog for version 1.2.0: add new modules, refactor server integration, and enhance dependency management 2026-03-05 11:00:30 +01:00
acano d9d754bc6f Implement feature X to enhance user experience and optimize performance 2026-03-05 10:57:38 +01:00
acano 1549069f5a feat: Add Elasticsearch ingestion pipeline and document chunking functionality
- Implemented `elasticsearch_ingestion` function to handle document ingestion into Elasticsearch.
- Created `build_chunks_from_folder` function to read and clean text files, generating document chunks.
- Added logging for better traceability during the ingestion process.
- Updated `uv.lock` to include `boto3` as a new dependency.
2026-03-04 18:21:01 +01:00
pseco f15266f345 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-04 13:58:43 +01:00
pseco 9079674114 working on retrieve from ES 2026-03-04 13:58:38 +01:00
acano dcc07495e5 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-03 17:49:36 +01:00
acano 0538f3b5ce Refactor code structure for improved readability and maintainability 2026-03-03 17:49:27 +01:00
pseco 89316a9f6b Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-03 15:07:58 +01:00
pseco 63c5fc976f working on Dual Index 2026-03-03 15:07:53 +01:00
acano ff08d9a426 Refactor code structure for improved readability and maintainability 2026-03-03 14:38:55 +01:00
acano 5e29469fb4 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-03 14:16:05 +01:00
acano bc87753f2d Implement embedding and chat model factories for multiple providers 2026-03-03 14:15:54 +01:00
pseco 9575af3ff0 working on dual index 2026-03-03 12:01:03 +01:00
acano 203ba4a45c Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-03 10:30:01 +01:00
acano 45d124e017 Remove deprecated configuration files and update Docker Compose for Ollama service 2026-03-03 10:29:52 +01:00
pseco c2e43c030a Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-03 09:41:27 +01:00
pseco 8297ae204c added lark to notebook 2026-03-03 09:39:09 +01:00
acano a7c40d4f2c Refactor Docker Compose and Update Dependencies
- Removed network_mode: "host" from docker-compose.yaml for better isolation.
- Updated execution counts in langgraph_agent_simple.ipynb to reflect new cell order.
- Added OLLAMA_LOCAL_URL to imports in langgraph_agent_simple.ipynb.
- Included base_url parameter for create_chat_model and create_embedding_model functions in langgraph_agent_simple.ipynb.
- Added litellm>=1.82.0 to the development dependencies in pyproject.toml.
- Updated uv.lock to include litellm and its dependencies, along with fastuuid package.
2026-03-02 16:13:07 +01:00
acano 5a666079a4 Refactor langgraph_agent_simple notebook execution counts and handle Langfuse client errors
- Set execution counts to null for initial cells in langgraph_agent_simple.ipynb
- Update execution counts for subsequent cells to maintain order
- Change output stream name from stdout to stderr for error handling
- Capture and log detailed error messages for failed Langfuse client authentication

Update uv.lock to manage accelerate dependency

- Remove accelerate from main dependencies
- Add accelerate to dev dependencies with version specification
- Adjust requires-dist section to reflect changes in dependency management
2026-03-02 14:07:29 +01:00
acano 5b424f8409 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-02 12:48:45 +01:00
pseco 3e4f1cd51e update requirements 2026-03-02 12:47:52 +01:00
acano 93fd8457a1 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-02 12:46:12 +01:00
acano a43ba0b600 Fix execution count and streamline function call in langgraph_agent_simple notebook 2026-03-02 12:45:57 +01:00
pseco cb6ce9d210 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-02 12:41:36 +01:00
pseco a5952c1a4d working on agent in docker 2026-03-02 12:41:27 +01:00
acano cdc90c0b43 Refactor code structure for improved readability and maintainability 2026-03-02 12:40:49 +01:00
acano f6cfdb9df7 Implement feature X to enhance user experience and optimize performance 2026-03-02 12:24:08 +01:00
acano 48d280440c Refactor code structure for improved readability and maintainability 2026-02-27 14:45:33 +01:00
acano 10246a3046 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-27 08:41:10 +01:00
acano a70cd29b67 Refactor code structure for improved readability and maintainability 2026-02-27 08:39:39 +01:00
pseco 6480c77edb added markdown, langfuse downgraded, langchain-huggingface, accelerate 2026-02-27 08:36:20 +01:00
pseco e01e424fac working on llm_factory 2026-02-26 18:02:46 +01:00
pseco 77751ee8ac working on langgraph agent v2 2026-02-26 11:35:00 +01:00
pseco 4a2db004c0 adding ragas to dev 2026-02-26 09:45:42 +01:00
pseco 006323c8b7 config changes 2026-02-25 17:28:46 +01:00
pseco dfdf94f604 working on ragas 2026-02-25 17:28:35 +01:00
pseco 12eef38f33 count tokens files 2026-02-25 17:17:20 +01:00
pseco 71cb79985c added config and count tokens 2026-02-25 14:59:57 +01:00
pseco b01a76e71d evaluation on acano 2026-02-24 17:03:33 +01:00
pseco f6a907911d unstage changes 2026-02-24 16:59:57 +01:00
pseco cdd5f45ae1 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-24 16:59:38 +01:00
pseco d6ac7aa1ca working on evaluating embeddings 2026-02-24 16:53:37 +01:00
acano 4b5352d93c Refactor code structure for improved readability and maintainability 2026-02-24 16:24:32 +01:00
acano d4d7d9d2a1 Implement feature X to enhance user experience and optimize performance 2026-02-24 15:48:06 +01:00
acano 0d6c08e341 Add BEIR analysis notebook for CosQA and update dependencies
- Created a new Jupyter notebook for analyzing BEIR dataset with CosQA using Ollama embeddings.
- Implemented a custom embedding class to integrate LangChain's OllamaEmbeddings with BEIR.
- Added data loading and evaluation logic for the CosQA dataset.
- Updated `uv.lock` to remove unnecessary dependencies (`mteb` and `polars`) and incremented revision number.
2026-02-24 15:27:59 +01:00
pseco ff438ea6c4 update makefile 2026-02-24 14:52:48 +01:00
pseco 9b6726c232 working on evaluating embeddings 2026-02-24 14:35:48 +01:00
pseco 8e852c5417 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-24 12:09:56 +01:00
pseco 397dc7602b working on embeddings evaluation 2026-02-24 12:09:51 +01:00
acano a386982722 Add display data output for corpus conversion progress in Jupyter notebook 2026-02-24 11:46:49 +01:00
acano a098ad02cf Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-24 11:46:29 +01:00
acano cb16306ffb Refactor code structure for improved readability and maintainability 2026-02-24 11:46:18 +01:00
pseco ebd47961f9 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-24 10:51:08 +01:00
pseco bb56222013 working on ragas 2026-02-24 10:51:04 +01:00
acano 629cdc4c49 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-24 10:50:03 +01:00
acano 4a1236f951 Refactor code structure for improved readability and maintainability 2026-02-24 10:49:52 +01:00
pseco 88eab98aba first agent templates approach 2026-02-20 14:43:28 +01:00
pseco b46297c58f Add project coding standards and guidelines for Python (PEP 8) 2026-02-20 10:50:32 +01:00
acano b662b9a4fa Update langgraph_agent_simple notebook: Adjust execution counts and refine AVAP tool description
- Changed execution counts for several code cells to maintain proper order.
- Updated system message to specify the role of the agent in responding to AVAP-related queries.
- Modified user input example to inquire about reserved words in AVAP.
- Enhanced AI response to include detailed information about AVAP reserved words and provided a code example demonstrating their usage.
2026-02-20 10:05:38 +01:00
pseco 1c34771685 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-19 17:09:59 +01:00
acano 0c2d0b512d Implement feature X to enhance user experience and fix bug Y in module Z 2026-02-19 17:09:42 +01:00
pseco 0b75c3254c Refactor code structure for improved readability and maintainability 2026-02-19 17:09:35 +01:00
acano 02af67fffb created notebook for langgraph testing 2026-02-19 14:48:46 +01:00
pseco 4b0be0b80b feat: add retrieval functionality and update execution counts in notebooks 2026-02-19 12:45:36 +01:00
pseco 51488b3ee6 working on ingestion 2026-02-19 11:54:31 +01:00
pseco 1a77b84921 working on retrieve 2026-02-18 16:23:03 +01:00
pseco e4aa30e8c1 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-18 14:51:57 +01:00
pseco 26603a9f45 feat: add chunking methods and ingestion process for Elasticsearch 2026-02-18 14:51:52 +01:00
acano eef6d28db1 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-18 14:51:20 +01:00
acano ba2d2dbcaa Implement feature X to enhance user experience and fix bug Y in module Z 2026-02-18 14:51:04 +01:00
pseco f2482cae19 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-02-18 13:55:53 +01:00
acano 0dad6b1ef5 feat: add initial implementation of Elasticsearch ingestion with chunking strategies 2026-02-18 13:52:54 +01:00
pseco 2ec64b1472 working on ingestion 2026-02-18 13:52:01 +01:00
acano 133f95bcaf feat: add .dockerignore and update docker-compose to use .env file 2026-02-17 12:26:22 +01:00
acano 918befe65f Refactor code structure for improved readability and maintainability 2026-02-17 11:33:15 +01:00
242 changed files with 615988 additions and 92972 deletions

55
.github/CODEOWNERS vendored Normal file
View File

@@ -0,0 +1,55 @@
# CODEOWNERS
#
# Ownership and review rules for the Brunix Assistance Engine repository.
#
# Teams:
# @BRUNIX-AI/engineering — Core engineering team. Owns the production
# codebase, infrastructure, gRPC contract, and all architectural decisions.
# Required reviewer on every pull request targeting `online`.
#
# @BRUNIX-AI/research — Scientific research team. Responsible for RAG
# evaluation, embedding model benchmarking, dataset generation, and
# experiment documentation. Write access to research/ and docs/product/.
# All changes to production code require review from engineering.
#
# This file is enforced by GitHub branch protection rules on `online`.
# See: Settings → Branches → online → Require review from Code Owners
# Default — every PR requires engineering approval
* @BRUNIX-AI/engineering @rafa-ruiz
# ── Production engine ────────────────────────────────────────────────────────
# gRPC contract — any change requires explicit CTO sign-off
Docker/protos/brunix.proto @BRUNIX-AI/engineering @rafa-ruiz
# Core engine — graph, server, prompts, state, evaluation
Docker/src/ @BRUNIX-AI/engineering @rafa-ruiz
# ── Ingestion & knowledge base ───────────────────────────────────────────────
# Ingestion pipelines
scripts/pipelines/ @BRUNIX-AI/engineering @rafa-ruiz
# Grammar config — any change requires a full index rebuild
scripts/pipelines/ingestion/avap_config.json @BRUNIX-AI/engineering @rafa-ruiz
# Golden dataset — any change requires a new EvaluateRAG baseline before merging
Docker/src/golden_dataset.json @BRUNIX-AI/engineering @rafa-ruiz
# ── Research ─────────────────────────────────────────────────────────────────
# Research folder — managed by the research team, no engineering approval needed
# for experiment documentation, benchmarks and datasets
research/ @BRUNIX-AI/research @BRUNIX-AI/engineering
# ── Governance & documentation ───────────────────────────────────────────────
# ADRs and PRDs — all decisions require CTO approval
docs/ADR/ @BRUNIX-AI/engineering @rafa-ruiz
docs/product/ @BRUNIX-AI/engineering @rafa-ruiz
# Governance documents
CONTRIBUTING.md @BRUNIX-AI/engineering @rafa-ruiz
SECURITY.md @BRUNIX-AI/engineering @rafa-ruiz
.github/ @BRUNIX-AI/engineering @rafa-ruiz

7
.github/agents/El listo.agent.md vendored Normal file
View File

@@ -0,0 +1,7 @@
---
name: El_listo
description: Describe what this custom agent does and when to use it.
argument-hint: The inputs this agent expects, e.g., "a task to implement" or "a question to answer".
# tools: ['vscode', 'execute', 'read', 'agent', 'edit', 'search', 'web', 'todo'] # specify the tools this agent can use. If not set, all enabled tools are allowed.
---
Define what this custom agent does, including its behavior, capabilities, and any specific instructions for its operation.

71
.github/copilot-instructions.md vendored Normal file
View File

@@ -0,0 +1,71 @@
---
applyTo: "**"
---
# Project General Coding Standards (Python - PEP 8)
## Naming Conventions
- Follow PEP 8 naming conventions strictly
- Use `snake_case` for variables, functions, and methods
- Use `PascalCase` for class names
- Use `UPPER_CASE` for constants
- Prefix internal/private attributes with a single underscore (`_`)
- Avoid single-letter variable names unless in very small scopes (e.g., loops)
---
## Code Style & Formatting
- Follow PEP 8 guidelines for formatting
- Use 4 spaces for indentation (never tabs)
- Limit lines to 79 characters when possible (max 88 if using Black)
- Add a blank line between logical sections of code
- Keep imports organized:
1. Standard library
2. Third-party packages
3. Local imports
- Avoid wildcard imports (`from module import *`)
- Remove unused imports and variables
---
## Clean Code Principles
- Write small, focused functions that do **one thing only**
- Prefer functions over long procedural blocks
- Keep functions under ~30 lines when possible
- Use descriptive and meaningful names
- Avoid code duplication (DRY principle)
- Prefer explicit code over implicit behavior
- Avoid deeply nested logic (max 2–3 levels)
- Use early returns to reduce nesting
- Keep classes focused and cohesive
- Write code that is easy to read and maintain
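As a small sketch of the early-return guideline above (the function and its rules are illustrative, not part of the project):

```python
def discount_rate(user: dict) -> float:
    """Compute a discount rate using early returns instead of nested branches."""
    if not user:
        return 0.0          # unknown user: no discount
    if not user.get("active"):
        return 0.0          # inactive accounts never get a discount
    if user.get("vip"):
        return 0.2          # VIPs get the best rate
    return 0.05             # default rate for active users
```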
---
## Function Design
- Always use functions when logic can be reused or isolated
- Add type hints to all function signatures
- Keep functions pure when possible (avoid side effects)
- Document functions using docstrings (PEP 257 style)
Example:
```python
def calculate_average(values: list[float]) -> float:
    """
    Calculate the average of a list of numbers.

    Args:
        values: A list of float numbers.

    Returns:
        The arithmetic mean of the values.
    """
    if not values:
        raise ValueError("The values list cannot be empty.")
    return sum(values) / len(values)
```

115
Agents.md Normal file
View File

@@ -0,0 +1,115 @@
# Agents.md
# Project Context — Python (General)
This repository is a general Python project. Ensure a virtual environment exists, create it with `uv` if missing, keep `requirements.txt` accurate, and then follow the user's instructions.
---
## Environment
- Virtual env: prefer `uv` to manage the venv and installs.
- Create: `uv venv` (optionally `--python <version>`).
- Use: activate `.venv` or run commands via `uv run <cmd>`.
- Dependencies:
- If `requirements.txt` exists: `uv pip install -r requirements.txt`.
- When packages are added/removed, update `requirements.txt` to stay accurate.
- Default policy: freeze exact versions unless the user specifies otherwise (e.g., `uv pip freeze > requirements.txt`).
- Never assume the environment is active.
- Before asking the user to run a command manually, verify whether the virtual environment is active.
- If not active, instruct explicitly how to activate it:
- Unix/macOS: `source .venv/bin/activate`
- Prefer `uv run <command>` to avoid activation issues.
---
## Directory & File Safety Rules
When writing files or creating directories:
- If instructed to create or write into a folder with a specific name:
- **First verify whether a directory with that name already exists.**
- Do NOT create it blindly.
- If it exists, reuse it unless the user explicitly requests overwriting.
- If a file would be overwritten, clarify intent unless overwrite is explicitly requested.
- Avoid duplicating folder structures.
- Do not create parallel directories with similar names (e.g., `src` and `src_new`) unless explicitly requested.
- Maintain consistent project structure conventions.
- Respect existing naming conventions.
---
## Command Execution Policy
When suggesting commands to the user:
1. Ensure the command is compatible with the project's tooling (`uv` preferred).
2. Avoid global installs.
3. Avoid OS-specific assumptions unless specified.
4. If the command modifies the environment (e.g., installing packages):
- Make sure `requirements.txt` is updated accordingly.
5. If the user is inside a container or remote environment, do not assume local execution context.
6. Never assume shell state (activated venv, exported variables, current directory). Always be explicit.
---
## Operating Model
- Ask quick clarifying questions if versions, entry points, or expected behaviors are ambiguous.
- Prefer small, incremental changes with brief plans for multi-step work.
- Do not commit secrets.
- Avoid committing large build artifacts unless requested.
- Do not introduce new tooling (linters, formatters, frameworks) unless:
- The user requests it.
- It is strictly necessary to complete the task.
- Avoid unnecessary architectural refactors.
- Prioritize minimal, reversible changes.
---
## Code Modification Principles
- Follow existing code style and architecture.
- Avoid unnecessary refactors.
- Keep changes minimal and scoped.
- Preserve backward compatibility unless explicitly told otherwise.
- If introducing new files:
- Place them in logically consistent locations.
- Avoid breaking import paths.
- Do not remove or rename files unless explicitly instructed.
---
## Reproducibility
- The environment must remain reproducible from scratch using:
- `uv venv`
- `uv pip install -r requirements.txt`
- No hidden dependencies.
- No reliance on undeclared system packages unless clearly documented.
- Ensure all runtime dependencies are declared.
---
## Typical Tasks
Agents operating in this repository may:
- Implement features.
- Fix bugs.
- Run tests.
- Update documentation.
- Improve structure when explicitly requested.
- Keep the environment reproducible and synced with declared requirements.
---
## Safety & Determinism Rules
- Do not perform destructive actions (e.g., delete directories, drop databases) unless explicitly instructed.
- Do not overwrite configuration files without confirmation.
- Always prefer deterministic operations.
- If uncertain about intent, ask before acting.
---

View File

@@ -15,7 +15,9 @@
7. [Changelog Policy](#7-changelog-policy)
8. [Documentation Policy](#8-documentation-policy)
9. [Architecture Decision Records (ADRs)](#9-architecture-decision-records-adrs)
10. [Incident & Blockage Reporting](#10-incident--blockage-reporting)
10. [Product Requirements Documents (PRDs)](#10-product-requirements-documents-prds)
11. [Research & Experiments Policy](#11-research--experiments-policy)
12. [Incident & Blockage Reporting](#12-incident--blockage-reporting)
---
@@ -92,14 +94,16 @@ A PR is not ready for review unless **all applicable items** in the following ch
- [ ] No new environment variables were introduced
- [ ] New environment variables are documented in the `.env` reference table in `README.md`
**Changelog** *(see [Section 6](#6-changelog-policy))*
**Changelog** *(see [Section 7](#7-changelog-policy))*
- [ ] No changelog entry required (internal refactor, comment/typo fix, zero behavioral change)
- [ ] Changelog updated with correct version bump and date
**Documentation** *(see [Section 8](#8-documentation-policy))*
- [ ] No documentation update required (internal change, no impact on setup or API)
- [ ] `README.md` or relevant docs updated to reflect this change
- [ ] If a significant architectural decision was made, an ADR was created in `docs/adr/`
- [ ] If a significant architectural decision was made, an ADR was created in `docs/ADR/`
- [ ] If a new user-facing feature was introduced, a PRD was created in `docs/product/`
- [ ] If an experiment was conducted, results were documented in `research/`
---
@@ -170,10 +174,10 @@ The `changelog` file tracks all notable changes and follows [Semantic Versioning
### Format
New entries go at the top of the file, above the previous version:
New entries go under `[Unreleased]` at the top of the file. When a PR merges, `[Unreleased]` is renamed to the new version with its date:
```
## [X.Y.Z] - YYYY-MM-DD
## [Unreleased]
### Added
- LABEL: Description of the new feature or capability.
@@ -185,7 +189,7 @@ New entries go at the top of the file, above the previous version:
- LABEL: Description of the bug resolved.
```
Use uppercase short labels for scanability: `API:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`.
Use uppercase short labels for scanability: `ENGINE:`, `API:`, `PROTO:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`, `DOCS:`, `FEATURE:`.
---
@@ -219,7 +223,9 @@ Update `README.md` (or the relevant doc file) if the PR includes any of the foll
| `docs/API_REFERENCE.md` | Complete gRPC API contract and examples |
| `docs/RUNBOOK.md` | Operational playbooks and incident response |
| `docs/AVAP_CHUNKER_CONFIG.md` | `avap_config.json` reference — blocks, statements, semantic tags |
| `docs/adr/` | Architecture Decision Records |
| `docs/ADR/` | Architecture Decision Records |
| `docs/product/` | Product Requirements Documents |
| `research/` | Experiment results, benchmarks, datasets |
> **PRs that change user-facing behavior or setup without updating documentation will be rejected.**
@@ -233,7 +239,7 @@ Architecture Decision Records document **significant technical decisions** — c
Write an ADR when a PR introduces or changes:
- A fundamental technology choice (communication protocol, storage backend, framework)
- A fundamental technology choice (communication protocol, storage backend, framework, model)
- A design pattern that other components will follow
- A deliberate trade-off with known consequences
- A decision that future engineers might otherwise reverse without understanding the rationale
@@ -244,10 +250,11 @@ Write an ADR when a PR introduces or changes:
- Bug fixes
- Dependency version bumps
- Configuration changes
- New user-facing features (use a PRD instead)
### ADR format
ADRs live in `docs/adr/` and follow this naming convention:
ADRs live in `docs/ADR/` and follow this naming convention:
```
ADR-XXXX-short-title.md
@@ -261,7 +268,7 @@ Each ADR must contain:
# ADR-XXXX: Title
**Date:** YYYY-MM-DD
**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-YYYY
**Status:** Proposed | Under Evaluation | Accepted | Deprecated | Superseded by ADR-YYYY
**Deciders:** Names or roles
## Context
@@ -281,14 +288,106 @@ What are the positive and negative results of this decision?
| ADR | Title | Status |
|---|---|---|
| [ADR-0001](docs/adr/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
| [ADR-0002](docs/adr/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
| [ADR-0003](docs/adr/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
| [ADR-0004](docs/adr/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
| [ADR-0001](docs/ADR/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
| [ADR-0002](docs/ADR/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
| [ADR-0003](docs/ADR/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
| [ADR-0004](docs/ADR/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
| [ADR-0005](docs/ADR/ADR-0005-embedding-model-selection.md) | Embedding Model Selection — BGE-M3 vs Qwen3-Embedding-0.6B | Under Evaluation |
---
## 10. Incident & Blockage Reporting
## 10. Product Requirements Documents (PRDs)
Product Requirements Documents capture **user-facing features** — what is being built, why it is needed, and how it will be validated. Every feature that modifies the public API, the gRPC contract, or the user experience of any client (VS Code extension, OpenAI-compatible proxy, etc.) requires a PRD before implementation begins.
### When to write a PRD
Write a PRD when a PR introduces or changes:
- A new capability visible to any external consumer (extension, API client, proxy)
- A change to the gRPC contract (`brunix.proto`)
- A change to the HTTP proxy endpoints or behavior
- A feature requested by product or business stakeholders
### When NOT to write a PRD
- Internal architectural changes (use an ADR instead)
- Bug fixes with no change in user-visible behavior
- Infrastructure or tooling changes
### PRD format
PRDs live in `docs/product/` and follow this naming convention:
```
PRD-XXXX-short-title.md
```
Each PRD must contain:
```markdown
# PRD-XXXX: Title
**Date:** YYYY-MM-DD
**Status:** Proposed | Implemented
**Requested by:** Name / role
**Related ADR:** ADR-XXXX (if applicable)
## Problem
What user or business problem does this solve?
## Solution
What are we building?
## Scope
What is in scope and explicitly out of scope?
## Technical design
Key implementation decisions.
## Validation
How do we know this works? Acceptance criteria.
## Impact on parallel workstreams
Does this affect any ongoing experiment or evaluation?
```
### Existing PRDs
| PRD | Title | Status |
|---|---|---|
| [PRD-0001](docs/product/PRD-0001-openai-compatible-proxy.md) | OpenAI-Compatible HTTP Proxy | Implemented |
| [PRD-0002](docs/product/PRD-0002-editor-context-injection.md) | Editor Context Injection for VS Code Extension | Proposed |
---
## 11. Research & Experiments Policy
All scientific experiments, benchmark results, and dataset evaluations conducted by the research team must be documented and committed to the repository under `research/`.
### Rules
- Every experiment must have a corresponding result file in `research/` before any engineering decision based on that experiment is considered valid.
- Benchmark scripts, evaluation notebooks, and raw results must be committed alongside a summary README that explains the methodology, datasets used, metrics, and conclusions.
- Experiments that inform an ADR must be referenced from that ADR with a direct path to the result files.
- The golden dataset used by `EvaluateRAG` (`Docker/src/golden_dataset.json`) is a production artifact. Any modification requires explicit approval from the CTO and a new baseline EvaluateRAG run before the change is merged.
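The golden-dataset rule above implies a minimal integrity check before any baseline run. A hypothetical checker (not part of the repository) against the entry schema visible in this diff (`id`, `category`, `question`, `ground_truth`) might look like:

```python
import json

REQUIRED_KEYS = {"id", "category", "question", "ground_truth"}


def validate_golden_dataset(raw: str) -> list[str]:
    """Return a list of problems found in a golden-dataset JSON string."""
    problems = []
    entries = json.loads(raw)
    seen_ids = set()
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        eid = entry.get("id")
        if eid in seen_ids:
            problems.append(f"entry {i}: duplicate id {eid}")
        seen_ids.add(eid)
    return problems
```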
### Directory structure
```
research/
embeddings/ ← embedding model benchmarks (BEIR, MTEB)
experiments/ ← RAG architecture experiments
datasets/ ← synthetic datasets and golden datasets
```
### Why this matters
An engineering decision based on an experiment that is not reproducible, not committed, or not peer-reviewable has no scientific validity. All decisions with impact on the production system must be traceable to documented, committed evidence.
---
## 12. Incident & Blockage Reporting
If you encounter a technical blockage (connection timeouts, service downtime, tunnel failures):

View File

@@ -17,6 +17,8 @@ services:
OLLAMA_URL: ${OLLAMA_URL}
OLLAMA_MODEL_NAME: ${OLLAMA_MODEL_NAME}
OLLAMA_EMB_MODEL_NAME: ${OLLAMA_EMB_MODEL_NAME}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
ANTHROPIC_MODEL: ${ANTHROPIC_MODEL}
PROXY_THREAD_WORKERS: 10
extra_hosts:

View File

@@ -18,8 +18,28 @@ service AssistanceEngine {
// ---------------------------------------------------------------------------
message AgentRequest {
  // Core fields (v1)
  string query = 1;
  string session_id = 2;

  // Editor context fields (v2 PRD-0002)
  // All three fields are optional. Clients that do not send them default to
  // empty string. Existing clients remain fully compatible without changes.

  // Full content of the active file open in the editor at query time.
  // Gives the assistant awareness of the complete code the user is working on.
  string editor_content = 3;

  // Text currently selected in the editor, if any.
  // Most precise signal of user intent: if non-empty, the question almost
  // certainly refers to this specific code block.
  string selected_text = 4;

  // Free-form additional context (e.g. file path, language identifier,
  // open diagnostic errors). Extensible without requiring future proto changes.
  string extra_context = 5;

  string user_info = 6;
}
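The backward-compatibility claim in the comments above can be sketched outside gRPC with a plain Python analogue (a hypothetical stand-in for the generated `AgentRequest` class; in proto3, unset string fields likewise read back as empty strings):

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    # Core fields (v1)
    query: str = ""
    session_id: str = ""
    # Editor context fields (v2): default to "" exactly like proto3 strings,
    # so a v1 caller that never sets them still produces a valid request.
    editor_content: str = ""
    selected_text: str = ""
    extra_context: str = ""
    user_info: str = ""


def has_editor_context(req: AgentRequest) -> bool:
    """Server-side check: treat empty strings as 'field not sent'."""
    return bool(req.editor_content or req.selected_text)
```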
message AgentResponse {

View File

@@ -197,11 +197,26 @@ def run_evaluation( es_client, llm, embeddings, index_name, category = None, lim
    elapsed = time.time() - t_start
    # RAGAS >= 0.2 returns an EvaluationResult object, not a dict.
    # Extract per-metric means from the underlying DataFrame.
    try:
        df = result.to_pandas()
        def _mean(col):
            return round(float(df[col].dropna().mean()), 4) if col in df.columns else 0.0
    except Exception:
        # Fallback: try legacy dict-style access
        df = None
        def _mean(col):
            try:
                return round(float(result[col]), 4)
            except Exception:
                return 0.0
    scores = {
        "faithfulness": round(float(result.get("faithfulness", 0)), 4),
        "answer_relevancy": round(float(result.get("answer_relevancy", 0)), 4),
        "context_recall": round(float(result.get("context_recall", 0)), 4),
        "context_precision": round(float(result.get("context_precision", 0)), 4),
        "faithfulness": _mean("faithfulness"),
        "answer_relevancy": _mean("answer_relevancy"),
        "context_recall": _mean("context_recall"),
        "context_precision": _mean("context_precision"),
    }
    valid_scores = [v for v in scores.values() if v > 0]
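The extraction logic in this hunk (a per-metric mean that drops missing values, with a fallback when no DataFrame is available) can be sketched without pandas; the sample rows below are invented for illustration:

```python
import math


def metric_mean(rows: list[dict], col: str) -> float:
    """Mean of one metric across samples, ignoring missing/NaN values."""
    vals = [r[col] for r in rows
            if col in r and r[col] is not None and not math.isnan(r[col])]
    # Mirror the rounding and 0.0-on-empty behavior of the _mean helper above.
    return round(sum(vals) / len(vals), 4) if vals else 0.0
```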

View File

@@ -0,0 +1,302 @@
[
{
"id": "GD-R-001",
"category": "RETRIEVAL",
"question": "What is AVAP and what is it designed for?",
"ground_truth": "AVAP is a Turing-complete Domain-Specific Language (DSL) architecturally designed for the secure, concurrent, and deterministic orchestration of microservices and HTTP I/O. It is not a general-purpose language. Its hybrid engine and strict grammar are optimized for fast HTTP transaction processing, in-memory data manipulation, and interaction with external connectors. AVAP has no internal print commands — all data output is performed through the HTTP interface using addResult()."
},
{
"id": "GD-R-002",
"category": "RETRIEVAL",
"question": "How does the if() conditional block work in AVAP? How are blocks closed?",
"ground_truth": "AVAP uses if() / else() / end() for conditional logic. The if() command evaluates a comparison between two values using a comparator operator (==, !=, >, <, >=, <=, in). Every conditional block must be closed with end(). The else() block is optional and handles the false branch. Example: if(saldo, 0, \">\") executes the true branch when saldo is greater than zero, otherwise the else() block runs, and end() closes the structure. AVAP also supports a mode 2 where a full Python-style expression is passed as a string: if(None, None, \"user_type == 'VIP' or compras > 100\")."
},
{
"id": "GD-R-003",
"category": "RETRIEVAL",
"question": "How does AVAP handle external HTTP calls? What commands are available and how is timeout managed?",
"ground_truth": "AVAP provides RequestGet and RequestPost for external HTTP calls. To avoid blocking threads due to network latency, AVAP requires a mandatory timeout parameter in milliseconds. If the timeout is exceeded, the destination variable receives None. RequestPost(url, querystring, headers, body, destino, timeout) executes an HTTP POST storing the response in destino. RequestGet(url, querystring, headers, destino, timeout) executes an HTTP GET. Both commands allow calling external APIs without additional drivers."
},
{
"id": "GD-R-004",
"category": "RETRIEVAL",
"question": "How do functions work in AVAP? What is the scope of variables inside a function?",
"ground_truth": "Functions in AVAP are hermetic memory enclosures. When entering a function, AVAP creates a new dictionary of local variables isolated from the global context. The return() command acts as a flow switch: it injects the calculated value to the caller and releases local memory. If used inside a startLoop, it also breaks the iteration. Variables declared inside a function are only visible within that function — they are not accessible from the main flow or other functions. AVAP has three scope types: Global Scope, Main Local Scope, and Function Scope."
},
{
"id": "GD-R-005",
"category": "RETRIEVAL",
"question": "What are the three types of variable scopes in AVAP and what are their visibility rules?",
"ground_truth": "AVAP uses three scope types: Global Scope contains globally declared variables, accessible from anywhere in the program and persists for the entire interpreter process lifetime. Main Local Scope contains variables declared in the main flow — accessible within the main flow but not from functions or goroutines, and disappears when script execution ends. Function Scope is created independently for each function invocation and contains function parameters and locally created variables — only visible within that function, not from outside, and is destroyed when the function terminates. If a variable does not exist in the visible scopes, the engine produces a runtime error."
},
{
"id": "GD-R-006",
"category": "RETRIEVAL",
"question": "How does concurrency work in AVAP? What are goroutines and how are they launched?",
"ground_truth": "AVAP implements an advanced system based on lightweight threads (goroutines), allowing the server to process long I/O operations without blocking the main thread. The go command launches a goroutine: identifier = go function_name(parameters). It creates a new isolated execution context and returns a unique identifier. Goroutines follow the same scope rules as normal functions — they can access Global Scope and their own Function Scope, but cannot access the Main Local Scope. The gather command is used to collect results from goroutines."
},
{
"id": "GD-R-007",
"category": "RETRIEVAL",
"question": "What is the addParam command and how does it capture HTTP request parameters?",
"ground_truth": "addParam captures input parameters from HTTP requests (URL query parameters, request body, or form data) and assigns them to a variable. Syntax: addParam(\"paramName\", targetVar). It reads the value of paramName from the incoming HTTP request and stores it in targetVar. If the parameter is not present in the request, the variable receives None. It is the primary mechanism for reading external input in AVAP since the language has no direct access to the request object."
},
{
"id": "GD-R-008",
"category": "RETRIEVAL",
"question": "How does the startLoop / endLoop construct work in AVAP?",
"ground_truth": "startLoop and endLoop define iteration blocks in AVAP. Syntax: startLoop(varName, from, to) where varName is the loop counter, from is the start value, and to is the end value inclusive. The loop counter increments by 1 on each iteration. endLoop() closes the block. Example: startLoop(i, 1, 10) iterates i from 1 to 10. Variables modified inside the loop are accessible after endLoop. To exit a loop early, you can set the counter variable beyond the end value (e.g. i = 11 inside a loop that goes to 10)."
},
{
"id": "GD-R-009",
"category": "RETRIEVAL",
"question": "What is the addResult command and how does it build the HTTP response?",
"ground_truth": "addResult adds a variable to the HTTP JSON response body. Syntax: addResult(varName). Each call to addResult adds one key-value pair to the response object where the key is the variable name and the value is its current value. AVAP has no internal print commands — addResult is the only way to expose data to the caller. Multiple addResult calls build up a JSON object with multiple fields. The HTTP status code is set separately via the _status variable."
},
{
"id": "GD-R-010",
"category": "RETRIEVAL",
"question": "How does error handling work in AVAP with try() and exception()?",
"ground_truth": "AVAP uses try() / exception() / end() for error handling. The try() block wraps code that may fail. If an exception occurs inside the try block, execution jumps to the exception() block instead of halting. exception(errorVar) captures the error message into errorVar. The end() command closes the structure. Without a try block, any unhandled exception stops script execution and returns a 400 error. With a try block, you can handle the error gracefully — for example by setting _status to 500 and returning a structured error message."
},
{
"id": "GD-R-011",
"category": "RETRIEVAL",
"question": "What is the replace() command in AVAP and how is it used?",
"ground_truth": "The replace() command performs string substitution in AVAP. Syntax: replace(sourceString, searchValue, replaceValue, targetVar). It replaces all occurrences of searchValue in sourceString with replaceValue and stores the result in targetVar. Example: replace(\"REF_1234_OLD\", \"OLD\", \"NEW\", ref_actualizada) stores \"REF_1234_NEW\" in ref_actualizada. The source can be a literal string or a variable name. The command does not modify the original variable — it always writes to targetVar."
},
{
"id": "GD-R-012",
"category": "RETRIEVAL",
"question": "What are the reserved keywords in AVAP that cannot be used as identifiers?",
"ground_truth": "AVAP has the following reserved keywords that cannot be used as variable or function names: Control flow — if, else, end, startLoop, endLoop, try, exception, return. Function declaration — function. Concurrency — go, gather. Modularity — include, import. Logical operators — and, or, not, in, is. Literals — True, False, None. Using any of these as an identifier will cause a lexer or parser error."
},
{
"id": "GD-R-013",
"category": "RETRIEVAL",
"question": "How does AVAP handle string formatting and concatenation?",
"ground_truth": "AVAP supports two main string operations. Concatenation uses the + operator: result = \"Hello, \" + name produces a concatenated string. String formatting uses Python-style % operator: log = \"Evento registrado por: %s\" % nombre substitutes the variable value into the format string. Strings support single and double quotes. Escape sequences supported include \\n (newline), \\t (tab), \\r (carriage return), \\\" (double quote), \\' (single quote), and \\\\ (backslash). Note that \\n inside a string is a data character, not a statement terminator — the physical EOL is the only statement terminator in AVAP."
},
{
"id": "GD-R-014",
"category": "RETRIEVAL",
"question": "How does the encodeSHA256 command work in AVAP?",
"ground_truth": "encodeSHA256 computes the SHA-256 hash of an input value and stores the result in a destination variable. Syntax: encodeSHA256(inputValue, destVar). The result is a 64-character lowercase hexadecimal string representing the SHA-256 digest. Example: encodeSHA256(\"payload_data\", checksum) stores the hash of the string \"payload_data\" into the variable checksum. The input can be a string literal or a variable. It is commonly used for integrity verification, password hashing, and generating checksums."
},
{
"id": "GD-R-015",
"category": "RETRIEVAL",
"question": "How does AVAP handle date and time operations?",
"ground_truth": "AVAP provides two date/time commands. getDateTime(format, offsetSeconds, timezone, destVar) gets the current date/time, optionally applying an offset in seconds and converting to the specified timezone. Example: getDateTime(\"%Y-%m-%d %H:%M:%S\", 0, \"Europe/Madrid\", sql_date) stores the current Madrid time formatted for SQL. getDateTime(\"\", 86400, \"UTC\", expira) gets the current UTC time plus 86400 seconds (1 day ahead), useful for expiration timestamps. stampToDatetime(unixTimestamp, format, offset, destVar) converts a Unix timestamp to a human-readable string. Example: stampToDatetime(1708726162, \"%d/%m/%Y\", 0, fecha_human)."
},
{
"id": "GD-R-016",
"category": "RETRIEVAL",
"question": "What is the AddvariableToJSON command and how is it used to build JSON objects?",
"ground_truth": "AddvariableToJSON inserts a key-value pair into an existing JSON object variable. Syntax: AddvariableToJSON(key, value, jsonVar). The key can be a string literal or a variable. The value can be a string, number, or variable. The jsonVar must be an already-declared variable typically initialized as \"{}\" via addVar. Example: addVar(mi_json, \"{}\") then AddvariableToJSON(\"status\", \"ok\", mi_json) adds the key \"status\" with value \"ok\" to mi_json. It is commonly used inside loops to build dynamic JSON objects iteratively."
},
{
"id": "GD-R-017",
"category": "RETRIEVAL",
"question": "How does the getListLen command work and what is it used for?",
"ground_truth": "getListLen retrieves the length of a list variable and stores it in a destination variable. Syntax: getListLen(listVar, destVar). Example: getListLen(registros, total) stores the number of elements in registros into total. It is commonly used before a startLoop to set the upper bound of iteration, enabling dynamic loops that adapt to the actual size of the data. Example pattern: getListLen(mi_lista, cantidad) followed by startLoop(i, 0, cantidad) to iterate over all elements."
},
{
"id": "GD-R-018",
"category": "RETRIEVAL",
"question": "How does the randomString command work in AVAP?",
"ground_truth": "randomString generates a random string of a specified length using a character pattern. Syntax: randomString(pattern, length, destVar). The pattern is a regex-style character class defining which characters to use. Example: randomString(\"[A-Z]\\d\", 32, token_seguridad) generates a 32-character random string using uppercase letters and digits. Another example: randomString(\"[a-zA-Z0-9]\", 16, token) generates a 16-character alphanumeric token. It is commonly used for generating secure tokens, session identifiers, and temporary passwords."
},
{
"id": "GD-R-019",
"category": "RETRIEVAL",
"question": "What is the $ dereference operator in AVAP and when is it used?",
"ground_truth": "The $ operator in AVAP is the dereference operator, used to access the value of a variable by reference at assignment time. Syntax: addVar(copia, $original) copies the current value of original into copia. The token is defined as DEREF in the lexer. It is used when you need to capture the current value of a variable into another variable, particularly useful when a variable may change later and you need to preserve its value at a specific point in execution."
},
{
"id": "GD-R-020",
"category": "RETRIEVAL",
"question": "How does AVAP handle ORM database operations? What commands are available?",
"ground_truth": "AVAP provides native ORM commands for database operations without requiring additional drivers. ormCheckTable(tableName, resultVar) checks if a table exists storing True or False in resultVar. ormCreateTable(columns, types, tableName, resultVar) creates a new table with the specified column names and types. ormDirect(query, resultVar) executes a raw SQL query directly. ormAccessSelect executes SELECT queries and ormAccessInsert executes INSERT operations. avapConnector is used to initialize the database connection. The connector and ORM commands are distinguished only by context — the UUID passed as argument determines whether the adapter resolves as a database ORM or a third-party service proxy."
},
{
"id": "GD-C-001",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'name' parameter and returns a personalized greeting.",
"ground_truth": "The following AVAP script reads a name parameter and returns a personalized greeting:\n\naddParam(\"name\", name)\nresult = \"Hello, \" + name\naddResult(result)\n\nKey commands: addParam reads the HTTP parameter 'name' into variable name. The + operator concatenates the greeting string with the name. addResult exposes result in the JSON response."
},
{
"id": "GD-C-002",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'password' parameter, generates a SHA-256 hash, and returns it.",
"ground_truth": "The following AVAP script hashes a password parameter using SHA-256:\n\naddParam(\"password\", password)\nencodeSHA256(password, hashed_password)\naddResult(hashed_password)\n\nKey commands: addParam reads the 'password' HTTP parameter. encodeSHA256 computes the SHA-256 hash and stores the 64-character hex digest in hashed_password. addResult exposes the hash in the JSON response."
},
{
"id": "GD-C-003",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that loops from 1 to 5, builds a JSON object with each index as a key, and returns it.",
"ground_truth": "The following AVAP script builds a JSON object iteratively:\n\naddVar(mi_json, \"{}\")\nstartLoop(i, 1, 5)\n item = \"item_%s\" % i\n AddvariableToJSON(item, \"valor_generado\", mi_json)\nendLoop()\naddResult(mi_json)\n\nKey commands: addVar initializes an empty JSON object. startLoop iterates i from 1 to 5 inclusive. The % operator formats the key name dynamically. AddvariableToJSON inserts each key-value pair into mi_json. addResult exposes the final object."
},
{
"id": "GD-C-004",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that validates if a 'role' parameter belongs to a list of allowed roles and returns the access result.",
"ground_truth": "The following AVAP script validates role membership:\n\naddParam(\"rol\", r)\nif(r, [\"admin\", \"editor\", \"root\"], \"in\")\n acceso = True\nelse()\n acceso = False\nend()\naddResult(acceso)\n\nKey commands: addParam reads the 'rol' parameter. The if() with \"in\" comparator checks list membership directly against a list literal. else() handles the false branch. end() closes the conditional block. addResult exposes the boolean result."
},
{
"id": "GD-C-005",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that makes a GET request to an external API and handles connection errors.",
"ground_truth": "The following AVAP script performs a GET request with error handling:\n\ntry()\n RequestGet(\"https://api.test.com/data\", 0, 0, respuesta)\nexception(e)\n addVar(error_trace, \"Fallo de conexion: %s\" % e)\n addResult(error_trace)\nend()\naddResult(respuesta)\n\nKey commands: try() wraps the potentially failing operation. RequestGet fetches the URL storing the response in respuesta. exception(e) captures any error message. The % operator formats the error string. addResult exposes either the response or the error."
},
{
"id": "GD-C-006",
"category": "CODE_GENERATION",
"question": "Write an AVAP function that takes two numbers and returns their sum, then call it and return the result.",
"ground_truth": "The following AVAP script defines and calls a sum function:\n\nfunction suma(a, b){\n total = a + b\n return(total)\n}\nresultado = suma(10, 20)\naddResult(resultado)\n\nKey commands: function declares a named function with parameters a and b. The + operator adds the values. return() sends the result back to the caller and releases the function scope. The function is called with literal values 10 and 20. addResult exposes the result."
},
{
"id": "GD-C-007",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'subtotal' parameter, computes 21% VAT, and returns the total.",
"ground_truth": "The following AVAP script calculates the total with VAT:\n\naddParam(\"subtotal\", subtotal)\niva = subtotal * 0.21\ntotal = subtotal + iva\naddResult(total)\n\nKey commands: addParam reads the subtotal from the HTTP request. The * operator multiplies by the tax rate 0.21. The + operator adds subtotal and iva. addResult exposes the final total. AVAP supports float arithmetic natively."
},
{
"id": "GD-C-008",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads an 'api_key' parameter and returns status 403 if it is null.",
"ground_truth": "The following AVAP script validates that an API key is present:\n\naddParam(\"api_key\", key)\nif(key, None, \"==\")\n addVar(_status, 403)\n addVar(error, \"Acceso denegado: falta API KEY\")\n addResult(error)\nend()\n\nKey commands: addParam reads the api_key parameter — if not present it will be None. The if() with \"==\" and None checks for null. addVar sets _status to 403 which becomes the HTTP response code. addResult exposes the error message."
},
{
"id": "GD-C-009",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that generates a 32-character random alphanumeric token and returns it.",
"ground_truth": "The following AVAP script generates a secure random token:\n\nrandomString(\"[a-zA-Z0-9]\", 32, token_seguridad)\naddResult(token_seguridad)\n\nKey commands: randomString generates a random string using the character class [a-zA-Z0-9] at length 32 and stores it in token_seguridad. addResult exposes the token in the HTTP response."
},
{
"id": "GD-C-010",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'lang' parameter and returns 'Hola' if it is 'es' or 'Hello' if it is 'en'.",
"ground_truth": "The following AVAP script returns a greeting based on language:\n\naddParam(\"lang\", l)\nif(l, \"es\", \"=\")\n addVar(msg, \"Hola\")\nelse()\n addVar(msg, \"Hello\")\nend()\naddResult(msg)\n\nKey commands: addParam reads the lang parameter into l. The if() with \"=\" comparator checks string equality. else() handles all other cases. addVar sets the message. addResult exposes the localized greeting."
},
{
"id": "GD-C-011",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that checks if a database table exists and creates it if it does not.",
"ground_truth": "The following AVAP script checks and creates a database table:\n\normCheckTable(tabla_pruebas, resultado_comprobacion)\nif(resultado_comprobacion, False, \"==\")\n ormCreateTable(\"username,age\", \"VARCHAR,INTEGER\", tabla_pruebas, resultado_creacion)\nend()\naddResult(resultado_comprobacion)\naddResult(resultado_creacion)\n\nKey commands: ormCheckTable checks if the table exists storing True or False. The if() block only executes if the check returned False. ormCreateTable creates the table with the specified columns and types. Both results are exposed via addResult."
},
{
"id": "GD-C-012",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that gets the current UTC timestamp and adds 24 hours to compute an expiration time.",
"ground_truth": "The following AVAP script computes an expiration timestamp 24 hours from now:\n\ngetDateTime(\"\", 86400, \"UTC\", expira)\naddResult(expira)\n\nKey commands: getDateTime with an empty format string returns a raw timestamp. The second parameter 86400 is the offset in seconds (60 * 60 * 24 = 86400 = 1 day). The timezone is set to UTC. The result is stored in expira and exposed via addResult."
},
{
"id": "GD-C-013",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that receives a new password parameter, validates it is not equal to the old password, and returns a confirmation.",
"ground_truth": "The following AVAP script validates a password change:\n\naddParam(\"password\", pass_nueva)\npass_antigua = \"password\"\nif(pass_nueva, pass_antigua, \"!=\")\n addVar(cambio, \"Contrasena actualizada\")\nend()\naddResult(cambio)\n\nKey commands: addParam reads the new password. The old password is assigned as a literal. The if() with \"!=\" comparator checks inequality. addVar sets the confirmation message only if passwords differ. addResult exposes the message."
},
{
"id": "GD-C-014",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a list parameter and returns its element count.",
"ground_truth": "The following AVAP script reads a list parameter and returns its length:\n\naddParam(\"data_list\", mi_lista)\ngetListLen(mi_lista, cantidad)\naddResult(cantidad)\n\nKey commands: addParam reads the list from the HTTP request into mi_lista. getListLen computes the number of elements and stores it in cantidad. addResult exposes the count in the JSON response."
},
{
"id": "GD-C-015",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that uses a validation function to check a token parameter and returns the authorization result.",
"ground_truth": "The following AVAP script uses a function to validate a token:\n\nfunction es_valido(token){\n response = False\n if(token, \"SECRET\", \"=\")\n response = True\n end()\n return(response)\n}\naddParam(\"token\", t)\nautorizado = es_valido(t)\naddResult(autorizado)\n\nKey commands: function defines es_valido with a token parameter. response is initialized to False. The if() with \"=\" checks against the expected secret. return() sends the boolean back to the caller. addResult exposes the authorization result."
},
{
"id": "GD-C-016",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that returns two values in the HTTP response: a status code 200 and a message 'Success'.",
"ground_truth": "The following AVAP script returns multiple values in the HTTP response:\n\naddVar(_status, 200)\naddVar(status, \"Success\")\naddResult(status)\n\nOr returning both as JSON fields:\n\naddVar(code, 200)\naddVar(status, \"Success\")\naddResult(code)\naddResult(status)\n\nKey commands: _status is the special variable that sets the HTTP response status code. Multiple addResult calls build a JSON object with multiple fields in the response body."
},
{
"id": "GD-C-017",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'saldo' parameter and returns True if it is greater than zero, False otherwise.",
"ground_truth": "The following AVAP script checks if a balance is positive:\n\naddParam(\"saldo\", saldo)\nif(saldo, 0, \">\")\n permitir = True\nelse()\n permitir = False\nend()\naddResult(permitir)\n\nKey commands: addParam reads the saldo parameter. The if() with \">\" comparator checks if saldo is greater than 0. else() handles the zero or negative case. end() closes the block. addResult exposes the boolean result."
},
{
"id": "GD-C-018",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that converts a Unix timestamp parameter to a human-readable date in dd/mm/yyyy format.",
"ground_truth": "The following AVAP script converts a Unix timestamp to a readable date:\n\naddParam(\"timestamp\", ts)\nstampToDatetime(ts, \"%d/%m/%Y\", 0, fecha_human)\naddResult(fecha_human)\n\nKey commands: addParam reads the timestamp from the HTTP request. stampToDatetime converts the Unix epoch integer to a formatted date string using \"%d/%m/%Y\" which produces day/month/year. The third parameter is a timezone offset in seconds. The result is stored in fecha_human and returned via addResult."
},
{
"id": "GD-C-019",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that replaces all spaces in a string parameter with hyphens and returns the result.",
"ground_truth": "The following AVAP script replaces spaces with hyphens:\n\naddParam(\"text\", input_text)\nreplace(input_text, \" \", \"-\", clean_text)\naddResult(clean_text)\n\nKey commands: addParam reads the text parameter. replace() substitutes all occurrences of space with hyphen in input_text and stores the result in clean_text. The original variable is not modified. addResult exposes the transformed string."
},
{
"id": "GD-C-020",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that uses try/exception to execute a raw SQL query and return status 500 on database errors.",
"ground_truth": "The following AVAP script executes SQL with error handling:\n\ntry()\n ormDirect(\"UPDATE tabla SET col=1 WHERE id=1\", res)\nexception(e)\n addVar(_status, 500)\n addResult(\"Error de base de datos\")\nend()\naddResult(res)\n\nKey commands: try() wraps the database operation. ormDirect executes raw SQL storing the result in res. exception(e) catches any database error. addVar sets _status to 500 to signal a server error. The final addResult exposes the query result on success."
},
{
"id": "GD-V-001",
"category": "CONVERSATIONAL",
"question": "Can you summarize what you just explained about AVAP scopes in fewer words?",
"ground_truth": "AVAP has three scopes: Global (visible everywhere, lives for the whole process), Main Local (visible only in the main script flow, not inside functions), and Function (created per function call, destroyed when the function returns). Functions cannot see main flow variables, and the main flow cannot see function-internal variables."
},
{
"id": "GD-V-002",
"category": "CONVERSATIONAL",
"question": "You mentioned that addResult builds the JSON response — can you clarify how multiple addResult calls work together?",
"ground_truth": "Each addResult call adds one field to the JSON response object. The field name is the variable name passed to addResult and the field value is the current value of that variable. So calling addResult(code) and addResult(status) produces a JSON response like {\"code\": 200, \"status\": \"Success\"}. The fields are added in the order the addResult calls are executed during script execution."
},
{
"id": "GD-V-003",
"category": "CONVERSATIONAL",
"question": "What is the difference between addVar and a plain assignment like x = 10 in AVAP?",
"ground_truth": "Both addVar and direct assignment declare variables. addVar(varName, value) is the explicit command form — it supports intelligent value resolution checking if the value is an existing variable, a number, or a literal. Direct assignment x = 10 is syntactic sugar that works identically for simple cases. addVar is preferred for declaring new variables with explicit intent, while direct assignment is more natural for updating values or computed expressions."
},
{
"id": "GD-V-004",
"category": "CONVERSATIONAL",
"question": "Can you explain again the difference between the two modes of the if() command?",
"ground_truth": "Mode 1 is structured comparison: if(variable, value, comparator) — for example if(saldo, 0, \">\") directly compares the variable saldo against 0 using the > operator. Mode 2 is expression mode: if(None, None, \"expression\") — for example if(None, None, \"user_type == 'VIP' or compras > 100\") evaluates a full Python-style boolean expression passed as a string. Mode 2 is more flexible but requires passing None as the first two arguments."
},
{
"id": "GD-V-005",
"category": "CONVERSATIONAL",
"question": "What happens if an error occurs in AVAP without a try block?",
"ground_truth": "Without a try block, any unhandled exception stops script execution immediately and the server returns a 400 Bad Request error with the error message in the response body. The remaining commands in the script are not executed. With a try block, the error is caught by exception(), the script continues running, and you can handle the error gracefully — for example by setting _status to 500 and returning a structured error message."
},
{
"id": "GD-V-006",
"category": "CONVERSATIONAL",
"question": "Can you explain again how the timeout in RequestGet works?",
"ground_truth": "The timeout parameter in RequestGet and RequestPost is specified in milliseconds. If the external server does not respond within that time, the request is aborted and the destination variable receives None instead of a response. This prevents the AVAP thread from blocking indefinitely on a slow or unavailable external service. You should always check if the result variable is None after a request to handle timeout cases gracefully."
},
{
"id": "GD-V-007",
"category": "CONVERSATIONAL",
"question": "Can I iterate over a list of items in AVAP instead of a numeric range?",
"ground_truth": "Yes, but AVAP loops are always numeric — startLoop uses a start and end integer. To iterate over a list, combine getListLen to get the total count, use that count as the loop boundary, and inside the loop use the index variable to access each element. Example: getListLen(mi_lista, total) then startLoop(i, 0, total) with list access inside. Lists are zero-indexed so the index starts at 0."
},
{
"id": "GD-V-008",
"category": "CONVERSATIONAL",
"question": "What is the difference between RequestGet and RequestPost in practice?",
"ground_truth": "RequestGet sends an HTTP GET request — used for retrieving data, with parameters passed as query string. RequestPost sends an HTTP POST request — used for submitting data, with a body payload that can be JSON or form data. Both require a timeout parameter in milliseconds and store the response in a destination variable. Both return None in the destination variable if the request times out. The key structural difference is that RequestPost includes a body parameter while RequestGet does not."
},
{
"id": "GD-V-009",
"category": "CONVERSATIONAL",
"question": "Goroutines cannot access Main Local Scope — can you give a practical example of why that matters?",
"ground_truth": "If you declare a variable in the main flow and launch a goroutine, the goroutine cannot read that variable. For example if you do addVar(counter, 0) in the main flow and then call go myFunction(), the function myFunction cannot access counter — it would get a runtime error. To share data with goroutines you must either pass the value as a function parameter, or declare the variable in Global Scope. This isolation prevents race conditions between concurrent goroutines and the main flow."
},
{
"id": "GD-V-010",
"category": "CONVERSATIONAL",
"question": "What format does encodeSHA256 return its output in?",
"ground_truth": "encodeSHA256 always returns a 64-character lowercase hexadecimal string. This is the standard SHA-256 digest representation — 256 bits expressed as 64 hex characters (0-9 and a-f). The output is deterministic — the same input always produces the same hash — which is why SHA-256 is used for integrity verification rather than for generating unique identifiers."
}
]

[
{
"id": "avap_q1",
"question": "What type of language is AVAP and what is it primarily designed for?",
"ground_truth": "AVAP is a Turing Complete Domain-Specific Language (DSL) designed for the secure, concurrent, and deterministic orchestration of microservices and I/O. It is not a general-purpose language; its hybrid engine and strict grammar are optimized for fast HTTP transaction processing, in-memory data manipulation, and persistence."
},
{
"id": "avap_q2",
"question": "¿Por qué AVAP es considerado un lenguaje orientado a líneas?",
"ground_truth": "AVAP es estrictamente orientado a líneas porque cada instrucción lógica debe completarse en una única línea física de texto. El motor reconoce el salto de línea o retorno de carro como el terminador absoluto de la instrucción, y no se admite la partición de una instrucción en múltiples líneas."
},
{
"id": "avap_q3",
"question": "What happens when an HTTP request uses a method not specified in registerEndpoint?",
"ground_truth": "The AVAP server will automatically reject the request with an HTTP Error 405 if the request method does not match the method specified in the registerEndpoint command."
},
{
"id": "avap_q4",
"question": "¿Qué hace el operador de desreferenciación $ en AVAP?",
"ground_truth": "El prefijo $ indica al motor que debe buscar en la tabla de símbolos la variable cuyo nombre sigue al símbolo y extraer su valor. Por ejemplo, en addVar(copia, $original), el motor busca la variable llamada 'original' y extrae su valor para asignarlo a 'copia'."
},
{
"id": "avap_q5",
"question": "What are the semantic restrictions of the addVar command in AVAP?",
"ground_truth": "The addVar command accepts either addVar(value, variable) or addVar(variable, value). If both arguments are identifiers, the value of the second is assigned to the first. It is not permitted to use two literals as arguments; at least one argument must be an identifier."
},
{
"id": "avap_q6",
"question": "¿En qué orden busca addParam los parámetros de una petición HTTP?",
"ground_truth": "El comando addParam inspecciona la petición HTTP en un orden jerárquico estricto: primero en la URL (Query arguments), luego en el JSON Body, y finalmente en el Form Data. Si el parámetro no existe en ninguno de estos lugares, la variable de destino se inicializa como None."
},
{
"id": "avap_q7",
"question": "What does getQueryParamList do when a URL parameter appears multiple times?",
"ground_truth": "The getQueryParamList command automatically packages multiple occurrences of the same URL parameter (for example, ?filter=A&filter=B) into a single list structure, making it easy to handle repeated query parameters."
},
{
"id": "avap_q8",
"question": "¿Cómo se define el código de estado HTTP de respuesta en AVAP?",
"ground_truth": "La variable de sistema _status permite definir explícitamente el código HTTP de salida. Puede asignarse mediante asignación directa (por ejemplo, _status = 404) o mediante el comando addVar (por ejemplo, addVar(_status, 401)). Es accesible y asignable desde cualquier scope."
},
{
"id": "avap_q9",
"question": "What is the purpose of the addResult command in AVAP?",
"ground_truth": "The addResult command registers which variables will form part of the final JSON response body. It is the mechanism through which AVAP sends data back to the HTTP client, since AVAP has no internal print commands."
},
{
"id": "avap_q10",
"question": "¿Cuáles son los dos modos de invocación del comando if() en AVAP?",
"ground_truth": "El comando if() tiene dos modos: el Modo 1 (comparación estructurada) con sintaxis if(átomo_1, átomo_2, 'operador'), usado para comparaciones directas entre dos valores simples; y el Modo 2 (expresión libre) con sintaxis if(None, None, `expresión_compleja`), usado para evaluar expresiones lógicas complejas encapsuladas entre acentos graves."
},
{
"id": "avap_q11",
"question": "In AVAP's structured if mode, what types of values are allowed as the first two arguments?",
"ground_truth": "In Mode 1 (structured comparison), the first two arguments must be simple identifiers (variables) or literals (strings or numbers). The use of None is not permitted in this mode, and expressions involving property access such as data.user or list[0] are also not allowed."
},
{
"id": "avap_q12",
"question": "¿Qué delimitador debe usarse para el tercer argumento en el Modo 2 del comando if()?",
"ground_truth": "En el Modo 2 (expresión libre), el tercer argumento debe estar encapsulado entre acentos graves (backticks). No se permite usar comillas dobles o simples para este argumento. Los primeros dos argumentos deben ser literalmente la palabra None sin comillas."
},
{
"id": "avap_q13",
"question": "What is wrong with the expression if(username, None, '==') in AVAP?",
"ground_truth": "This expression is invalid because Mode 1 of the if command prohibits the use of None as an argument. If None needs to be used, Mode 2 must be used instead, with the syntax if(None, None, `expression`)."
},
{
"id": "avap_q14",
"question": "¿Cómo se cierra un bloque condicional if en AVAP?",
"ground_truth": "Todo bloque condicional if en AVAP requiere un cierre explícito utilizando el comando end(). El bloque opcional else se introduce con else() y también queda delimitado por el end() final."
},
{
"id": "avap_q15",
"question": "How do you exit a startLoop early in AVAP?",
"ground_truth": "The way to exit a startLoop early in AVAP is by invoking the global return() command. The loop itself only iterates based on finite numeric indices and must be closed with endLoop()."
},
{
"id": "avap_q16",
"question": "¿Qué ocurre cuando se produce un fallo dentro de un bloque try() en AVAP?",
"ground_truth": "Si ocurre un fallo del sistema dentro del bloque try, el flujo de ejecución salta al bloque exception(variable_error), donde la variable especificada se puebla con la traza del error para facilitar la recuperación del script."
},
{
"id": "avap_q17",
"question": "What is the syntax for launching an asynchronous goroutine in AVAP?",
"ground_truth": "The syntax for launching a goroutine is: identifier = go function_name(parameters). This creates a new isolated execution context and returns a unique identifier that must be saved to interact with the thread later."
},
{
"id": "avap_q18",
"question": "¿Qué devuelve gather() si se supera el timeout especificado?",
"ground_truth": "Si se supera el timeout especificado en gather(identificador, timeout), el comando cancela la espera y devuelve None como resultado."
},
{
"id": "avap_q19",
"question": "What is avapConnector and how is it used in AVAP?",
"ground_truth": "avapConnector is the mechanism for integrating with third-party services configured on the AVAP platform. A connector is registered in advance with a unique UUID. When instantiated, the variable becomes a proxy object that encapsulates credentials and context, exposing dynamic methods via dot notation."
},
{
"id": "avap_q20",
"question": "¿Qué parámetro es obligatorio en RequestPost y RequestGet para evitar hilos bloqueados?",
"ground_truth": "Tanto RequestPost como RequestGet exigen un parámetro de timeout expresado en milisegundos. Si se supera este tiempo, la variable destino recibe None. Este parámetro es obligatorio para evitar que los hilos queden bloqueados por latencia de red."
},
{
"id": "avap_q21",
"question": "What is the difference between RequestPost and RequestGet in AVAP?",
"ground_truth": "RequestPost executes an HTTP POST request and includes a body parameter, with the signature RequestPost(url, querystring, headers, body, destination, timeout). RequestGet executes an HTTP GET request and omits the body, with the signature RequestGet(url, querystring, headers, destination, timeout)."
},
{
"id": "avap_q22",
"question": "¿Qué hace ormAccessSelect cuando el selector está vacío?",
"ground_truth": "El comando ormAccessSelect recupera registros de una tabla. El selector es la cláusula WHERE y puede estar vacío, lo que implica que se recuperarán todos los registros de la tabla. El resultado se devuelve como una lista de diccionarios."
},
{
"id": "avap_q23",
"question": "When is ormDirect used instead of other ORM commands in AVAP?",
"ground_truth": "ormDirect is used for executing raw SQL statements, making it suitable for complex analytical queries that cannot be expressed through the structured ORM commands like ormAccessSelect or ormAccessUpdate."
},
{
"id": "avap_q24",
"question": "¿Es obligatorio el selector en ormAccessUpdate? ¿Por qué?",
"ground_truth": "Sí, el selector es obligatorio en ormAccessUpdate. Su propósito es delimitar el alcance del cambio en la tabla, es decir, especificar qué registros deben ser modificados. Sin él, la operación podría afectar a todos los registros de la tabla."
},
{
"id": "avap_q25",
"question": "How does AVAP distinguish between a third-party connector and a database ORM connector at runtime?",
"ground_truth": "The grammar treats both connector types identically using avapConnector('TOKEN'). The distinction is made at runtime by the execution engine, which selects the appropriate adapter based on the UUID passed as argument, determining whether it resolves to a database ORM or a third-party proxy."
},
{
"id": "avap_q26",
"question": "¿Cómo se construye una lista en AVAP si no se pueden usar literales de array?",
"ground_truth": "En AVAP, las listas no se instancian con literales de array. Se construyen y recorren a través de un conjunto cerrado de comandos especializados como variableToList (para crear una lista desde un valor escalar), itemFromList (para acceder a elementos por índice) y getListLen (para obtener la longitud)."
},
{
"id": "avap_q27",
"question": "What does variableToList do in AVAP?",
"ground_truth": "variableToList forces a scalar variable to be converted into an iterable list structure containing a single element. It is the canonical entry point for building a list from scratch starting from an existing value."
},
{
"id": "avap_q28",
"question": "¿Por qué se recomienda llamar a getListLen antes de itemFromList?",
"ground_truth": "Se recomienda llamar siempre a getListLen antes de itemFromList para evitar accesos fuera de rango. getListLen calcula el número total de elementos en la lista, lo que permite construir bucles de recorrido seguro y validar que el índice al que se quiere acceder existe."
},
{
"id": "avap_q29",
"question": "What happens when AddVariableToJSON is called with a key that already exists in the JSON object?",
"ground_truth": "When AddVariableToJSON is called with a key that already exists in the JSON object, the existing value for that key is overwritten with the new value provided."
},
{
"id": "avap_q30",
"question": "¿Cuál es la diferencia entre encodeSHA256 y encodeMD5 en AVAP?",
"ground_truth": "Ambas son funciones criptográficas que encriptan texto de forma irreversible. SHA-256 produce un digest de 64 caracteres hexadecimales y ofrece mayor resistencia criptográfica, mientras que MD5 produce un digest de 32 caracteres. Se recomienda SHA-256 para nuevos desarrollos."
},
{
"id": "avap_q31",
"question": "What regex syntax does getRegex use in AVAP?",
"ground_truth": "The getRegex command uses standard regex syntax compatible with Python's re module. It applies the pattern to the source variable and extracts the first exact match found."
},
{
"id": "avap_q32",
"question": "¿Qué hace getDateTime en AVAP y qué parámetros acepta?",
"ground_truth": "getDateTime captura la fecha y hora actuales del sistema, aplica el ajuste timedelta especificado y las convierte a la zona horaria indicada antes de almacenar el resultado en la variable destino. Acepta cualquier zona horaria reconocida por la librería pytz de Python. Sus parámetros son: formato, timedelta, zona_horaria y destino."
},
{
"id": "avap_q33",
"question": "What is the difference between getTimeStamp and stampToDatetime in AVAP?",
"ground_truth": "getTimeStamp converts a readable date string to its Unix Epoch integer value, while stampToDatetime does the reverse, converting an Epoch integer value to a formatted date string. Both support strftime format notation and timedelta adjustments in seconds."
},
{
"id": "avap_q34",
"question": "¿Cómo se expresa un timedelta negativo en los comandos de fecha de AVAP?",
"ground_truth": "Los comandos de fecha de AVAP expresan el timedelta en segundos. Un valor positivo suma tiempo y un valor negativo resta tiempo. Por ejemplo, para restar una hora se usaría -3600 como valor de timedelta."
},
{
"id": "avap_q35",
"question": "What does randomString do in AVAP and what parameters does it take?",
"ground_truth": "randomString generates a random string of a specified length whose characters are restricted to the set defined by a pattern (a character regular expression). It takes three parameters: pattern, length, and destination. It is useful for generating session tokens, temporary passwords, or unique identifiers."
},
{
"id": "avap_q36",
"question": "¿Qué hace el comando replace en AVAP?",
"ground_truth": "El comando replace localiza todas las ocurrencias de un patrón de búsqueda dentro de la variable de origen y las sustituye por el valor de reemplazo, almacenando el resultado en la variable destino. Sus parámetros son: origen, patron_busqueda, reemplazo y destino."
},
{
"id": "avap_q37",
"question": "How are function blocks delimited in AVAP, and how does this differ from control structures?",
"ground_truth": "Functions use curly braces {} as block delimiters, which is an explicit architectural decision. This differs from control structures like if, loop, and try, which use keyword closers such as end(), endLoop(), and end() respectively. The parser distinguishes between them by the opening token."
},
{
"id": "avap_q38",
"question": "¿Qué hace return() cuando se usa dentro de un startLoop en AVAP?",
"ground_truth": "Cuando return() se usa dentro de un startLoop, actúa como interruptor de flujo que rompe la iteración anticipadamente, además de inyectar el valor calculado al llamador y liberar la memoria local de la función."
},
{
"id": "avap_q39",
"question": "What is the difference between include and import in AVAP?",
"ground_truth": "include is a preprocessor directive that pastes the content of a physical file at the current line (static inclusion). import loads collections of functions: angle brackets (import <math>) are used for native libraries, while quotes (import 'my_utils') are used for local libraries."
},
{
"id": "avap_q40",
"question": "¿Cómo se importa una librería nativa en AVAP?",
"ground_truth": "Para importar una librería nativa en AVAP se usan corchetes angulares, con la sintaxis import <nombre_libreria>. Para librerías locales se usan comillas, con la sintaxis import 'nombre_libreria'."
},
{
"id": "avap_q41",
"question": "What type casting functions are available in AVAP expressions?",
"ground_truth": "AVAP supports explicit type casting using standard constructor functions: int(var) to convert to integer, float(var) to convert to float, and str(var) to convert to string. These can be used in any evaluation."
},
{
"id": "avap_q42",
"question": "¿Cómo funciona el slicing en AVAP?",
"ground_truth": "En AVAP se pueden extraer fragmentos de listas o strings usando la notación de dos puntos. Por ejemplo, mi_lista[1:4] extrae los elementos desde el índice 1 hasta el índice 3 (el índice final es exclusivo). Esta notación también soporta un tercer parámetro opcional para el paso."
},
{
"id": "avap_q43",
"question": "What are list comprehensions in AVAP and how are they written?",
"ground_truth": "AVAP supports list comprehensions for quickly building lists using iterators in a single line, allowing filtering and mapping of entire collections. The syntax is [expression for identifier in expression] with an optional if clause, for example [x * 2 for x in values if x > 0]."
},
{
"id": "avap_q44",
"question": "¿Cuáles son los tres tipos de comentarios en AVAP y cómo se diferencian?",
"ground_truth": "AVAP tiene tres tipos de comentarios: comentarios de línea (//) que ignoran el texto hasta el salto de línea; comentarios de bloque (/* ... */) para aislar bloques multilínea; y comentarios de documentación (///) utilizados por analizadores de código o IDEs para generar documentación técnica automática (Docstrings)."
},
{
"id": "avap_q45",
"question": "Why must the lexer evaluate /// before // in AVAP?",
"ground_truth": "The lexer must evaluate /// before // because it applies the longest-match principle. Since /// starts with //, if // were evaluated first, the documentation comment marker would be incorrectly tokenized as a line comment followed by a slash."
},
{
"id": "avap_q46",
"question": "¿Cuál es la jerarquía de precedencia de operadores en AVAP, de menor a mayor?",
"ground_truth": "La jerarquía de precedencia en AVAP de menor a mayor es: or lógico, and lógico, not lógico, comparaciones (==, !=, <, >, <=, >=, in, is), suma y resta aritméticas, multiplicación/división/módulo, factor (unario), y potencia (**)."
},
{
"id": "avap_q47",
"question": "What are the three memory scope types in AVAP?",
"ground_truth": "AVAP uses a memory model based on three types of scopes: Global Scope (accessible from anywhere in the program), Main Local Scope (the main execution flow outside any function), and Function Scope (created independently for each function invocation)."
},
{
"id": "avap_q48",
"question": "¿Pueden las funciones en AVAP acceder a las variables del flujo principal?",
"ground_truth": "No. Las variables del Main Local Scope no son accesibles desde las funciones. Las funciones solo pueden acceder a su propio Function Scope y al Global Scope. Esto evita dependencias implícitas entre funciones y el flujo principal."
},
{
"id": "avap_q49",
"question": "What happens to Function Scope variables when a function finishes executing?",
"ground_truth": "When a function finishes executing, its Function Scope is destroyed. All variables created within the function, including parameters and intermediate results, cease to exist and are not visible from outside the function."
},
{
"id": "avap_q50",
"question": "¿Pueden las goroutines acceder al Main Local Scope en AVAP?",
"ground_truth": "No. Las goroutines siguen las mismas reglas de scope que una función normal. Por lo tanto, pueden acceder al Global Scope y a su propio Function Scope, pero no pueden acceder al Main Local Scope."
},
{
"id": "avap_q51",
"question": "What is the variable resolution order inside a function in AVAP?",
"ground_truth": "Inside a function, variable resolution follows this hierarchical order: first the Function Scope is checked, then the Global Scope. The Main Local Scope is not visible inside functions. If a variable does not exist in any visible scope, the engine produces a runtime error."
},
{
"id": "avap_q52",
"question": "¿Cuánto tiempo existe el Global Scope en AVAP?",
"ground_truth": "El Global Scope existe durante toda la vida del proceso del intérprete. Las variables globales actúan como estado compartido del programa y son visibles desde el flujo principal, desde todas las funciones y desde las goroutines."
},
{
"id": "avap_q53",
"question": "What token does the AVAP lexer produce for the end of a line?",
"ground_truth": "The AVAP lexer produces the EOL token for the end of a line. This token acts as a statement terminator and is generated by carriage return or line feed characters (\\r\\n, \\n, or \\r)."
},
{
"id": "avap_q54",
"question": "¿Qué elementos descarta el lexer de AVAP y no envía al parser?",
"ground_truth": "El lexer de AVAP descarta y no envía al parser los siguientes elementos: WHITESPACE (espacios y tabulaciones), LINE_COMMENT (comentarios de línea //), DOC_COMMENT (comentarios de documentación ///) y BLOCK_COMMENT (comentarios de bloque /* */)."
},
{
"id": "avap_q55",
"question": "What is the DEREF token in AVAP and when is it used?",
"ground_truth": "The DEREF token is the dollar sign ($) prefix used to dereference a variable, meaning it instructs the engine to look up the variable's value in the symbol table. For example, addVar(copy, $original) uses DEREF to access the value of the variable named 'original'."
},
{
"id": "avap_q56",
"question": "¿Cuál es el orden de precedencia léxica que debe seguir el lexer de AVAP?",
"ground_truth": "El lexer de AVAP debe aplicar el principio de máxima coincidencia en este orden: primero comentarios (/// antes que //, luego /* */), luego whitespace, palabras reservadas, identificadores, números flotantes, enteros, strings, operadores compuestos (**,==,<=,>=,!=), operadores simples y finalmente delimitadores."
},
{
"id": "avap_q57",
"question": "Why must ** be evaluated before * in the AVAP lexer?",
"ground_truth": "The ** operator must be evaluated before * because the lexer applies the longest-match principle. If * were evaluated first, the power operator ** would be incorrectly tokenized as two separate multiplication operators."
},
{
"id": "avap_q58",
"question": "¿Qué secuencias de escape soporta AVAP en los literales de cadena?",
"ground_truth": "AVAP soporta las siguientes secuencias de escape en literales de cadena: \\\" (comilla doble), \\' (comilla simple), \\\\ (barra invertida), \\n (salto de línea), \\t (tabulación), \\r (retorno de carro) y \\0 (carácter nulo)."
},
{
"id": "avap_q59",
"question": "Does a \\n inside a string literal act as a statement terminator in AVAP?",
"ground_truth": "No. A \\n inside a string literal is a data character, not a statement terminator. The physical EOL (end of line) is the only statement terminator in AVAP."
},
{
"id": "avap_q60",
"question": "¿Qué palabras están reservadas en AVAP y no pueden usarse como identificadores?",
"ground_truth": "Las palabras reservadas en AVAP incluyen: palabras de control de flujo (if, else, end, startLoop, endLoop, try, exception, return), declaración de funciones (function), concurrencia (go, gather), modularidad (include, import), operadores lógicos (and, or, not, in, is) y literales (True, False, None)."
},
{
"id": "avap_q61",
"question": "What is the BNF rule for a valid AVAP identifier?",
"ground_truth": "A valid AVAP identifier must start with a letter (a-z or A-Z) or an underscore, followed by zero or more letters, digits (0-9), or underscores. The formal rule is: [a-zA-Z_][a-zA-Z0-9_]*"
},
{
"id": "avap_q62",
"question": "¿Cómo se define un número flotante en la gramática léxica de AVAP?",
"ground_truth": "Un número flotante en AVAP se define como uno o más dígitos seguidos de un punto y cero o más dígitos, o como un punto seguido de uno o más dígitos. La regla formal es: [0-9]+\\.[0-9]* | \\.[0-9]+. Ejemplos válidos son 1.0, 3.14 y .5."
},
{
"id": "avap_q63",
"question": "What is the purpose of registerEndpoint's middleware list parameter?",
"ground_truth": "The middleware list parameter in registerEndpoint allows injecting a list of functions that execute before the main handler. These middleware functions are used to validate tokens and perform pre-processing before the main block executes."
},
{
"id": "avap_q64",
"question": "¿Qué diferencia hay entre una llamada a función global y una llamada a método en AVAP?",
"ground_truth": "Una llamada a función global (function_call_stmt) no tiene receptor de objeto y usa la sintaxis identificador(argumentos). Una llamada a método (method_call_stmt) se realiza sobre un objeto conector usando notación de punto, con la sintaxis identificador = identificador.identificador(argumentos)."
},
{
"id": "avap_q65",
"question": "What does ormCreateTable do in AVAP and what parameters does it require?",
"ground_truth": "ormCreateTable is a DDL command for creating tables in the connected database. It requires four parameters: fields (the field names), fieldsType (the data types of the fields), tableName (the name of the table to create), and varTarget (the variable to store the result)."
},
{
"id": "avap_q66",
"question": "¿Qué devuelve ormAccessSelect en AVAP?",
"ground_truth": "ormAccessSelect devuelve una lista de diccionarios, donde cada diccionario representa un registro recuperado de la tabla. El campo fields acepta * para recuperar todos los campos o una lista de campos específicos, y el selector es la cláusula WHERE que puede estar vacía."
},
{
"id": "avap_q67",
"question": "How does AVAP handle output if it has no print command?",
"ground_truth": "Since AVAP has no internal print commands, all data output is performed through the HTTP interface. The addResult command registers which variables will form part of the final JSON response body sent back to the HTTP client."
},
{
"id": "avap_q68",
"question": "¿Qué es el Main Local Scope en AVAP y cuándo desaparece?",
"ground_truth": "El Main Local Scope corresponde al flujo de ejecución principal del script, fuera de cualquier función. Las variables declaradas en este ámbito son locales del flujo principal, no son accesibles desde funciones ni goroutines, y desaparecen cuando finaliza la ejecución del script."
},
{
"id": "avap_q69",
"question": "What is the BNF structure of a try/exception block in AVAP?",
"ground_truth": "A try/exception block in AVAP has the structure: try() followed by a block, then exception(identifier) followed by another block, and finally end(). The identifier in the exception clause receives the error trace when a system failure occurs inside the try block."
},
{
"id": "avap_q70",
"question": "¿Puede una sentencia AVAP ocupar más de una línea física?",
"ground_truth": "No. AVAP es estrictamente orientado a líneas y no admite la partición de una instrucción en múltiples líneas. Cada instrucción lógica debe completarse en una única línea física de texto, y el EOL es el terminador absoluto de la instrucción."
},
{
"id": "avap_q71",
"question": "What is the syntax for declaring a function in AVAP?",
"ground_truth": "A function in AVAP is declared using the keyword 'function' followed by the function name, parentheses with an optional parameter list, and then the function body enclosed in curly braces {}. The formal syntax is: function identifier([param_list]) { block }."
},
{
"id": "avap_q72",
"question": "¿Qué hace variableFromJSON en AVAP?",
"ground_truth": "variableFromJSON parsea un objeto JSON en memoria y extrae el valor correspondiente a la clave especificada, almacenándolo en la variable destino. El acceso es directo por nombre de propiedad."
},
{
"id": "avap_q73",
"question": "What is the index base used by itemFromList in AVAP?",
"ground_truth": "itemFromList uses zero-based indexing (base 0) to access elements in a list. It safely extracts the element at the specified index position from the source list and stores it in the destination variable."
},
{
"id": "avap_q74",
"question": "¿Cuál es la diferencia entre el nivel léxico y el nivel sintáctico en AVAP?",
"ground_truth": "El nivel léxico (cubierto por el Apéndice X) produce tokens como IDENTIFIER, INTEGER, FLOAT, STRING, operadores, delimitadores, EOL y palabras reservadas. El nivel sintáctico consume esos tokens para construir el AST (árbol de sintaxis abstracta) según las reglas BNF de las Secciones I-IX."
},
{
"id": "avap_q75",
"question": "What comparison operators are supported in AVAP's structured if mode?",
"ground_truth": "In AVAP's structured if mode (Mode 1), the supported comparison operators are: == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal), and <= (less than or equal). The operator must be passed as a string enclosed in double quotes."
},
{
"id": "avap_q76",
"question": "¿Qué es un motor de evaluación híbrida en el contexto de AVAP?",
"ground_truth": "El motor de evaluación híbrida de AVAP permite combinar comandos declarativos con expresiones dinámicas. Cuando el intérprete lee una asignación como variable = expresión, resuelve cualquier operación matemática o lógica en tiempo real utilizando este motor subyacente."
},
{
"id": "avap_q77",
"question": "What tokens are produced by the AVAP lexer for comparison operators?",
"ground_truth": "The AVAP lexer produces the following tokens for comparison operators: EQ for ==, NEQ for !=, LT for <, GT for >, LTE for <=, and GTE for >=."
},
{
"id": "avap_q78",
"question": "¿Qué es un objeto proxy en el contexto de avapConnector?",
"ground_truth": "Cuando se instancia un avapConnector, la variable se convierte en un objeto proxy que encapsula las credenciales y el contexto del servicio de terceros. Este objeto expone métodos dinámicos mediante notación de punto, que son resueltos en tiempo de ejecución (runtime)."
},
{
"id": "avap_q79",
"question": "How does AVAP's gather command work when waiting for a goroutine result?",
"ground_truth": "The gather command pauses the main thread waiting for the result of a goroutine identified by the given identifier. If the specified timeout is exceeded, it cancels the wait and returns None. The syntax is: result = gather(identifier, timeout)."
},
{
"id": "avap_q80",
"question": "¿Qué ocurre si se intenta usar if(user.id, 10, '==') en AVAP?",
"ground_truth": "Esta expresión es inválida porque el Modo 1 del comando if no permite expresiones de acceso con punto (como user.id). Para comparar un valor extraído de una estructura, primero debe asignarse a una variable simple y luego usar esa variable en el if."
},
{
"id": "avap_q81",
"question": "What is the purpose of the _status system variable in AVAP?",
"ground_truth": "_status is a reserved system variable that allows explicitly defining the HTTP response status code. It is accessible and assignable from any scope, and can be set either through direct assignment (_status = 404) or through the addVar command (addVar(_status, 401))."
},
{
"id": "avap_q82",
"question": "¿Cómo se accede a los métodos de un objeto conector en AVAP?",
"ground_truth": "Los métodos de un objeto conector se acceden mediante notación de punto. Primero se instancia el conector con avapConnector('UUID'), y luego se invocan sus métodos dinámicos con la sintaxis objeto.metodo(argumentos). Los métodos son resueltos en tiempo de ejecución."
},
{
"id": "avap_q83",
"question": "What does the BNF rule for startLoop look like in AVAP?",
"ground_truth": "The BNF rule for startLoop is: startLoop(identifier, expression, expression) followed by a block and closed with endLoop(). The identifier is the counter variable, and the two expressions define the start and end of the iteration range."
},
{
"id": "avap_q84",
"question": "¿Qué tipo de datos devuelve getListLen en AVAP?",
"ground_truth": "getListLen calcula el número total de elementos contenidos en una lista y almacena el resultado como un entero en la variable destino. Este valor entero puede usarse para construir bucles de recorrido seguro."
},
{
"id": "avap_q85",
"question": "Can AVAP functions access variables declared in the main execution flow?",
"ground_truth": "No. AVAP functions cannot access variables from the Main Local Scope. Functions can only access their own Function Scope and the Global Scope. This design prevents implicit dependencies between functions and the main flow."
},
{
"id": "avap_q86",
"question": "¿Qué formato de zona horaria acepta getDateTime en AVAP?",
"ground_truth": "getDateTime acepta cualquier zona horaria reconocida por la librería pytz de Python. El parámetro zona_horaria debe ser un string con el nombre de la zona horaria en el formato estándar de pytz."
},
{
"id": "avap_q87",
"question": "What is the BNF production rule for an AVAP assignment statement?",
"ground_truth": "The BNF production rule for an assignment in AVAP is: assignment ::= identifier '=' expression. This means an assignment consists of an identifier on the left side, the equals sign, and an expression on the right side."
},
{
"id": "avap_q88",
"question": "¿Qué hace el comando ormCheckTable en AVAP?",
"ground_truth": "ormCheckTable verifica la existencia de una tabla en la base de datos conectada. Recibe el nombre de la tabla (tableName) y una variable destino (varTarget) donde almacena el resultado de la verificación."
},
{
"id": "avap_q89",
"question": "How does AVAP support dictionary literals in expressions?",
"ground_truth": "AVAP supports dictionary literals through the dict_display rule in the grammar, which uses curly braces containing key-datum pairs separated by commas. Each key-datum pair consists of an expression, a colon, and another expression."
},
{
"id": "avap_q90",
"question": "¿Cuál es la sintaxis BNF de la declaración return en AVAP?",
"ground_truth": "La sintaxis BNF de return en AVAP es: return_stmt ::= 'return(' [expression] ')'. La expresión es opcional, lo que permite usar return() sin valor para simplemente interrumpir el flujo de ejecución."
},
{
"id": "avap_q91",
"question": "What is the difference between a line comment and a documentation comment in AVAP?",
"ground_truth": "A line comment (//) simply ignores text until the end of the line and is discarded by the lexer. A documentation comment (///) is also discarded by the lexer but is intended to be used by code analyzers or IDEs to automatically generate technical documentation (Docstrings) from the source code."
},
{
"id": "avap_q92",
"question": "¿Qué operadores lógicos soporta AVAP en las expresiones?",
"ground_truth": "AVAP soporta los operadores lógicos and, or y not en las expresiones. Además, soporta los operadores in e is como operadores de comparación. En la jerarquía de precedencia, not tiene mayor precedencia que and, y and tiene mayor precedencia que or."
},
{
"id": "avap_q93",
"question": "What is the BNF rule for the addParam command in AVAP?",
"ground_truth": "The BNF rule for addParam is: addparam_cmd ::= 'addParam(' stringliteral ',' identifier ')'. The first argument is a string literal with the parameter name to look up in the HTTP request, and the second is the identifier (variable) where the value will be stored."
},
{
"id": "avap_q94",
"question": "¿Cómo se define un bloque de comentario multilínea en AVAP?",
"ground_truth": "Un bloque de comentario multilínea en AVAP se define usando /* para abrir y */ para cerrar. Puede abarcar múltiples líneas y todo su contenido es ignorado por el lexer. La regla léxica es: BLOCK_COMMENT ::= '/*' .*? '*/'."
},
{
"id": "avap_q95",
"question": "What is the purpose of the go command in AVAP's concurrency model?",
"ground_truth": "The go command creates a new isolated execution context (goroutine) for a function, allowing the server to process long I/O operations without blocking the main thread. It returns a unique identifier that must be saved to interact with the thread later, typically using the gather command."
},
{
"id": "avap_q96",
"question": "¿Qué restricción tiene el Modo 1 del if() respecto a expresiones de acceso?",
"ground_truth": "El Modo 1 del if() no permite el uso de expresiones de acceso como data.user o list[0] como argumentos. Si se necesita comparar un valor extraído de una estructura de datos, primero debe asignarse a una variable simple y luego usar esa variable en el if."
},
{
"id": "avap_q97",
"question": "What does the AVAP documentation say about the relationship between the lexer and the parser?",
"ground_truth": "The lexer (Appendix X) operates at the lexical level, transforming source code into a sequence of tokens such as IDENTIFIER, INTEGER, FLOAT, STRING, operators, delimiters, EOL, and reserved words. The parser (Sections I-IX) operates at the syntactic level, consuming those tokens to build the AST according to the BNF grammar rules."
},
{
"id": "avap_q98",
"question": "¿Qué ocurre si una variable no existe en ningún scope visible dentro de una función en AVAP?",
"ground_truth": "Si una variable no existe en los scopes visibles (Function Scope y Global Scope) dentro de una función, el motor de ejecución produce un error de ejecución (runtime error)."
},
{
"id": "avap_q99",
"question": "What is the BNF rule for the gather command in AVAP?",
"ground_truth": "The BNF rule for gather is: gather_stmt ::= identifier '=' 'gather(' identifier [',' expression] ')'. The first identifier stores the result, the second identifier is the goroutine handle, and the optional expression is the timeout value."
},
{
"id": "avap_q100",
"question": "¿Cuál es la diferencia entre ormAccessInsert y ormAccessUpdate en AVAP?",
"ground_truth": "ormAccessInsert realiza la inserción parametrizada de nuevos registros en una tabla y tiene la firma ormAccessInsert(fieldsValues, tableName, varTarget). ormAccessUpdate modifica registros existentes y tiene la firma ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget), donde el selector es obligatorio para delimitar qué registros se modifican."
},
{
"id": "asignacion_booleana_q1",
"question": "How do you evaluate a numeric condition and return a boolean result in AVAP?",
"ground_truth": "nivel = 5\nes_admin = nivel >= 10\naddResult(es_admin)"
},
{
"id": "asignacion_booleana_q2",
"question": "¿Cómo se asigna el resultado de una comparación a una variable y se devuelve como respuesta?",
"ground_truth": "es_admin = nivel >= 10\naddResult(es_admin)"
},
{
"id": "asignacion_matematica_q1",
"question": "How do you calculate a subtotal with tax and return the total in AVAP?",
"ground_truth": "subtotal = 150.50\niva = subtotal * 0.21\ntotal = subtotal + iva\naddResult(total)"
},
{
"id": "asignacion_matematica_q2",
"question": "¿Cómo se encadenan operaciones aritméticas sobre variables numéricas en AVAP?",
"ground_truth": "subtotal = 150.50\niva = subtotal * 0.21\ntotal = subtotal + iva\naddResult(total)"
},
{
"id": "bucle_1_10_q1",
"question": "How do you build a JSON object dynamically inside a loop in AVAP?",
"ground_truth": "startLoop(i,1,10)\n item = \"item_%s\" % i\n AddvariableToJSON(item,'valor_generado',mi_json)\nendLoop()\naddResult(mi_json)"
},
{
"id": "bucle_1_10_q2",
"question": "¿Cómo se itera un número fijo de veces y se agrega una propiedad a un objeto JSON en cada iteración?",
"ground_truth": "startLoop(i,1,10)\n item = \"item_%s\" % i\n AddvariableToJSON(item,'valor_generado',mi_json)\nendLoop()\naddResult(mi_json)"
},
{
"id": "bucle_longitud_de_datos_q1",
"question": "How do you iterate over a list using its length as the loop bound in AVAP?",
"ground_truth": "registros = ['1','2','3']\ngetListLen(registros, total)\ncontador = 0\nstartLoop(idx, 0, 2)\n actual = registros[int(idx)]\nendLoop()\naddResult(actual)"
},
{
"id": "bucle_longitud_de_datos_q2",
"question": "¿Cómo se accede a elementos de una lista por índice dentro de un bucle en AVAP?",
"ground_truth": "startLoop(idx, 0, 2)\n actual = registros[int(idx)]\nendLoop()\naddResult(actual)"
},
{
"id": "calculo_de_expiracion_q1",
"question": "How do you calculate a future date by adding seconds to the current time in AVAP?",
"ground_truth": "getDateTime(\"\", 86400, \"UTC\", expira)\naddResult(expira)"
},
{
"id": "calculo_de_expiracion_q2",
"question": "¿Cómo se obtiene la fecha de expiración sumando un día a la hora actual en UTC?",
"ground_truth": "getDateTime(\"\", 86400, \"UTC\", expira)\naddResult(expira)"
},
{
"id": "captura_de_id_q1",
"question": "How do you capture a query parameter and return it directly as a response in AVAP?",
"ground_truth": "addParam(\"client_id\", id_interno)\naddResult(id_interno)"
},
{
"id": "captura_de_id_q2",
"question": "¿Cómo se lee un parámetro de entrada llamado client_id y se incluye en la respuesta?",
"ground_truth": "addParam(\"client_id\", id_interno)\naddResult(id_interno)"
},
{
"id": "captura_de_listas_multiples_q1",
"question": "How do you capture multiple occurrences of a URL parameter as a list in AVAP?",
"ground_truth": "getQueryParamList(\"lista_correos\", lista_correos)\naddResult(lista_correos)"
},
{
"id": "captura_de_listas_multiples_q2",
"question": "¿Cómo se capturan varios valores del mismo parámetro de URL y se devuelven como lista?",
"ground_truth": "addParam(\"emails\", emails)\ngetQueryParamList(\"lista_correos\", lista_correos)\naddResult(lista_correos)"
},
{
"id": "comparacion_simple_q1",
"question": "How do you read a parameter and conditionally assign a message based on its value in AVAP?",
"ground_truth": "addParam(\"lang\", l)\nif(l, \"es\", \"=\")\n addVar(msg, \"Hola\")\nend()\naddResult(msg)"
},
{
"id": "comparacion_simple_q2",
"question": "¿Cómo se usa if() en Modo 1 para comparar un parámetro con un string literal?",
"ground_truth": "if(l, \"es\", \"=\")\n addVar(msg, \"Hola\")\nend()\naddResult(msg)"
},
{
"id": "concatenacion_dinamica_q1",
"question": "How do you build a dynamic log message by interpolating a variable into a string in AVAP?",
"ground_truth": "nombre = \"Sistema\"\nlog = \"Evento registrado por: %s\" % nombre\naddResult(log)"
},
{
"id": "concatenacion_dinamica_q2",
"question": "¿Cómo se construye un string dinámico usando el operador % con una variable en AVAP?",
"ground_truth": "log = \"Evento registrado por: %s\" % nombre\naddResult(log)"
},
{
"id": "construccion_dinamica_de_objeto_q1",
"question": "How do you inject a variable key-value pair into a JSON object dynamically in AVAP?",
"ground_truth": "datos_cliente = \"datos\"\naddVar(clave, \"cliente_vip\")\nAddvariableToJSON(clave, datos_cliente, mi_json_final)\naddResult(mi_json_final)"
},
{
"id": "construccion_dinamica_de_objeto_q2",
"question": "¿Cómo se usa AddvariableToJSON con una clave almacenada en variable para construir un objeto JSON?",
"ground_truth": "addVar(clave, \"cliente_vip\")\nAddvariableToJSON(clave, datos_cliente, mi_json_final)\naddResult(mi_json_final)"
},
{
"id": "contador_de_parametros_q1",
"question": "How do you count the number of elements in a list received as a parameter in AVAP?",
"ground_truth": "addParam(\"data_list\", mi_lista)\ngetListLen(mi_lista, cantidad)\naddResult(cantidad)"
},
{
"id": "contador_de_parametros_q2",
"question": "¿Cómo se obtiene la longitud de una lista capturada desde la petición HTTP?",
"ground_truth": "addParam(\"data_list\", mi_lista)\ngetListLen(mi_lista, cantidad)\naddResult(cantidad)"
},
{
"id": "conversion_timestamp_legible_q1",
"question": "How do you convert a Unix epoch timestamp to a human-readable date string in AVAP?",
"ground_truth": "stampToDatetime(1708726162, \"%d/%m/%Y\", 0, fecha_human)\naddResult(fecha_human)"
},
{
"id": "conversion_timestamp_legible_q2",
"question": "¿Cómo se transforma un valor epoch en una fecha con formato día/mes/año en AVAP?",
"ground_truth": "stampToDatetime(1708726162, \"%d/%m/%Y\", 0, fecha_human)\naddResult(fecha_human)"
},
{
"id": "else_estandar_q1",
"question": "How do you use an if-else block to set a boolean variable based on a numeric condition in AVAP?",
"ground_truth": "addParam(\"sal_par\",saldo)\nif(saldo, 0, \">\")\n permitir = True\nelse()\n permitir = False\nend()\naddResult(permitir)"
},
{
"id": "else_estandar_q2",
"question": "¿Cómo se implementa una rama else() para asignar False cuando una condición no se cumple?",
"ground_truth": "if(saldo, 0, \">\")\n permitir = True\nelse()\n permitir = False\nend()\naddResult(permitir)"
},
{
"id": "expresion_compleja_q1",
"question": "How do you evaluate a complex multi-condition expression using if() Mode 2 in AVAP?",
"ground_truth": "if(None, None, \" user_type == 'VIP' or compras > 100\")\n addVar(descuento, 0.20)\nend()\naddResult(descuento)"
},
{
"id": "expresion_compleja_q2",
"question": "¿Cómo se capturan dos parámetros y se evalúan juntos en una expresión libre con if(None, None, ...)?",
"ground_truth": "addParam(\"userrype\", user_type)\naddParam(\"sells\", compras)\nif(None, None, \" user_type == 'VIP' or compras > 100\")\n addVar(descuento, 0.20)\nend()\naddResult(descuento)"
},
{
"id": "fecha_para_base_de_datos_q1",
"question": "How do you get the current datetime formatted for SQL storage in a specific timezone in AVAP?",
"ground_truth": "getDateTime(\"%Y-%m-%d %H:%M:%S\", 0, \"Europe/Madrid\", sql_date)\naddResult(sql_date)"
},
{
"id": "fecha_para_base_de_datos_q2",
"question": "¿Cómo se obtiene la fecha y hora actual en formato ISO para insertar en base de datos con zona horaria de Madrid?",
"ground_truth": "getDateTime(\"%Y-%m-%d %H:%M:%S\", 0, \"Europe/Madrid\", sql_date)\naddResult(sql_date)"
},
{
"id": "funcion_de_suma_q1",
"question": "How do you define and call a function that adds two numbers and returns the result in AVAP?",
"ground_truth": "function suma(a, b){\n total = a + b\n return(total)\n }\nresultado = suma(10, 20)\naddResult(resultado)"
},
{
"id": "funcion_de_suma_q2",
"question": "¿Cómo se declara una función con parámetros, se realiza un cálculo interno y se retorna el valor?",
"ground_truth": "function suma(a, b){\n total = a + b\n return(total)\n }"
},
{
"id": "funcion_validacion_acceso_q1",
"question": "How do you write a function that validates a token and returns a boolean access result in AVAP?",
"ground_truth": " function es_valido(token){\n response = False\n if(token, \"SECRET\", \"=\")\n response = True\n end()\n return(response)\n }\nautorizado = es_valido(\"SECRET\")\naddResult(autorizado)"
},
{
"id": "funcion_validacion_acceso_q2",
"question": "¿Cómo se usa un condicional dentro de una función para cambiar el valor de retorno según el parámetro recibido?",
"ground_truth": " function es_valido(token){\n response = False\n if(token, \"SECRET\", \"=\")\n response = True\n end()\n return(response)\n }"
},
{
"id": "generador_de_tokens_aleatorios_q1",
"question": "How do you generate a random alphanumeric token of a fixed length in AVAP?",
"ground_truth": "randomString(\"[A-Z]\\d\", 32, token_seguridad)\naddResult(token_seguridad)"
},
{
"id": "generador_de_tokens_aleatorios_q2",
"question": "¿Cómo se genera una cadena aleatoria de 32 caracteres con un patrón específico para usar como token de seguridad?",
"ground_truth": "randomString(\"[A-Z]\\d\", 32, token_seguridad)\naddResult(token_seguridad)"
},
{
"id": "hash_SHA256_para_integridad_q1",
"question": "How do you compute a SHA-256 hash of a string and return it as a checksum in AVAP?",
"ground_truth": "encodeSHA256(\"payload_data\", checksum)\naddResult(checksum)"
},
{
"id": "hash_SHA256_para_integridad_q2",
"question": "¿Cómo se genera un hash irreversible de un dato para verificar integridad en AVAP?",
"ground_truth": "encodeSHA256(\"payload_data\", checksum)\naddResult(checksum)"
},
{
"id": "hello_world_q1",
"question": "How do you register a GET endpoint and return a greeting message in AVAP?",
"ground_truth": "registerEndpoint(\"/hello_world\",\"GET\",[],\"HELLO_WORLD\",main,result)\naddVar(name,\"Alberto\")\nresult = \"Hello,\" + name \naddResult(result)"
},
{
"id": "hello_world_q2",
"question": "¿Cómo se registra un endpoint HTTP y se construye una respuesta concatenando un nombre fijo?",
"ground_truth": "registerEndpoint(\"/hello_world\",\"GET\",[],\"HELLO_WORLD\",main,result)\naddVar(name,\"Alberto\")\nresult = \"Hello,\" + name \naddResult(result)"
},
{
"id": "hola_mundo_q1",
"question": "How do you assign a string literal to a variable and return it as the API response in AVAP?",
"ground_truth": "addVar(mensaje, \"Hola mundo desde AVAP\")\naddResult(mensaje)"
},
{
"id": "hola_mundo_q2",
"question": "¿Cuál es la forma mínima de devolver un mensaje de texto fijo como respuesta en AVAP?",
"ground_truth": "addVar(mensaje, \"Hola mundo desde AVAP\")\naddResult(mensaje)"
},
{
"id": "if_desigualdad_q1",
"question": "How do you compare a captured parameter against an existing variable using the inequality operator in AVAP?",
"ground_truth": "addParam(\"password\",pass_nueva)\npass_antigua = \"password\"\nif(pass_nueva, pass_antigua, \"!=\")\n addVar(cambio, \"Contraseña actualizada\")\nend()\naddResult(cambio)"
},
{
"id": "if_desigualdad_q2",
"question": "¿Cómo se detecta que una contraseña nueva es diferente a la antigua y se registra el cambio?",
"ground_truth": "if(pass_nueva, pass_antigua, \"!=\")\n addVar(cambio, \"Contraseña actualizada\")\nend()\naddResult(cambio)"
},
{
"id": "limpieza_de_strings_q1",
"question": "How do you replace a substring within a string variable and return the cleaned result in AVAP?",
"ground_truth": "replace(\"REF_1234_OLD\",\"OLD\", \"NEW\", ref_actualizada)\naddResult(ref_actualizada)"
},
{
"id": "limpieza_de_strings_q2",
"question": "¿Cómo se normaliza una referencia reemplazando una parte del texto con un nuevo valor en AVAP?",
"ground_truth": "replace(\"REF_1234_OLD\",\"OLD\", \"NEW\", ref_actualizada)\naddResult(ref_actualizada)"
},
{
"id": "manejo_error_sql_critico_q1",
"question": "How do you catch a database error and return a 500 status with an error message in AVAP?",
"ground_truth": "try()\n ormDirect(\"UPDATE table_inexistente SET a=1\", res)\nexception(e)\n addVar(_status, 500)\n addVar(error_msg, \"Error de base de datos\")\n addResult(error_msg)\nend()"
},
{
"id": "manejo_error_sql_critico_q2",
"question": "¿Cómo se usa try/exception para manejar un fallo en una sentencia SQL directa y responder con código 500?",
"ground_truth": "try()\n ormDirect(\"UPDATE table_inexistente SET a=1\", res)\nexception(e)\n addVar(_status, 500)\n addVar(error_msg, \"Error de base de datos\")\n addResult(error_msg)\nend()"
},
{
"id": "obtencion_timestamp_q1",
"question": "How do you get the current UTC timestamp and return it as the API response in AVAP?",
"ground_truth": "getDateTime(\"\", 0, \"UTC\", ahora)\naddResult(ahora)"
},
{
"id": "obtencion_timestamp_q2",
"question": "¿Cómo se obtiene la fecha y hora actual sin modificar el offset y se devuelve en la respuesta?",
"ground_truth": "getDateTime(\"\", 0, \"UTC\", ahora)\naddResult(ahora)"
},
{
"id": "ormAccessCreate_q1",
"question": "How do you check if a table exists and create it only if it does not in AVAP?",
"ground_truth": "ormCheckTable(tabla_pruebas,resultado_comprobacion)\nif(resultado_comprobacion,False,'==')\n ormCreateTable(\"username,age\",'VARCHAR,INTEGER',tabla_pruebas,resultado_creacion)\nend()\naddResult(resultado_comprobacion)\naddResult(resultado_creacion)"
},
{
"id": "ormAccessCreate_q2",
"question": "¿Cómo se usa ormCheckTable junto con un condicional para crear una tabla solo cuando no existe?",
"ground_truth": "ormCheckTable(tabla_pruebas,resultado_comprobacion)\nif(resultado_comprobacion,False,'==')\n ormCreateTable(\"username,age\",'VARCHAR,INTEGER',tabla_pruebas,resultado_creacion)\nend()"
},
{
"id": "paginacion_dinamica_recursos_q1",
"question": "How do you implement dynamic pagination by reading page and size parameters and slicing a list in AVAP?",
"ground_truth": "addParam(\"page\", p)\naddParam(\"size\", s)\nregistros = [\"u1\", \"u2\", \"u3\", \"u4\", \"u5\", \"u6\"]\noffset = int(p) * int(s)\nlimite = offset + int(s)\ncontador = 0\naddResult(offset)\naddResult(limite)\nstartLoop(i, 2, limite)\n actual = registros[int(i)]\n titulo = \"reg_%s\" % i\n AddvariableToJSON(titulo, actual, pagina_json)\nendLoop()\naddResult(pagina_json)"
},
{
"id": "paginacion_dinamica_recursos_q2",
"question": "¿Cómo se calculan el offset y el límite de paginación a partir de parámetros de entrada y se construye un JSON con los resultados?",
"ground_truth": "offset = int(p) * int(s)\nlimite = offset + int(s)\nstartLoop(i, 2, limite)\n actual = registros[int(i)]\n titulo = \"reg_%s\" % i\n AddvariableToJSON(titulo, actual, pagina_json)\nendLoop()\naddResult(pagina_json)"
},
{
"id": "referencia_por_valor_q1",
"question": "How do you copy the value of one variable into another using the dereference operator in AVAP?",
"ground_truth": "addVar(base, 1000)\naddVar(copia, $base)\naddResult(copia)"
},
{
"id": "referencia_por_valor_q2",
"question": "¿Cómo se usa el operador $ para pasar el valor de una variable como argumento a addVar?",
"ground_truth": "addVar(copia, $base)\naddResult(copia)"
},
{
"id": "respuesta_multiple_q1",
"question": "How do you include multiple variables in the JSON response body in AVAP?",
"ground_truth": "addVar(code, 200)\naddVar(status, \"Success\")\naddResult(code)\naddResult(status)"
},
{
"id": "respuesta_multiple_q2",
"question": "¿Cómo se agregan varios campos a la respuesta HTTP usando addResult múltiples veces?",
"ground_truth": "addResult(code)\naddResult(status)"
},
{
"id": "salida_bucle_correcta_q1",
"question": "How do you exit a loop early when a specific value is found during iteration in AVAP?",
"ground_truth": "encontrado = False\nstartLoop(i, 1, 10)\n if(i, 5, \"==\")\n encontrado = True\n i = 11 \n end()\nendLoop()\naddResult(encontrado)"
},
{
"id": "salida_bucle_correcta_q2",
"question": "¿Cómo se usa un condicional dentro de un startLoop para detectar un valor y detener la iteración?",
"ground_truth": "startLoop(i, 1, 10)\n if(i, 5, \"==\")\n encontrado = True\n i = 11 \n end()\nendLoop()"
},
{
"id": "try_catch_request_q1",
"question": "How do you wrap an external HTTP GET request in a try-exception block to handle network errors in AVAP?",
"ground_truth": "try()\n RequestGet(\"https://api.test.com/data\", 0, 0, respuesta, None)\nexception(e)\n addVar(error_trace, e)\n addResult(error_trace)\nend()"
},
{
"id": "try_catch_request_q2",
"question": "¿Cómo se captura la traza de error de una petición GET fallida y se devuelve en la respuesta?",
"ground_truth": "exception(e)\n addVar(error_trace, e)\n addResult(error_trace)\nend()"
},
{
"id": "validacion_de_nulo_q1",
"question": "How do you return a 403 error when a required API key parameter is missing in AVAP?",
"ground_truth": "addParam(\"api_key\", key)\nif(key, None, \"==\")\n addVar(_status, 403)\n addVar(error, \"Acceso denegado: falta API KEY\")\n addResult(error)\nend()"
},
{
"id": "validacion_de_nulo_q2",
"question": "¿Cómo se verifica que un parámetro no sea nulo y se responde con un código de estado HTTP de error si falta?",
"ground_truth": "if(key, None, \"==\")\n addVar(_status, 403)\n addVar(error, \"Acceso denegado: falta API KEY\")\n addResult(error)\nend()"
},
{
"id": "validacion_in_pertenece_a_lista_q1",
"question": "How do you check if a role parameter belongs to a set of allowed values using a free expression in AVAP?",
"ground_truth": "addParam(\"rol\", r)\nacceso = False\n\nif(None, None, \"r == 'admin' or r == 'editor' or r == 'root'\")\n acceso = True\nend()\n\naddResult(acceso)"
},
{
"id": "validacion_in_pertenece_a_lista_q2",
"question": "¿Cómo se usa if() en Modo 2 para evaluar si una variable pertenece a un conjunto de valores permitidos?",
"ground_truth": "if(None, None, \"r == 'admin' or r == 'editor' or r == 'root'\")\n acceso = True\nend()\n\naddResult(acceso)"
}
]

View File

@ -0,0 +1,32 @@
[
{
"id": "GD-001",
"category": "RETRIEVAL",
"question": "What is AVAP and what is it designed for?",
"ground_truth": "AVAP (Advanced Virtual API Programming) is a Turing-complete Domain-Specific Language (DSL) architecturally designed for the secure, concurrent, and deterministic orchestration of microservices and HTTP I/O. It is not a general-purpose language; its hybrid engine and strict grammar are optimized for fast processing of HTTP transactions, in-memory data manipulation, and interaction with external connectors. AVAP does not have internal print commands — all data output is performed through the HTTP interface using commands like addResult()."
},
{
"id": "GD-002",
"category": "RETRIEVAL",
"question": "How does AVAP handle conditional logic? What commands are used and how are blocks closed?",
"ground_truth": "AVAP uses a mixed structural grammar for conditional logic, combining keyword fluidity with strict mathematical closures. The if() / else() / end() structure evaluates a logical or comparison expression. Every conditional block requires a mandatory end() closing statement. The if() command compares two values using a comparator operator (e.g., '==', '!=', '>', '<', '>=', '<='). An optional else() block handles the false branch. Example: if(saldo, 0, \">\") executes the true branch when the variable 'saldo' is greater than zero, otherwise the else() block runs, and end() closes the structure."
},
{
"id": "GD-003",
"category": "CODE_GENERATION",
"question": "Write an AVAP script that reads a 'password' parameter, generates a SHA-256 hash of it, and returns the hash.",
"ground_truth": "The following AVAP script reads a 'password' query parameter, hashes it using SHA-256 via encodeSHA256(), and exposes the result via addResult():\n\naddParam(\"password\", password)\nencodeSHA256(password, hashed_password)\naddResult(hashed_password)\n\nKey commands used:\n- addParam(\"password\", password): reads the 'password' HTTP parameter into the variable 'password'.\n- encodeSHA256(password, hashed_password): computes the SHA-256 hash of the input and stores the 64-character hex digest in 'hashed_password'.\n- addResult(hashed_password): adds 'hashed_password' to the HTTP JSON response body."
},
{
"id": "GD-004",
"category": "CODE_GENERATION",
"question": "Show an AVAP script that loops from 1 to 5, builds a JSON object with each iteration index as a key, and returns it.",
"ground_truth": "The following AVAP script iterates from 1 to 5 using startLoop/endLoop, dynamically builds a JSON object using AddvariableToJSON() on each iteration, and returns the result:\n\naddVar(mi_json, \"{}\")\nstartLoop(i, 1, 5)\n item = \"item_%s\" % i\n AddvariableToJSON(item, \"valor_generado\", mi_json)\nendLoop()\naddResult(mi_json)\n\nKey commands used:\n- addVar(mi_json, \"{}\"): initializes an empty JSON object.\n- startLoop(i, 1, 5) / endLoop(): iterates the variable 'i' from 1 to 5 inclusive.\n- AddvariableToJSON(item, \"valor_generado\", mi_json): inserts each generated key-value pair into the JSON object.\n- addResult(mi_json): exposes the final JSON in the HTTP response."
},
{
"id": "GD-005",
"category": "RETRIEVAL",
"question": "How does AVAP support external HTTP calls? What commands are available and how is timeout handled?",
"ground_truth": "AVAP provides two commands for making external HTTP calls: RequestPost and RequestGet. To avoid blocking threads due to network latency, AVAP requires a mandatory timeout parameter (in milliseconds) for both commands. If the timeout is exceeded, the destination variable receives None. RequestPost(url, querystring, headers, body, destino, timeout) executes an HTTP POST and stores the response in 'destino'. RequestGet(url, querystring, headers, destino, timeout) executes an HTTP GET similarly. Both commands are part of AVAP's Section V (Third-Party Connectors and External HTTP Requests) and allow calling external APIs without additional drivers."
}
]
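Before running an evaluation, a golden dataset like the one above can be sanity-checked for required keys and known categories. The following is a minimal sketch (the function name `validate_golden_dataset` and the exact problem messages are assumptions, not part of the repository's `validate_synthetic_dataset.py`):

```python
import json

REQUIRED_KEYS = {"id", "category", "question", "ground_truth"}
VALID_CATEGORIES = {"RETRIEVAL", "CODE_GENERATION", "CONVERSATIONAL"}

def validate_golden_dataset(raw: str) -> list[str]:
    """Return a list of human-readable problems found in the dataset JSON."""
    problems = []
    entries = json.loads(raw)
    seen_ids = set()
    for i, entry in enumerate(entries):
        # Every entry must carry the four evaluation fields.
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
        # Categories must match the classifier's routing labels.
        if entry.get("category") not in VALID_CATEGORIES:
            problems.append(f"entry {i}: unknown category {entry.get('category')!r}")
        # IDs such as GD-001 must be unique.
        if entry.get("id") in seen_ids:
            problems.append(f"entry {i}: duplicate id {entry.get('id')!r}")
        seen_ids.add(entry.get("id"))
    return problems
```

An empty returned list means the file is structurally ready for the evaluation notebook.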

View File

@ -20,6 +20,7 @@ logger = logging.getLogger(__name__)
session_store: dict[str, list] = defaultdict(list)
def format_context(docs):
chunks = []
for i, doc in enumerate(docs, 1):
@ -142,6 +143,89 @@ def hybrid_search_native(es_client, embeddings, query, index_name, k=8):
logger.info(f"[hybrid] RRF -> {len(docs)} final docs")
return docs
def _build_classify_prompt(question: str, history_text: str, selected_text: str) -> str:
prompt = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", history_text)
.replace("{message}", question)
)
if selected_text:
editor_section = (
"\n\n<editor_selection>\n"
"The user currently has the following AVAP code selected in their editor. "
"If the question refers to 'this', 'here', 'the code above', or similar, "
"it is about this selection.\n"
f"{selected_text}\n"
"</editor_selection>"
)
prompt = prompt.replace(
f"<user_message>{question}</user_message>",
f"{editor_section}\n\n<user_message>{question}</user_message>"
)
return prompt
def _build_reformulate_query(question: str, selected_text: str) -> str:
if not selected_text:
return question
return f"{selected_text}\n\nUser question about the above: {question}"
def _build_generation_prompt(template_prompt: SystemMessage, context: str,
editor_content: str, selected_text: str,
extra_context: str) -> SystemMessage:
base = template_prompt.content.format(context=context)
sections = []
if selected_text:
sections.append(
"<selected_code>\n"
"The user has the following AVAP code selected in their editor. "
"Ground your answer in this code first. "
"Use the RAG context as supplementary reference only.\n"
f"{selected_text}\n"
"</selected_code>"
)
if editor_content:
sections.append(
"<editor_file>\n"
"Full content of the active file open in the editor "
"(use for broader context if needed):\n"
f"{editor_content}\n"
"</editor_file>"
)
if extra_context:
sections.append(
"<extra_context>\n"
f"{extra_context}\n"
"</extra_context>"
)
if sections:
editor_block = "\n\n".join(sections)
base = editor_block + "\n\n" + base
return SystemMessage(content=base)
def _parse_query_type(raw: str) -> tuple[str, bool]:
parts = raw.strip().upper().split()
query_type = "RETRIEVAL"
use_editor = False
if parts:
first = parts[0]
if first.startswith("CODE_GENERATION") or "CODE" in first:
query_type = "CODE_GENERATION"
elif first.startswith("CONVERSATIONAL"):
query_type = "CONVERSATIONAL"
if len(parts) > 1 and parts[1] == "EDITOR":
use_editor = True
    return query_type, use_editor
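The two-word classifier output ("RETRIEVAL NO_EDITOR", "CODE_GENERATION EDITOR", ...) is parsed leniently. A standalone sketch of the same parsing logic, reproduced here so its behavior can be checked in isolation:

```python
def parse_query_type(raw: str) -> tuple[str, bool]:
    # First word selects the route; an optional second word
    # "EDITOR" enables editor context. Unknown or empty input
    # falls back to the safe default ("RETRIEVAL", False).
    parts = raw.strip().upper().split()
    query_type = "RETRIEVAL"
    use_editor = False
    if parts:
        first = parts[0]
        if first.startswith("CODE_GENERATION") or "CODE" in first:
            query_type = "CODE_GENERATION"
        elif first.startswith("CONVERSATIONAL"):
            query_type = "CONVERSATIONAL"
        if len(parts) > 1 and parts[1] == "EDITOR":
            use_editor = True
    return query_type, use_editor
```

Note that case is normalized first, so a model that replies `code_generation editor` still routes correctly.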
def build_graph(llm, embeddings, es_client, index_name):
def _persist(state: AgentState, response: BaseMessage):
@ -156,43 +240,37 @@ def build_graph(llm, embeddings, es_client, index_name):
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
history_msgs = messages[:-1]
selected_text = state.get("selected_text", "")
if not history_msgs:
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", "(no history)")
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[classify] no historic content raw='{raw}' -> {query_type}")
return {"query_type": query_type}
history_text = format_history_for_classify(history_msgs) if history_msgs else "(no history)"
prompt_content = _build_classify_prompt(question, history_text, selected_text)
history_text = format_history_for_classify(history_msgs)
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", history_text)
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[classify] raw='{raw}' -> {query_type}")
return {"query_type": query_type}
def _parse_query_type(raw: str) -> str:
if raw.startswith("CODE_GENERATION") or "CODE" in raw:
return "CODE_GENERATION"
if raw.startswith("CONVERSATIONAL"):
return "CONVERSATIONAL"
return "RETRIEVAL"
raw = resp.content.strip().upper()
query_type, use_editor_ctx = _parse_query_type(raw)
logger.info(f"[classify] selected={bool(selected_text)} raw='{raw}' -> {query_type} editor={use_editor_ctx}")
return {"query_type": query_type, "use_editor_context": use_editor_ctx}
def reformulate(state: AgentState) -> AgentState:
user_msg = state["messages"][-1]
resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
selected_text = state.get("selected_text", "")
question = getattr(user_msg, "content",
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
anchor = _build_reformulate_query(question, selected_text)
if selected_text:
from langchain_core.messages import HumanMessage as HM
resp = llm.invoke([REFORMULATE_PROMPT, HM(content=anchor)])
else:
query_type = state.get("query_type", "RETRIEVAL")
mode_hint = HumanMessage(content=f"[MODE: {query_type}]\n{question}")
resp = llm.invoke([REFORMULATE_PROMPT, mode_hint])
reformulated = resp.content.strip()
logger.info(f"[reformulate] -> '{reformulated}'")
logger.info(f"[reformulate] selected={bool(selected_text)} -> '{reformulated}'")
return {"reformulated_query": reformulated}
def retrieve(state: AgentState) -> AgentState:
@ -209,8 +287,13 @@ def build_graph(llm, embeddings, es_client, index_name):
return {"context": context}
def generate(state):
prompt = SystemMessage(
content=GENERATE_PROMPT.content.format(context=state["context"])
use_editor = state.get("use_editor_context", False)
prompt = _build_generation_prompt(
template_prompt=GENERATE_PROMPT,
context=state.get("context", ""),
editor_content=state.get("editor_content", "") if use_editor else "",
selected_text=state.get("selected_text", "") if use_editor else "",
extra_context=state.get("extra_context", ""),
)
resp = llm.invoke([prompt] + state["messages"])
logger.info(f"[generate] {len(resp.content)} chars")
@ -218,8 +301,13 @@ def build_graph(llm, embeddings, es_client, index_name):
return {"messages": [resp]}
def generate_code(state):
prompt = SystemMessage(
content=CODE_GENERATION_PROMPT.content.format(context=state["context"])
use_editor = state.get("use_editor_context", False)
prompt = _build_generation_prompt(
template_prompt=CODE_GENERATION_PROMPT,
context=state.get("context", ""),
editor_content=state.get("editor_content", "") if use_editor else "",
selected_text=state.get("selected_text", "") if use_editor else "",
extra_context=state.get("extra_context", ""),
)
resp = llm.invoke([prompt] + state["messages"])
logger.info(f"[generate_code] {len(resp.content)} chars")
@ -228,7 +316,7 @@ def build_graph(llm, embeddings, es_client, index_name):
def respond_conversational(state):
resp = llm.invoke([CONVERSATIONAL_PROMPT] + state["messages"])
logger.info("[conversational] from comversation")
logger.info("[conversational] from conversation")
_persist(state, resp)
return {"messages": [resp]}
@ -254,9 +342,9 @@ def build_graph(llm, embeddings, es_client, index_name):
"classify",
route_by_type,
{
"RETRIEVAL": "reformulate",
"RETRIEVAL": "reformulate",
"CODE_GENERATION": "reformulate",
"CONVERSATIONAL": "respond_conversational",
"CONVERSATIONAL": "respond_conversational",
}
)
@ -266,7 +354,7 @@ def build_graph(llm, embeddings, es_client, index_name):
"retrieve",
route_after_retrieve,
{
"generate": "generate",
"generate": "generate",
"generate_code": "generate_code",
}
)
@ -284,46 +372,39 @@ def build_prepare_graph(llm, embeddings, es_client, index_name):
messages = state["messages"]
user_msg = messages[-1]
question = getattr(user_msg, "content",
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
history_msgs = messages[:-1]
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
history_msgs = messages[:-1]
selected_text = state.get("selected_text", "")
if not history_msgs:
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", "(no history)")
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[prepare/classify] no history raw='{raw}' -> {query_type}")
return {"query_type": query_type}
history_text = format_history_for_classify(history_msgs) if history_msgs else "(no history)"
prompt_content = _build_classify_prompt(question, history_text, selected_text)
history_text = format_history_for_classify(history_msgs)
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", history_text)
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[prepare/classify] raw='{raw}' -> {query_type}")
return {"query_type": query_type}
def _parse_query_type(raw: str) -> str:
if raw.startswith("CODE_GENERATION") or "CODE" in raw:
return "CODE_GENERATION"
if raw.startswith("CONVERSATIONAL"):
return "CONVERSATIONAL"
return "RETRIEVAL"
raw = resp.content.strip().upper()
query_type, use_editor_ctx = _parse_query_type(raw)
logger.info(f"[prepare/classify] selected={bool(selected_text)} raw='{raw}' -> {query_type} editor={use_editor_ctx}")
return {"query_type": query_type, "use_editor_context": use_editor_ctx}
def reformulate(state: AgentState) -> AgentState:
user_msg = state["messages"][-1]
resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
selected_text = state.get("selected_text", "")
question = getattr(user_msg, "content",
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
anchor = _build_reformulate_query(question, selected_text)
if selected_text:
from langchain_core.messages import HumanMessage as HM
resp = llm.invoke([REFORMULATE_PROMPT, HM(content=anchor)])
else:
query_type = state.get("query_type", "RETRIEVAL")
mode_hint = HumanMessage(content=f"[MODE: {query_type}]\n{question}")
resp = llm.invoke([REFORMULATE_PROMPT, mode_hint])
reformulated = resp.content.strip()
logger.info(f"[prepare/reformulate] -> '{reformulated}'")
logger.info(f"[prepare/reformulate] selected={bool(selected_text)} -> '{reformulated}'")
return {"reformulated_query": reformulated}
def retrieve(state: AgentState) -> AgentState:
@ -366,7 +447,7 @@ def build_prepare_graph(llm, embeddings, es_client, index_name):
graph_builder.add_edge("reformulate", "retrieve")
graph_builder.add_edge("retrieve", END)
graph_builder.add_edge("skip_retrieve",END)
graph_builder.add_edge("skip_retrieve", END)
return graph_builder.compile()
@ -375,17 +456,29 @@ def build_final_messages(state: AgentState) -> list:
query_type = state.get("query_type", "RETRIEVAL")
context = state.get("context", "")
messages = state.get("messages", [])
editor_content = state.get("editor_content", "")
selected_text = state.get("selected_text", "")
extra_context = state.get("extra_context", "")
if query_type == "CONVERSATIONAL":
return [CONVERSATIONAL_PROMPT] + messages
use_editor = state.get("use_editor_context", False)
if query_type == "CODE_GENERATION":
prompt = SystemMessage(
content=CODE_GENERATION_PROMPT.content.format(context=context)
prompt = _build_generation_prompt(
template_prompt=CODE_GENERATION_PROMPT,
context=context,
editor_content=editor_content if use_editor else "",
selected_text=selected_text if use_editor else "",
extra_context=extra_context,
)
else:
prompt = SystemMessage(
content=GENERATE_PROMPT.content.format(context=context)
prompt = _build_generation_prompt(
template_prompt=GENERATE_PROMPT,
context=context,
editor_content=editor_content if use_editor else "",
selected_text=selected_text if use_editor else "",
extra_context=extra_context,
)
return [prompt] + messages
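The editor sections that `_build_generation_prompt` prepends to the RAG prompt can be sketched in isolation. This is a simplified, standalone version (string-returning rather than `SystemMessage`-returning, and without the explanatory instructions inside each tag), kept only to show the assembly order:

```python
def build_generation_text(base: str, selected_text: str = "",
                          editor_content: str = "", extra_context: str = "") -> str:
    # Optional editor sections are prepended, in priority order
    # (selection first, then full file, then extra context),
    # before the RAG-grounded base prompt.
    sections = []
    if selected_text:
        sections.append(f"<selected_code>\n{selected_text}\n</selected_code>")
    if editor_content:
        sections.append(f"<editor_file>\n{editor_content}\n</editor_file>")
    if extra_context:
        sections.append(f"<extra_context>\n{extra_context}\n</extra_context>")
    if sections:
        return "\n\n".join(sections) + "\n\n" + base
    return base
```

With no editor context, the base prompt passes through unchanged, which is why the `use_editor` flag can simply gate the arguments with empty strings.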

View File

@ -154,13 +154,42 @@ def _query_from_messages(messages: list[ChatMessage]) -> str:
return ""
async def _invoke_blocking(query: str, session_id: str) -> str:
async def _invoke_blocking(query: str, session_id: str, context = {}) -> str:
loop = asyncio.get_event_loop()
def _call():
stub = get_stub()
req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
ed_contxt = context.get("editor_content") or ""
sel_contxt = context.get("selected_text") or ""
ext_contxt = context.get("extra_context") or ""
us_info = str(context.get("user_info", {})) or "{}"
req = brunix_pb2.AgentRequest(query=query, session_id=session_id,
editor_content=ed_contxt,
selected_text=sel_contxt,
extra_context=ext_contxt,
user_info=us_info)
parts = []
for resp in stub.AskAgent(req):
if resp.text:
@ -170,7 +199,7 @@ async def _invoke_blocking(query: str, session_id: str) -> str:
return await loop.run_in_executor(_thread_pool, _call)
async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.AgentResponse]:
async def _iter_stream(query: str, session_id: str, context = {}) -> AsyncIterator[brunix_pb2.AgentResponse]:
loop = asyncio.get_event_loop()
queue: asyncio.Queue = asyncio.Queue()
@ -178,8 +207,38 @@ async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.
def _producer():
try:
stub = get_stub()
req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
for resp in stub.AskAgentStream(req): # ← AskAgentStream
print("CONTEXT ====")
print(context)
print("======= ====")
try:
ed_contxt = context["editor_content"] or ""
except Exception:
ed_contxt = ""
try:
sel_contxt = context["selected_text"] or ""
except Exception:
sel_contxt = ""
try:
ext_contxt = context["extra_context"] or ""
except Exception:
ext_contxt = ""
try:
us_info = str(context["user_info"]) or "{}"
except Exception:
us_info = "{}"
req = brunix_pb2.AgentRequest(query=query, session_id=session_id,
editor_content=ed_contxt,
selected_text=sel_contxt,
extra_context=ext_contxt,
user_info=us_info)
for resp in stub.AskAgentStream(req): # AskAgentStream
asyncio.run_coroutine_threadsafe(queue.put(resp), loop).result()
except Exception as e:
asyncio.run_coroutine_threadsafe(queue.put(e), loop).result()
@ -197,16 +256,17 @@ async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.
yield item
async def _stream_chat(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
async def _stream_chat(query: str, session_id: str, req_id: str, context = {}) -> AsyncIterator[str]:
try:
async for resp in _iter_stream(query, session_id):
async for resp in _iter_stream(query, session_id, context):
if resp.is_final:
yield _sse(_chat_chunk("", req_id, finish="stop"))
break
if resp.text:
yield _sse(_chat_chunk(resp.text, req_id))
except Exception as e:
logger.error(f"[stream_chat] error: {e}")
logger.error(f"[stream_chat] error: {e}", exc_info=True)
yield _sse(_chat_chunk(f"[Error: {e}]", req_id, finish="stop"))
yield _sse_done()
@ -285,8 +345,17 @@ async def list_models():
@app.post("/v1/chat/completions")
async def chat_completions(req: ChatCompletionRequest):
query = _query_from_messages(req.messages)
session_id = req.session_id or req.user or "default"
session_id = req.session_id or "default"
req_id = f"chatcmpl-{uuid.uuid4().hex}"
context = {}
try:
context = json.loads(req.user)
except Exception:
pass
logger.info(f"[chat] session={session_id} stream={req.stream} query='{query[:80]}'")
@ -296,7 +365,7 @@ async def chat_completions(req: ChatCompletionRequest):
if req.stream:
return StreamingResponse(
_stream_chat(query, session_id, req_id),
_stream_chat(query, session_id, req_id, context),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)

View File

@ -4,7 +4,8 @@ from langchain_core.messages import SystemMessage
CLASSIFY_PROMPT_TEMPLATE = (
"<role>\n"
"You are a query classifier for an AVAP language assistant. "
"Your only job is to classify the user message into one of three categories.\n"
"Your only job is to classify the user message into one of three categories "
"and determine whether the user is explicitly asking about the editor code.\n"
"</role>\n\n"
"<categories>\n"
@ -28,9 +29,27 @@ CLASSIFY_PROMPT_TEMPLATE = (
"'describe it in your own words', 'what did you mean?'\n"
"</categories>\n\n"
"<editor_rule>\n"
"The second word of your response indicates whether the user is explicitly "
"asking about the code in their editor or selected text.\n"
"Answer EDITOR only if the user message clearly refers to specific code "
"they are looking at — using expressions like: "
"'this code', 'este codigo', 'esto', 'this function', 'fix this', "
"'explain this', 'what does this do', 'que hace esto', "
"'como mejoro esto', 'el codigo del editor', 'lo que tengo aqui', "
"'this selection', 'lo seleccionado', or similar.\n"
"Answer NO_EDITOR in all other cases — including general AVAP questions, "
"code generation requests, and conversational follow-ups that do not "
"refer to specific editor code.\n"
"</editor_rule>\n\n"
"<output_rule>\n"
"Your entire response must be exactly one word: "
"RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL. Nothing else.\n"
"Your entire response must be exactly two words separated by a single space.\n"
"First word: RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL.\n"
"Second word: EDITOR or NO_EDITOR.\n"
"Valid examples: 'RETRIEVAL NO_EDITOR', 'CODE_GENERATION EDITOR', "
"'CONVERSATIONAL NO_EDITOR'.\n"
"No other output. No punctuation. No explanation.\n"
"</output_rule>\n\n"
"<conversation_history>\n"
@ -49,10 +68,23 @@ REFORMULATE_PROMPT = SystemMessage(
"into keyword queries that will find the right AVAP documentation chunks.\n"
"</role>\n\n"
"<mode_rule>\n"
"The input starts with [MODE: X]. Follow these rules strictly:\n"
"- MODE RETRIEVAL: rewrite as compact keywords. DO NOT expand with AVAP commands. "
"DO NOT translate — preserve the original language.\n"
"- MODE CODE_GENERATION: apply the command expansion mapping in <task>.\n"
"- MODE CONVERSATIONAL: return the question as-is.\n"
"</mode_rule>\n\n"
"<language_rule>\n"
"NEVER translate the query. If the user writes in Spanish, rewrite in Spanish. "
"If the user writes in English, rewrite in English.\n"
"</language_rule>\n\n"
"<task>\n"
"Rewrite the user message into a compact keyword query for semantic search.\n\n"
"SPECIAL RULE for code generation requests:\n"
"SPECIAL RULE for CODE_GENERATION only:\n"
"When the user asks to generate/create/build/show AVAP code, expand the query "
"with the AVAP commands typically needed. Use this mapping:\n\n"
@@ -80,21 +112,27 @@ REFORMULATE_PROMPT = SystemMessage(
"- Remove filler words.\n"
"- Output a single line.\n"
"- Never answer the question.\n"
"- Never translate.\n"
"</rules>\n\n"
"<examples>\n"
"<example>\n"
"<input>What does AVAP stand for?</input>\n"
"<o>AVAP stand for</o>\n"
"<input>[MODE: RETRIEVAL] Que significa AVAP?</input>\n"
"<o>AVAP significado definición lenguaje DSL</o>\n"
"</example>\n\n"
"<example>\n"
"<input>dime como seria un API que devuelva hello world con AVAP</input>\n"
"<input>[MODE: RETRIEVAL] What does AVAP stand for?</input>\n"
"<o>AVAP definition language stands for</o>\n"
"</example>\n\n"
"<example>\n"
"<input>[MODE: CODE_GENERATION] dime como seria un API que devuelva hello world con AVAP</input>\n"
"<o>AVAP registerEndpoint addResult _status hello world example</o>\n"
"</example>\n\n"
"<example>\n"
"<input>generate an AVAP script that reads a parameter and queries the DB</input>\n"
"<input>[MODE: CODE_GENERATION] generate an AVAP script that reads a parameter and queries the DB</input>\n"
"<o>AVAP addParam ormAccessSelect avapConnector registerEndpoint addResult</o>\n"
"</example>\n"
"</examples>\n\n"
@@ -232,6 +270,7 @@ GENERATE_PROMPT = SystemMessage(
"</thinking_steps>\n\n"
"<output_format>\n"
"Answer in the same language the user used.\n\n"
"Answer:\n"
"<direct answer; include code blocks if context has relevant code>\n\n"
@@ -247,4 +286,4 @@ GENERATE_PROMPT = SystemMessage(
"{context}\n"
"</context>"
)
)
)
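As an illustration of the two-token contract that `<output_rule>` now specifies, here is a minimal, tolerant parser sketch. The name `parse_classifier_output` is illustrative only; the repository's real parser lives in graph.py and is copied into the unit tests later in this diff.

```python
# Illustrative parser for the two-token classifier contract:
# "<RETRIEVAL|CODE_GENERATION|CONVERSATIONAL> <EDITOR|NO_EDITOR>".
# Falls back to safe defaults on malformed model output.

VALID_TYPES = {"RETRIEVAL", "CODE_GENERATION", "CONVERSATIONAL"}

def parse_classifier_output(raw: str) -> tuple[str, bool]:
    """Return (query_type, use_editor), defaulting to RETRIEVAL / no editor."""
    parts = raw.strip().upper().split()
    query_type = parts[0] if parts and parts[0] in VALID_TYPES else "RETRIEVAL"
    use_editor = len(parts) > 1 and parts[1] == "EDITOR"
    return query_type, use_editor

print(parse_classifier_output("CODE_GENERATION EDITOR"))  # ('CODE_GENERATION', True)
print(parse_classifier_output("garbage"))                 # ('RETRIEVAL', False)
```

The safe defaults matter because a small model can occasionally emit a single token or extra whitespace despite the "exactly two words" instruction.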


@@ -1,3 +1,4 @@
import base64
import logging
import os
from concurrent import futures
@@ -79,7 +80,33 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
def AskAgent(self, request, context):
session_id = request.session_id or "default"
query = request.query
logger.info(f"[AskAgent] session={session_id} query='{query[:80]}'")
try:
editor_content = base64.b64decode(request.editor_content).decode("utf-8") if request.editor_content else ""
except Exception:
editor_content = ""
logger.warning("[AskAgent] editor_content base64 decode failed")
try:
selected_text = base64.b64decode(request.selected_text).decode("utf-8") if request.selected_text else ""
except Exception:
selected_text = ""
logger.warning("[AskAgent] selected_text base64 decode failed")
try:
extra_context = base64.b64decode(request.extra_context).decode("utf-8") if request.extra_context else ""
except Exception:
extra_context = ""
logger.warning("[AskAgent] extra_context base64 decode failed")
user_info = request.user_info or "{}"
logger.info(
f"[AskAgent] session={session_id} "
f"editor={bool(editor_content)} selected={bool(selected_text)} "
f"query='{query[:80]}'"
)
try:
history = list(session_store.get(session_id, []))
@@ -91,6 +118,11 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
"reformulated_query": "",
"context": "",
"query_type": "",
"editor_content": editor_content,
"selected_text": selected_text,
"extra_context": extra_context,
"user_info": user_info
}
final_state = self.graph.invoke(initial_state)
@@ -119,7 +151,33 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
def AskAgentStream(self, request, context):
session_id = request.session_id or "default"
query = request.query
logger.info(f"[AskAgentStream] session={session_id} query='{query[:80]}'")
try:
editor_content = base64.b64decode(request.editor_content).decode("utf-8") if request.editor_content else ""
except Exception:
editor_content = ""
logger.warning("[AskAgentStream] editor_content base64 decode failed")
try:
selected_text = base64.b64decode(request.selected_text).decode("utf-8") if request.selected_text else ""
except Exception:
selected_text = ""
logger.warning("[AskAgentStream] selected_text base64 decode failed")
try:
extra_context = base64.b64decode(request.extra_context).decode("utf-8") if request.extra_context else ""
except Exception:
extra_context = ""
logger.warning("[AskAgentStream] extra_context base64 decode failed")
user_info = request.user_info or "{}"
logger.info(
f"[AskAgentStream] session={session_id} "
f"editor={bool(editor_content)} selected={bool(selected_text)} "
f"query='{query[:80]}'"
)
try:
history = list(session_store.get(session_id, []))
@@ -131,6 +189,11 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
"reformulated_query": "",
"context": "",
"query_type": "",
"editor_content": editor_content,
"selected_text": selected_text,
"extra_context": extra_context,
"user_info": user_info
}
prepared = self.prepare_graph.invoke(initial_state)


@@ -4,8 +4,15 @@ from langgraph.graph.message import add_messages
class AgentState(TypedDict):
# -- CORE
messages: Annotated[list, add_messages]
reformulated_query: str
context: str
query_type: str
session_id: str
session_id: str
# -- OPEN AI API
editor_content: str
selected_text: str
extra_context: str
user_info: str
use_editor_context: bool


@@ -14,6 +14,11 @@ class OpenAIChatFactory(BaseProviderFactory):
return ChatOpenAI(model=model, **kwargs)
class AnthropicChatFactory(BaseProviderFactory):
def create(self, model: str, **kwargs: Any):
from langchain_anthropic import ChatAnthropic
return ChatAnthropic(model=model, **kwargs)
class OllamaChatFactory(BaseProviderFactory):
def create(self, model: str, **kwargs: Any):
@@ -46,6 +51,7 @@ CHAT_FACTORIES: Dict[str, BaseProviderFactory] = {
"ollama": OllamaChatFactory(),
"bedrock": BedrockChatFactory(),
"huggingface": HuggingFaceChatFactory(),
"anthropic": AnthropicChatFactory(),
}
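The new registry entry follows the existing lazy-import factory pattern. A self-contained sketch of the lookup flow, with a stub in place of the real `ChatAnthropic` class so it runs without langchain_anthropic installed (the model name below is hypothetical):

```python
from typing import Any, Dict

class BaseProviderFactory:
    def create(self, model: str, **kwargs: Any):
        raise NotImplementedError

class AnthropicChatFactory(BaseProviderFactory):
    def create(self, model: str, **kwargs: Any):
        # The real factory imports langchain_anthropic.ChatAnthropic lazily
        # inside create(); stubbed here so the sketch is dependency-free.
        return {"provider": "anthropic", "model": model, **kwargs}

CHAT_FACTORIES: Dict[str, BaseProviderFactory] = {
    "anthropic": AnthropicChatFactory(),
}

# Provider string resolves to a factory, which builds the chat model:
llm = CHAT_FACTORIES["anthropic"].create("claude-sonnet", temperature=0)
print(llm)  # {'provider': 'anthropic', 'model': 'claude-sonnet', 'temperature': 0}
```

The lazy import inside `create` keeps langchain_anthropic an optional dependency: the registry can be built even when only some providers are installed.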

Docker/tests/__init__.py (new, empty file)

tests/test_prd_0002.py (new file)

@@ -0,0 +1,396 @@
"""
tests/test_prd_0002.py
Unit tests for PRD-0002 Editor Context Injection.
These tests run without any external dependencies (no Elasticsearch, no Ollama,
no gRPC server). They validate the logic of the components modified in PRD-0002:
- _parse_query_type classifier output parser (graph.py)
- _parse_editor_context user field parser (openai_proxy.py)
- _build_classify_prompt classify prompt builder (graph.py)
- _build_reformulate_query reformulate anchor builder (graph.py)
- _build_generation_prompt generation prompt builder (graph.py)
- _decode_b64 base64 decoder (server.py)
Run with:
pytest tests/test_prd_0002.py -v
"""
import base64
import json
import sys
import os
import pytest
# ---------------------------------------------------------------------------
# Minimal stubs so we can import graph.py and openai_proxy.py without
# the full Docker/src environment loaded
# ---------------------------------------------------------------------------
# Stub brunix_pb2 so openai_proxy imports cleanly
import types
brunix_pb2 = types.ModuleType("brunix_pb2")
brunix_pb2.AgentRequest = lambda **kw: kw
brunix_pb2.AgentResponse = lambda **kw: kw
sys.modules["brunix_pb2"] = brunix_pb2
sys.modules["brunix_pb2_grpc"] = types.ModuleType("brunix_pb2_grpc")
# Stub grpc
grpc_mod = types.ModuleType("grpc")
grpc_mod.insecure_channel = lambda *a, **kw: None
grpc_mod.Channel = object
grpc_mod.RpcError = Exception
sys.modules["grpc"] = grpc_mod
# Stub grpc_reflection
refl = types.ModuleType("grpc_reflection.v1alpha.reflection")
sys.modules["grpc_reflection"] = types.ModuleType("grpc_reflection")
sys.modules["grpc_reflection.v1alpha"] = types.ModuleType("grpc_reflection.v1alpha")
sys.modules["grpc_reflection.v1alpha.reflection"] = refl
# Add Docker/src to path so we can import the modules directly
DOCKER_SRC = os.path.join(os.path.dirname(__file__), "..", "Docker", "src")
sys.path.insert(0, os.path.abspath(DOCKER_SRC))
# ---------------------------------------------------------------------------
# Local copies of the functions under test
# ---------------------------------------------------------------------------
# Pure function copies only: no LLM, no ES, no gRPC calls
def _parse_query_type(raw: str):
"""Copy of _parse_query_type from graph.py — tested in isolation."""
parts = raw.strip().upper().split()
query_type = "RETRIEVAL"
use_editor = False
if parts:
first = parts[0]
if first.startswith("CODE_GENERATION") or "CODE" in first:
query_type = "CODE_GENERATION"
elif first.startswith("CONVERSATIONAL"):
query_type = "CONVERSATIONAL"
if len(parts) > 1 and parts[1] == "EDITOR":
use_editor = True
return query_type, use_editor
def _decode_b64(value: str) -> str:
"""Copy of _decode_b64 from server.py — tested in isolation."""
try:
return base64.b64decode(value).decode("utf-8") if value else ""
except Exception:
return ""
def _parse_editor_context(user):
"""Copy of _parse_editor_context from openai_proxy.py — tested in isolation."""
if not user:
return "", "", "", ""
try:
ctx = json.loads(user)
if isinstance(ctx, dict):
return (
ctx.get("editor_content", "") or "",
ctx.get("selected_text", "") or "",
ctx.get("extra_context", "") or "",
json.dumps(ctx.get("user_info", {})),
)
except (json.JSONDecodeError, TypeError):
pass
return "", "", "", ""
def _build_reformulate_query(question: str, selected_text: str) -> str:
"""Copy of _build_reformulate_query from graph.py — tested in isolation."""
if not selected_text:
return question
return f"{selected_text}\n\nUser question about the above: {question}"
def _build_generation_prompt_injects(editor_content, selected_text, use_editor):
"""Helper — returns True if editor context would be injected."""
sections = []
if selected_text and use_editor:
sections.append("selected_code")
if editor_content and use_editor:
sections.append("editor_file")
return len(sections) > 0
# ---------------------------------------------------------------------------
# Tests: _parse_query_type
# ---------------------------------------------------------------------------
class TestParseQueryType:
def test_retrieval_no_editor(self):
qt, ue = _parse_query_type("RETRIEVAL NO_EDITOR")
assert qt == "RETRIEVAL"
assert ue is False
def test_retrieval_editor(self):
qt, ue = _parse_query_type("RETRIEVAL EDITOR")
assert qt == "RETRIEVAL"
assert ue is True
def test_code_generation_no_editor(self):
qt, ue = _parse_query_type("CODE_GENERATION NO_EDITOR")
assert qt == "CODE_GENERATION"
assert ue is False
def test_code_generation_editor(self):
qt, ue = _parse_query_type("CODE_GENERATION EDITOR")
assert qt == "CODE_GENERATION"
assert ue is True
def test_conversational_no_editor(self):
qt, ue = _parse_query_type("CONVERSATIONAL NO_EDITOR")
assert qt == "CONVERSATIONAL"
assert ue is False
def test_single_token_defaults_no_editor(self):
"""If model returns only one token, use_editor defaults to False."""
qt, ue = _parse_query_type("RETRIEVAL")
assert qt == "RETRIEVAL"
assert ue is False
def test_empty_defaults_retrieval_no_editor(self):
qt, ue = _parse_query_type("")
assert qt == "RETRIEVAL"
assert ue is False
def test_case_insensitive(self):
qt, ue = _parse_query_type("retrieval editor")
assert qt == "RETRIEVAL"
assert ue is True
def test_code_shorthand(self):
"""'CODE' alone should map to CODE_GENERATION."""
qt, ue = _parse_query_type("CODE NO_EDITOR")
assert qt == "CODE_GENERATION"
assert ue is False
def test_extra_whitespace(self):
qt, ue = _parse_query_type(" RETRIEVAL NO_EDITOR ")
assert qt == "RETRIEVAL"
assert ue is False
# ---------------------------------------------------------------------------
# Tests: _decode_b64
# ---------------------------------------------------------------------------
class TestDecodeB64:
def test_valid_base64_spanish(self):
text = "addVar(mensaje, \"Hola mundo\")\naddResult(mensaje)"
encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
assert _decode_b64(encoded) == text
def test_valid_base64_english(self):
text = "registerEndpoint(\"GET\", \"/hello\", [], \"public\", handler, \"\")"
encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
assert _decode_b64(encoded) == text
def test_empty_string_returns_empty(self):
assert _decode_b64("") == ""
def test_none_equivalent_empty(self):
assert _decode_b64(None) == ""
def test_invalid_base64_returns_empty(self):
assert _decode_b64("not_valid_base64!!!") == ""
def test_unicode_content(self):
text = "// función de validación\nif(token, \"SECRET\", \"=\")"
encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
assert _decode_b64(encoded) == text
# ---------------------------------------------------------------------------
# Tests: _parse_editor_context
# ---------------------------------------------------------------------------
class TestParseEditorContext:
def _encode(self, text: str) -> str:
return base64.b64encode(text.encode()).decode()
def test_full_context_parsed(self):
editor = self._encode("addVar(x, 10)")
selected = self._encode("addResult(x)")
extra = self._encode("/path/to/file.avap")
user_json = json.dumps({
"editor_content": editor,
"selected_text": selected,
"extra_context": extra,
"user_info": {"dev_id": 1, "project_id": 2, "org_id": 3}
})
ec, st, ex, ui = _parse_editor_context(user_json)
assert ec == editor
assert st == selected
assert ex == extra
assert json.loads(ui) == {"dev_id": 1, "project_id": 2, "org_id": 3}
def test_empty_user_returns_empty_tuple(self):
ec, st, ex, ui = _parse_editor_context(None)
assert ec == st == ex == ""
def test_empty_string_returns_empty_tuple(self):
ec, st, ex, ui = _parse_editor_context("")
assert ec == st == ex == ""
def test_plain_string_not_json_returns_empty(self):
"""Non-JSON user field — backward compat, no error raised."""
ec, st, ex, ui = _parse_editor_context("plain string")
assert ec == st == ex == ""
def test_missing_fields_default_empty(self):
user_json = json.dumps({"editor_content": "abc"})
ec, st, ex, ui = _parse_editor_context(user_json)
assert ec == "abc"
assert st == ""
assert ex == ""
def test_user_info_missing_defaults_empty_object(self):
user_json = json.dumps({"editor_content": "abc"})
_, _, _, ui = _parse_editor_context(user_json)
assert json.loads(ui) == {}
def test_user_info_full_object(self):
user_json = json.dumps({
"editor_content": "",
"selected_text": "",
"extra_context": "",
"user_info": {"dev_id": 42, "project_id": 7, "org_id": 99}
})
_, _, _, ui = _parse_editor_context(user_json)
parsed = json.loads(ui)
assert parsed["dev_id"] == 42
assert parsed["project_id"] == 7
assert parsed["org_id"] == 99
def test_session_id_not_leaked_into_context(self):
"""session_id must NOT appear in editor context — it has its own field."""
user_json = json.dumps({
"editor_content": "",
"selected_text": "",
"extra_context": "",
"user_info": {}
})
ec, st, ex, ui = _parse_editor_context(user_json)
assert "session_id" not in ec
assert "session_id" not in st
# ---------------------------------------------------------------------------
# Tests: _build_reformulate_query
# ---------------------------------------------------------------------------
class TestBuildReformulateQuery:
def test_no_selected_text_returns_question(self):
q = "Que significa AVAP?"
assert _build_reformulate_query(q, "") == q
def test_selected_text_prepended_to_question(self):
q = "que hace esto?"
selected = "addVar(x, 10)\naddResult(x)"
result = _build_reformulate_query(q, selected)
assert result.startswith(selected)
assert q in result
def test_selected_text_anchor_format(self):
q = "fix this"
selected = "try()\n ormDirect(query, res)\nexception(e)\nend()"
result = _build_reformulate_query(q, selected)
assert "User question about the above:" in result
assert selected in result
assert q in result
# ---------------------------------------------------------------------------
# Tests: editor context injection logic
# ---------------------------------------------------------------------------
class TestEditorContextInjection:
def test_no_injection_when_use_editor_false(self):
"""Editor content must NOT be injected when use_editor_context is False."""
injected = _build_generation_prompt_injects(
editor_content = "addVar(x, 10)",
selected_text = "addResult(x)",
use_editor = False,
)
assert injected is False
def test_injection_when_use_editor_true_and_content_present(self):
"""Editor content MUST be injected when use_editor_context is True."""
injected = _build_generation_prompt_injects(
editor_content = "addVar(x, 10)",
selected_text = "addResult(x)",
use_editor = True,
)
assert injected is True
def test_no_injection_when_content_empty_even_if_flag_true(self):
"""Empty fields must never be injected even if flag is True."""
injected = _build_generation_prompt_injects(
editor_content = "",
selected_text = "",
use_editor = True,
)
assert injected is False
def test_partial_injection_selected_only(self):
"""selected_text alone triggers injection when flag is True."""
injected = _build_generation_prompt_injects(
editor_content = "",
selected_text = "addResult(x)",
use_editor = True,
)
assert injected is True
# ---------------------------------------------------------------------------
# Tests: classifier routing — EDITOR signal
# ---------------------------------------------------------------------------
class TestClassifierEditorSignal:
"""
These tests validate that the two-token output format is correctly parsed
for all combinations the classifier can produce.
"""
VALID_OUTPUTS = [
("RETRIEVAL NO_EDITOR", "RETRIEVAL", False),
("RETRIEVAL EDITOR", "RETRIEVAL", True),
("CODE_GENERATION NO_EDITOR", "CODE_GENERATION", False),
("CODE_GENERATION EDITOR", "CODE_GENERATION", True),
("CONVERSATIONAL NO_EDITOR", "CONVERSATIONAL", False),
("CONVERSATIONAL EDITOR", "CONVERSATIONAL", True),
]
@pytest.mark.parametrize("raw,expected_qt,expected_ue", VALID_OUTPUTS)
def test_valid_two_token_output(self, raw, expected_qt, expected_ue):
qt, ue = _parse_query_type(raw)
assert qt == expected_qt
assert ue == expected_ue
def test_editor_flag_false_for_general_avap_question(self):
"""'Que significa AVAP?' -> RETRIEVAL NO_EDITOR."""
qt, ue = _parse_query_type("RETRIEVAL NO_EDITOR")
assert ue is False
def test_editor_flag_true_for_explicit_editor_reference(self):
"""'que hace este codigo?' with selected_text -> RETRIEVAL EDITOR."""
qt, ue = _parse_query_type("RETRIEVAL EDITOR")
assert ue is True
def test_editor_flag_false_for_code_generation_without_reference(self):
"""'dame un API de hello world' -> CODE_GENERATION NO_EDITOR."""
qt, ue = _parse_query_type("CODE_GENERATION NO_EDITOR")
assert ue is False

Makefile (new file, 47 lines added)

@@ -0,0 +1,47 @@
.PHONY: help
help:
@echo "Available commands:"
@echo "  make sync_requirements - Export dependencies from pyproject.toml to requirements.txt"
@echo "  make tunnels_up        - Start tunnels"
@echo "  make compose_up        - Run tunnels script and start Docker Compose"
@echo "  make tunnels_down      - Kill all kubectl port-forward tunnels"
@echo "  make sync_data_down    - Download data from S3"
@echo "  make sync_data_up      - Upload data to S3"
.PHONY: sync_requirements
sync_requirements:
@echo "Exporting dependencies from pyproject.toml to requirements.txt..."
uv export --format requirements-txt --no-hashes --no-dev -o Docker/requirements.txt
@echo "✓ requirements.txt updated successfully"
.PHONY: tunnels_up
tunnels_up:
bash ./scripts/start-tunnels.sh < /dev/null &
@echo "✓ Tunnels started!"
.PHONY: compose_up
compose_up:
bash ./scripts/start-tunnels.sh < /dev/null &
sleep 2
docker compose -f Docker/docker-compose.yaml --env-file .env up -d --build
@echo "✓ Done!"
# Kill all kubectl port-forward tunnels
.PHONY: tunnels_down
tunnels_down:
@echo "Killing all kubectl port-forward tunnels..."
-pkill -f 'kubectl port-forward' || true
@echo "✓ All tunnels killed!"
.PHONY: sync_data_down
sync_data_down:
aws s3 sync s3://mrh-avap/data/ \
data/
## Upload Data to storage system
.PHONY: sync_data_up
sync_data_up:
aws s3 sync --exclude "*.gitkeep" data/ \
s3://mrh-avap/data
.PHONY: ollama_local
ollama_local:
ssh -i ~/.ssh/mrh-transformers.pem -L 11434:localhost:11434 ubuntu@172.18.14.34

NOTICE (new file, 448 lines added)

@@ -0,0 +1,448 @@
NOTICE
======
Brunix Assistance Engine
Copyright (c) 2026 101OBEX Corp. All rights reserved.
This product includes software developed by third parties under open source
licenses. The following is a list of the open source components used in this
product, along with their respective licenses and copyright notices.
-------------------------------------------------------------------------------
RUNTIME DEPENDENCIES (Docker/requirements.txt)
-----------------------------------------------
aiohttp (3.13.3)
License: Apache 2.0
Copyright: aio-libs contributors
https://github.com/aio-libs/aiohttp
annotated-types (0.7.0)
License: MIT
Copyright: Adrian Garcia Badaracco, Samuel Colvin, Zac Hatfield-Dodds
https://github.com/annotated-types/annotated-types
anyio (4.12.1)
License: MIT
Copyright: Alex Grönholm
https://github.com/agronholm/anyio
attrs (25.4.0)
License: MIT
Copyright: Hynek Schlawack
https://github.com/python-attrs/attrs
boto3 (1.42.58)
License: Apache 2.0
Copyright: Amazon Web Services
https://github.com/boto/boto3
botocore (1.42.58)
License: Apache 2.0
Copyright: Amazon Web Services
https://github.com/boto/botocore
certifi
License: MPL 2.0
Copyright: Kenneth Reitz
https://github.com/certifi/python-certifi
charset-normalizer (3.4.4)
License: MIT
Copyright: Ahmed TAHRI
https://github.com/Ousret/charset_normalizer
chonkie (1.5.6)
License: MIT
Copyright: Bhavnick Minhas
https://github.com/chonkie-ai/chonkie
click (8.3.1)
License: BSD 3-Clause
Copyright: Armin Ronacher
https://github.com/pallets/click
dataclasses-json (0.6.7)
License: MIT
Copyright: Lídia Contreras, Radek Nohejl
https://github.com/lidatong/dataclasses-json
elastic-transport (8.17.1)
License: Apache 2.0
Copyright: Elasticsearch B.V.
https://github.com/elastic/elastic-transport-python
elasticsearch (8.19.3)
License: Apache 2.0
Copyright: Elasticsearch B.V.
https://github.com/elastic/elasticsearch-py
fastapi (0.111+)
License: MIT
Copyright: Sebastián Ramírez
https://github.com/fastapi/fastapi
filelock (3.24.3)
License: Unlicense / Public Domain
https://github.com/tox-dev/filelock
grpcio (1.78.1)
License: Apache 2.0
Copyright: The gRPC Authors
https://github.com/grpc/grpc
grpcio-reflection (1.78.1)
License: Apache 2.0
Copyright: The gRPC Authors
https://github.com/grpc/grpc
grpcio-tools (1.78.1)
License: Apache 2.0
Copyright: The gRPC Authors
https://github.com/grpc/grpc
httpcore (1.0.9)
License: BSD 3-Clause
Copyright: Tom Christie
https://github.com/encode/httpcore
httpx (0.28.1)
License: BSD 3-Clause
Copyright: Tom Christie
https://github.com/encode/httpx
huggingface-hub (0.36.2)
License: Apache 2.0
Copyright: HuggingFace Inc.
https://github.com/huggingface/huggingface_hub
jinja2 (3.1.6)
License: BSD 3-Clause
Copyright: Armin Ronacher
https://github.com/pallets/jinja
joblib (1.5.3)
License: BSD 3-Clause
Copyright: Gael Varoquaux
https://github.com/joblib/joblib
jsonpatch (1.33)
License: BSD 3-Clause
Copyright: Stefan Kögl
https://github.com/stefankoegl/python-json-patch
langchain (1.2.10)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-anthropic
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-aws (1.3.1)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-community (0.4.1)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-core (1.2.15)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-elasticsearch (1.0.0)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-huggingface (1.2.0)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langchain-ollama (1.0.1)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langchain
langgraph (1.0.9)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langgraph
langsmith (0.7.6)
License: MIT
Copyright: LangChain, Inc.
https://github.com/langchain-ai/langsmith
loguru (0.7.3)
License: MIT
Copyright: Delgan
https://github.com/Delgan/loguru
model2vec (0.7.0)
License: MIT
Copyright: MinishLab
https://github.com/MinishLab/model2vec
nltk (3.9.3)
License: Apache 2.0
Copyright: NLTK Project
https://github.com/nltk/nltk
numpy (2.4.2)
License: BSD 3-Clause
Copyright: NumPy Developers
https://github.com/numpy/numpy
ollama (0.6.1)
License: MIT
Copyright: Ollama
https://github.com/ollama/ollama-python
orjson (3.11.7)
License: Apache 2.0 / MIT
Copyright: ijl
https://github.com/ijl/orjson
packaging (24.2)
License: Apache 2.0 / BSD 2-Clause
Copyright: PyPA
https://github.com/pypa/packaging
pandas (3.0.1)
License: BSD 3-Clause
Copyright: The Pandas Development Team
https://github.com/pandas-dev/pandas
protobuf (6.33.5)
License: BSD 3-Clause
Copyright: Google LLC
https://github.com/protocolbuffers/protobuf
pydantic (2.12.5)
License: MIT
Copyright: Samuel Colvin
https://github.com/pydantic/pydantic
pydantic-settings (2.13.1)
License: MIT
Copyright: Samuel Colvin
https://github.com/pydantic/pydantic-settings
pygments (2.19.2)
License: BSD 2-Clause
Copyright: Georg Brandl
https://github.com/pygments/pygments
python-dateutil (2.9.0)
License: Apache 2.0 / BSD 3-Clause
Copyright: Gustavo Niemeyer
https://github.com/dateutil/dateutil
python-dotenv (1.2.1)
License: BSD 3-Clause
Copyright: Saurabh Kumar
https://github.com/theskumar/python-dotenv
pyyaml (6.0.3)
License: MIT
Copyright: Kirill Simonov
https://github.com/yaml/pyyaml
ragas (0.4.3+)
License: Apache 2.0
Copyright: Exploding Gradients
https://github.com/explodinggradients/ragas
rapidfuzz (3.14.3)
License: MIT
Copyright: Max Bachmann
https://github.com/rapidfuzz/RapidFuzz
regex (2026.2.19)
License: Apache 2.0
Copyright: Matthew Barnett
https://github.com/mrabarnett/mrab-regex
requests (2.32.5)
License: Apache 2.0
Copyright: Kenneth Reitz
https://github.com/psf/requests
rich (14.3.3)
License: MIT
Copyright: Will McGugan
https://github.com/Textualize/rich
s3transfer (0.16.0)
License: Apache 2.0
Copyright: Amazon Web Services
https://github.com/boto/s3transfer
safetensors (0.7.0)
License: Apache 2.0
Copyright: HuggingFace Inc.
https://github.com/huggingface/safetensors
setuptools (82.0.0)
License: MIT
Copyright: Jason R. Coombs
https://github.com/pypa/setuptools
six (1.17.0)
License: MIT
Copyright: Benjamin Peterson
https://github.com/benjaminp/six
sqlalchemy (2.0.46)
License: MIT
Copyright: SQLAlchemy authors
https://github.com/sqlalchemy/sqlalchemy
tenacity (9.1.4)
License: Apache 2.0
Copyright: Julien Danjou
https://github.com/jd/tenacity
tokenizers (0.22.2)
License: Apache 2.0
Copyright: HuggingFace Inc.
https://github.com/huggingface/tokenizers
tqdm (4.67.3)
License: MIT / MPL 2.0
Copyright: Casper da Costa-Luis
https://github.com/tqdm/tqdm
typing-extensions (4.15.0)
License: PSF 2.0
Copyright: Python Software Foundation
https://github.com/python/typing_extensions
urllib3 (2.6.3)
License: MIT
Copyright: Andrey Petrov
https://github.com/urllib3/urllib3
uvicorn (0.29+)
License: BSD 3-Clause
Copyright: Tom Christie
https://github.com/encode/uvicorn
xxhash (3.6.0)
License: BSD 2-Clause
Copyright: Yue Du
https://github.com/ifduyue/python-xxhash
yarl (1.22.0)
License: Apache 2.0
Copyright: aio-libs contributors
https://github.com/aio-libs/yarl
zstandard (0.25.0)
License: BSD 3-Clause
Copyright: Gregory Szorc
https://github.com/indygreg/python-zstandard
-------------------------------------------------------------------------------
DEVELOPMENT DEPENDENCIES (pyproject.toml — dev group)
------------------------------------------------------
These dependencies are used only during development and research.
They are not included in the production Docker image.
beir (2.2.0+)
License: Apache 2.0
Copyright: Nandan Thakur
https://github.com/beir-cellar/beir
datasets
License: Apache 2.0
Copyright: HuggingFace Inc.
https://github.com/huggingface/datasets
jupyter
License: BSD 3-Clause
Copyright: Project Jupyter Contributors
https://github.com/jupyter/jupyter
langfuse (<3)
License: MIT
Copyright: Langfuse GmbH
https://github.com/langfuse/langfuse
litellm (1.82.0+)
License: MIT
Copyright: BerriAI
https://github.com/BerriAI/litellm
mteb (2.8.8+)
License: Apache 2.0
Copyright: MTEB Authors
https://github.com/embeddings-benchmark/mteb
polars (1.38.1+)
License: MIT
Copyright: Ritchie Vink
https://github.com/pola-rs/polars
ruff (0.15.1+)
License: MIT
Copyright: Astral Software
https://github.com/astral-sh/ruff
tree-sitter-language-pack (0.13.0+)
License: MIT
Copyright: Various
https://github.com/Goldziher/tree-sitter-language-pack
-------------------------------------------------------------------------------
EXTERNAL SERVICES (not bundled — accessed at runtime via API or network)
-------------------------------------------------------------------------
Ollama
License: MIT
Copyright: Ollama, Inc.
https://github.com/ollama/ollama
Note: Used as local LLM and embedding inference server.
Not bundled in this repository.
Elasticsearch (8.x)
License: SSPL / Elastic License 2.0
Copyright: Elasticsearch B.V.
https://github.com/elastic/elasticsearch
Note: Used as vector database and full-text search engine.
Not bundled in this repository. Deployed separately on Devaron Cluster.
Anthropic Claude API
Copyright: Anthropic, PBC.
https://www.anthropic.com
Note: Used as evaluation judge in the EvaluateRAG pipeline.
Accessed via API key. Not bundled in this repository.
Langfuse
License: MIT (self-hosted)
Copyright: Langfuse GmbH
https://github.com/langfuse/langfuse
Note: Used for LLM observability and tracing.
Deployed separately on Devaron Cluster.
-------------------------------------------------------------------------------
DISCLAIMER
The licenses listed above are provided for informational purposes only.
101OBEX Corp makes no representations or warranties regarding the accuracy
of this list. Users of this software are responsible for ensuring compliance
with the applicable license terms of all third-party components.
For questions regarding licensing, contact: https://www.101obex.com


@@ -63,6 +63,8 @@ graph TD
│ │ └── utils/
│ │ ├── emb_factory.py # Provider-agnostic embedding model factory
│ │ └── llm_factory.py # Provider-agnostic LLM factory
│ ├── tests/
│ │ └── test_prd_0002.py # Unit tests — editor context, classifier, proxy parsing
│ ├── Dockerfile # Multi-stage container build
│ ├── docker-compose.yaml # Local dev orchestration
│ ├── entrypoint.sh # Starts gRPC server + HTTP proxy in parallel
@@ -75,17 +77,26 @@ graph TD
│ ├── API_REFERENCE.md # Complete gRPC & HTTP API contract with examples
│ ├── RUNBOOK.md # Operational playbooks and incident response
│ ├── AVAP_CHUNKER_CONFIG.md # avap_config.json reference — blocks, statements, semantic tags
│ ├── adr/ # Architecture Decision Records
│ ├── ADR/ # Architecture Decision Records
│ │ ├── ADR-0001-grpc-primary-interface.md
│ │ ├── ADR-0002-two-phase-streaming.md
│ │ ├── ADR-0003-hybrid-retrieval-rrf.md
│ │ └── ADR-0004-claude-eval-judge.md
│ │ ├── ADR-0004-claude-eval-judge.md
│ │ └── ADR-0005-embedding-model-selection.md
│ └── product/ # Product Requirements Documents
│ ├── PRD-0001-openai-compatible-proxy.md
│ └── PRD-0002-editor-context-injection.md
│ ├── avap_language_github_docs/ # AVAP language reference docs (GitHub source)
│ ├── developer.avapframework.com/ # AVAP developer portal docs
│ ├── LRM/
│ │ └── avap.md # AVAP Language Reference Manual (LRM)
│ └── samples/ # AVAP code samples (.avap) used for ingestion
├── LICENSE # Proprietary license — 101OBEX Corp, Delaware
├── research/ # Experiment results, benchmarks, datasets (MrHouston)
│ └── embeddings/ # Embedding model benchmark results (BEIR)
├── ingestion/
│ └── chunks.json # Last export of ingested chunks (ES bulk output)
@@ -109,6 +120,7 @@ graph TD
│ └── ingestion/
│ └── chunks.jsonl # JSONL output from avap_chunker.py
├── research/                  # Research notebooks, documents, and results
└── src/ # Shared library (used by both Docker and scripts)
├── config.py # Pydantic settings — reads all environment variables
└── utils/
@@ -396,7 +408,7 @@ Returns the full answer as a single message with `is_final: true`. Suitable for
```bash
grpcurl -plaintext \
-d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
-d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
```
@@ -404,7 +416,7 @@ grpcurl -plaintext \
Expected response:
```json
{
"text": "addVar is an AVAP command used to declare a variable...",
"text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
"avap_code": "AVAP-2026",
"is_final": true
}
@@ -493,17 +505,33 @@ This enables integration with any tool that supports the OpenAI or Ollama API (c
| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| `GET` | `/health` | Health check — returns gRPC target and status |
**Non-streaming chat:**
**Non-streaming chat — general query:**
```bash
curl http://localhost:8000/v1/chat/completions \
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "What is AVAP?"}],
"stream": false
"messages": [{"role": "user", "content": "Que significa AVAP?"}],
"stream": false,
"session_id": "dev-001"
}'
```
**Non-streaming chat — with editor context (VS Code extension):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este codigo?"}],
"stream": false,
"session_id": "dev-001",
"user": "{\"editor_content\":\"\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}'
```
> **Editor context transport:** The `user` field carries editor context as a JSON string. `editor_content`, `selected_text`, and `extra_context` must be Base64-encoded. `user_info` is a JSON object with `dev_id`, `project_id`, and `org_id`. The engine only injects editor context into the response when the classifier detects the user is explicitly referring to their code. See [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md#6-openai-compatible-proxy) for full details.
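As an illustrative sketch (the helper name is hypothetical, not part of any Brunix client library), a client could build the `user` field described above like this:

```python
import base64
import json

def build_user_field(editor_content: str, selected_text: str,
                     extra_context: str, dev_id: int, project_id: int,
                     org_id: int) -> str:
    """Base64-encode the editor fields and serialise the whole payload
    as the JSON string carried in the OpenAI `user` field."""
    def b64(s: str) -> str:
        return base64.b64encode(s.encode("utf-8")).decode("ascii") if s else ""
    return json.dumps({
        "editor_content": b64(editor_content),
        "selected_text": b64(selected_text),
        "extra_context": b64(extra_context),
        "user_info": {"dev_id": dev_id, "project_id": project_id, "org_id": org_id},
    })

user_field = build_user_field("", 'addVar(msg, "Hola")', "", 1, 2, 3)
```

The resulting string is passed verbatim as the `"user"` value in the request body shown above.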
**Streaming chat (SSE):**
```bash
curl http://localhost:8000/v1/chat/completions \
@@ -635,7 +663,9 @@ For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING
| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
| [docs/adr/](./docs/adr/) | Architecture Decision Records |
| [docs/ADR/](./docs/ADR/) | Architecture Decision Records |
| [docs/product/](./docs/product/) | Product Requirements Documents |
| [research/](./research/) | Experiment results, benchmarks, and datasets |
---

View File

@@ -2,6 +2,88 @@
All notable changes to the **Brunix Assistance Engine** will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
---
## [1.6.3] - 2026-03-26
### Changed
- RESEARCH: updated ADR-0005-embedding-model-selection, BEIR benchmark results (qwen3-emb vs bge-m3) added.
### Added
- FEATURE: added `research/embeddings/evaluate_embeddings_pipeline.py` in order to facilitate model embedding evaluation with BEIR benchmarks.
- RESULTS: added qwen3-emb vs bge-m3 scores in BEIR benchmark.
## [1.6.2] - 2026-03-26
### Changed
- RESEARCH: updated `embeddings/Embedding model selection.pdf`.
## [1.6.1] - 2026-03-20
### Added
- FEATURE (PRD-0002): Extended `AgentRequest` in `brunix.proto` with four optional fields: `editor_content` (field 3), `selected_text` (field 4), `extra_context` (field 5), `user_info` (field 6) — enabling the VS Code extension to send active file content, selected code, free-form context, and client identity metadata alongside every query. Fields 3–5 are Base64-encoded; field 6 is a JSON string.
- FEATURE (PRD-0002): Extended `AgentState` with `editor_content`, `selected_text`, `extra_context`, `user_info`, and `use_editor_context` fields.
- FEATURE (PRD-0002): Extended classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens — query type and editor signal (`EDITOR` / `NO_EDITOR`). `use_editor_context` flag set in state based on classifier output.
- FEATURE (PRD-0002): Editor context injected into generation prompt only when `use_editor_context=True` — prevents the model from referencing editor code when the question is unrelated.
- FEATURE (PRD-0002): `openai_proxy.py` — parses the standard OpenAI `user` field as a JSON string to extract `editor_content`, `selected_text`, `extra_context`, and `user_info`. Non-Brunix clients that send `user` as a plain string or omit it are handled gracefully with no error.
- FEATURE (PRD-0002): `server.py` — Base64 decoding of `editor_content`, `selected_text`, and `extra_context` on request arrival. Malformed Base64 is silently treated as empty string.
- TESTS: Added `Docker/tests/test_prd_0002.py` — 40 unit tests covering `_parse_query_type`, `_decode_b64`, `_parse_editor_context`, `_build_reformulate_query`, editor context injection logic, and all valid classifier output combinations. Runs without external dependencies (no Elasticsearch, no Ollama, no gRPC server required).
- DOCS: Added `docs/product/PRD-0001-openai-compatible-proxy.md` — product requirements document for the OpenAI-compatible HTTP proxy.
- DOCS: Added `docs/product/PRD-0002-editor-context-injection.md` — product requirements document for editor context injection (updated to Implemented status with full technical design).
- DOCS: Added `docs/ADR/ADR-0005-embedding-model-selection.md` — comparative evaluation of BGE-M3 vs Qwen3-Embedding-0.6B. Status: Under Evaluation.
- DOCS: Added `LICENSE` — proprietary license, 101OBEX Corp, Delaware.
- DOCS: Added `research/` directory structure for MrHouston experiment results and benchmarks.
### Changed
- FEATURE (PRD-0002): `session_id` in `openai_proxy.py` is now read exclusively from the dedicated `session_id` field — no longer falls back to the `user` field. Breaking change for any client that was using `user` as a `session_id` fallback.
- ENGINE: `CLASSIFY_PROMPT_TEMPLATE` extended with `<editor_rule>` and updated `<output_rule>` for two-token output format.
- ENGINE: `REFORMULATE_PROMPT` extended with `<mode_rule>` and `<language_rule>` — the reformulator now receives `[MODE: X]` prepended to the query and applies command expansion only in `CODE_GENERATION` mode.
- ENGINE: `GENERATE_PROMPT` — added "Answer in the same language the user used" to `<output_format>`. Fixes responses defaulting to English for Spanish queries.
- ENGINE: `hybrid_search_native` in `graph.py` — BM25 query now uses a `bool` query with `should` boost for `doc_type: spec` and `doc_type: narrative` chunks, improving retrieval of definitional and explanatory content over raw code examples.
- DOCS: Updated `docs/API_REFERENCE.md` — full `AgentRequest` table with all 6 fields, Base64 encoding notes, editor context behaviour section, and updated proxy examples.
- DOCS: Updated `docs/ARCHITECTURE.md` — new §6 Editor Context Pipeline, updated §4 LangGraph Workflow with two-token classifier, §4.6 reformulator mode-aware and language-preserving, updated component inventory and request lifecycle diagrams.
- DOCS: Updated `README.md` — project structure with `Docker/tests/`, `docs/product/`, `docs/ADR/ADR-0005`, `research/`, `LICENSE`. HTTP proxy section updated with editor context curl examples. Documentation index updated.
- DOCS: Updated `CONTRIBUTING.md` — added Section 10 (PRDs), Section 11 (Research & Experiments Policy), updated PR checklist, ADR table with ADR-0005.
- DOCS: Updated `docs/AVAP_CHUNKER_CONFIG.md` to v2.0 — five new commands (else, end, endLoop, exception, return), naming fix (AddvariableToJSON), nine dual assignment patterns, four new semantic tags.
- GOVERNANCE: Updated `.github/CODEOWNERS` — added `@BRUNIX-AI/engineering` and `@BRUNIX-AI/research` teams, explicit rules for proto, golden dataset, grammar config, ADRs and PRDs.
### Fixed
- ENGINE: Fixed retrieval returning wrong chunks for Spanish definition queries — reformulator was translating Spanish queries to English, breaking BM25 lexical matching against Spanish LRM chunks. Root cause: missing language preservation rule in `REFORMULATE_PROMPT`.
- ENGINE: Fixed reformulator applying CODE_GENERATION command expansion to RETRIEVAL queries — caused "Que significa AVAP?" to reformulate as "AVAP registerEndpoint addResult _status". Root cause: reformulator had no awareness of query type. Fix: `[MODE: X]` prefix + mode-aware rules.
- ENGINE: Fixed responses defaulting to English regardless of query language. Root cause: `GENERATE_PROMPT` had no language instruction (unlike `CODE_GENERATION_PROMPT` which already had it).
---
## [1.6.0] - 2026-03-18
### Added
- ENGINE: Added `AskAgentStream` RPC — real token-by-token streaming directly from Ollama. Two-phase design: classify + reformulate + retrieve runs first via `build_prepare_graph`, then `llm.stream()` forwards tokens to the client as they arrive.
- ENGINE: Added `EvaluateRAG` RPC — RAGAS evaluation pipeline with Claude as judge. Runs faithfulness, answer_relevancy, context_recall and context_precision against a golden dataset and returns a global score with verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT).
- ENGINE: Added `openai_proxy.py` — OpenAI and Ollama compatible HTTP API running on port 8000. Routes `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`. Endpoints: `POST /v1/chat/completions`, `POST /v1/completions`, `GET /v1/models`, `POST /api/chat`, `POST /api/generate`, `GET /api/tags`, `GET /health`.
- ENGINE: Added `entrypoint.sh` — starts gRPC server and HTTP proxy as parallel processes with mutual watchdog. If either crashes, the container stops cleanly.
- ENGINE: Added session memory — `session_store` dict indexed by `session_id` accumulates full conversation history per session. Each request loads and persists history.
- ENGINE: Added query intent classifier — LangGraph node that classifies every query as `RETRIEVAL`, `CODE_GENERATION` or `CONVERSATIONAL` and routes to the appropriate subgraph.
- ENGINE: Added hybrid retrieval — replaced `ElasticsearchStore` (LangChain abstraction) with native Elasticsearch client. Each query runs BM25 `multi_match` and kNN in parallel, fused with Reciprocal Rank Fusion (k=60). Returns top-8 documents.
- ENGINE: Added `evaluate.py` — full RAGAS evaluation pipeline using the same hybrid retrieval as production, Claude as external judge, and the golden dataset in `Docker/src/golden_dataset.json`.
- PROTO: Added `AskAgentStream` and `EvaluateRAG` RPCs to `brunix.proto` with their message types (`EvalRequest`, `EvalResponse`, `QuestionDetail`).
- DOCS: Added `docs/ADR/ADR-0001-grpc-primary-interface.md`.
- DOCS: Added `docs/ADR/ADR-0002-two-phase-streaming.md`.
- DOCS: Added `docs/ADR/ADR-0003-hybrid-retrieval-rrf.md`.
- DOCS: Added `docs/ADR/ADR-0004-claude-eval-judge.md`.
- DOCS: Added `docs/samples/` — 30 representative `.avap` code samples covering all AVAP constructs.
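As a hedged illustration of the hybrid retrieval entry above (not the production `graph.py` implementation), Reciprocal Rank Fusion with k=60 can be sketched as:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 8) -> list[str]:
    """Fuse ranked lists: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Illustrative document IDs, not real index contents.
bm25 = ["d1", "d2", "d3"]
knn = ["d3", "d1", "d4"]
fused = rrf_fuse([bm25, knn])
```

Documents ranked highly by both BM25 and kNN accumulate the largest fused scores, which is why RRF needs no score normalisation between the two systems.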
### Changed
- ENGINE: Replaced `ElasticsearchStore` with native Elasticsearch client — fixes silent kNN failure caused by schema incompatibility between the Chonkie ingestion pipeline and the LangChain-managed index schema.
- ENGINE: Replaced single `GENERATE_PROMPT` with five specialised prompts — `CLASSIFY_PROMPT`, `REFORMULATE_PROMPT`, `GENERATE_PROMPT`, `CODE_GENERATION_PROMPT`, `CONVERSATIONAL_PROMPT` — each optimised for its routing path.
- ENGINE: Extended `REFORMULATE_PROMPT` with explicit AVAP command mapping — intent-to-command expansion for API, database, HTTP, loop and error handling query types.
- ENGINE: Extended `AgentState` with `query_type` and `session_id` fields required for conditional routing and session persistence.
- ENGINE: Fixed `session_id` ignored — `graph.invoke` now passes `session_id` into the graph state.
- ENGINE: Fixed double `is_final: True` — `AskAgent` previously emitted two closing messages. Now emits exactly one.
- ENGINE: Fixed embedding endpoint mismatch — server now uses the same `/api/embed` endpoint and payload format as both ingestion pipelines, ensuring vectors are comparable at query time.
- DEPENDENCIES: `requirements.txt` updated — added `ragas`, `datasets`, `langchain-anthropic`, `fastapi`, `uvicorn`.
### Fixed
- ENGINE: Fixed retrieval returning zero results — `ElasticsearchStore` assumed a LangChain-managed schema incompatible with the Chonkie-generated index. Replaced with native ES client querying actual field names.
- ENGINE: Fixed context always empty — consequence of the retrieval bug above. The generation prompt received an empty `{context}` on every request and always returned the fallback string.
---
## [1.5.1] - 2026-03-18

construct_map.yaml Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -2,7 +2,7 @@
**Date:** 2026-02-09
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO, AVAP Technology), MrHouston Engineering
**Deciders:** Rafael Ruiz (CTO, AVAP Technology)
---

View File

@@ -2,7 +2,7 @@
**Date:** 2026-03-05
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
**Deciders:** Rafael Ruiz (CTO)
---

View File

@@ -2,7 +2,7 @@
**Date:** 2026-03-05
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
**Deciders:** Rafael Ruiz (CTO)
---

View File

@@ -2,7 +2,7 @@
**Date:** 2026-03-10
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
**Deciders:** Rafael Ruiz (CTO)
---

View File

@@ -0,0 +1,276 @@
# ADR-0005: Embedding Model Selection — Comparative Evaluation of BGE-M3 vs Qwen3-Embedding-0.6B
**Date:** 2026-03-19
**Status:** Under Evaluation
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
---
## Context
The AVAP RAG pipeline requires an embedding model capable of mapping a hybrid corpus into a vector space suitable for semantic retrieval. Understanding the exact composition of this corpus is a prerequisite for model selection.
### Corpus characterisation (empirically measured)
A chunk-level audit was performed on the full indexable corpus: the AVAP Language Reference Manual (`avap.md`) and 40 representative `.avap` code samples. Results (`test_chunks.jsonl`, 190 chunks):
| Metric | Value |
| -------------------- | ----------- |
| Total chunks | 190 |
| Total tokens indexed | 11,498 |
| Minimum chunk size | 1 token |
| Maximum chunk size | 833 tokens |
| Mean chunk size | 60.5 tokens |
| Median chunk size | 29 tokens |
| p90 | 117 tokens |
| p95 | 204 tokens |
| p99 | 511 tokens |
**Corpus composition by type:**
| Type | Count | Description |
| ------------------------- | ----- | ---------------------------------------- |
| Narrative (Spanish prose) | 79 | LRM explanations, concept descriptions |
| Code chunks | 83 | AVAP `.avap` sample files |
| BNF formal grammar | 9 | Formal language specification in English |
| Code examples | 14 | Inline examples within LRM |
| Function signatures | 2 | Extracted function headers |
**Linguistic composition:** 55% of chunks originate from the LRM (`avap.md`), written in Spanish with embedded English DSL identifiers. 45% are `.avap` code files containing English command names (`addVar`, `addResult`, `registerEndpoint`, `ormDirect`) with Spanish-language string literals and variable names (`"Hola"`, `datos_cliente`, `mi_json_final`, `contraseña`, `fecha`). 18.9% of chunks (36 out of 190) contain both Spanish content and English DSL commands within the same chunk — intra-chunk multilingual mixing.
Representative examples of intra-chunk multilingual mixing:
```
// Narrative chunk (Spanish prose + English DSL terms):
"AVAP (Advanced Virtual API Programming) es un DSL (Domain-Specific Language)
Turing Completo, diseñado para la orquestación segura de microservicios e I/O."
// Code chunk (English commands + Spanish identifiers and literals):
addParam("lang", l)
if(l, "es", "=")
addVar(msg, "Hola")
end()
addResult(msg)
// BNF chunk (formal English grammar):
<program> ::= ( <line> | <block_comment> )*
<statement> ::= <assignment> | <method_call_stmt> | <io_command> | ...
```
### Why the initial model was eliminated
The initial model provided was **Qwen2.5-1.5B**. Empirical evaluation by MrHouston Engineering (full results in `research/embeddings/`) demonstrated it is unsuitable for dense retrieval. Qwen2.5-1.5B generates embeddings via the **Last Token** method: the final token of the sequence is assumed to encode all preceding context. For AVAP code chunks, the last token is always a syntactic closer — `end()`, `}`, `endLoop()` — with zero semantic content. The resulting embeddings are effectively identical across functionally distinct chunks.
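A toy sketch of the pooling failure mode described above. It uses static (non-contextual) token vectors, which exaggerates the effect relative to a real causal LM, but it illustrates why two functionally distinct chunks ending in the same syntactic closer collapse under last-token pooling while mean pooling keeps them apart:

```python
import random

random.seed(0)
# Static toy vectors, one per token. A real model produces contextual
# hidden states, so this only illustrates the pooling step itself.
vocab = {tok: [random.gauss(0, 1) for _ in range(4)]
         for tok in ["addVar", "addResult", "ormDirect", "end()"]}

def embed(tokens, pooling):
    vecs = [vocab[t] for t in tokens]
    if pooling == "last":   # Last Token method: keep only the final token
        return vecs[-1]
    dim = len(vecs[0])      # mean pooling: average across all tokens
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

chunk_a = ["addVar", "end()"]                  # two functionally distinct
chunk_b = ["ormDirect", "addResult", "end()"]  # chunks, same closer

last_identical = embed(chunk_a, "last") == embed(chunk_b, "last")
mean_identical = embed(chunk_a, "mean") == embed(chunk_b, "mean")
```

Under last-token pooling both chunks reduce to the vector of `end()`, so `last_identical` is true; the mean-pooled vectors differ.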
Benchmark confirmation (BEIR evaluation, three datasets):
**CodeXGLUE** (code retrieval from GitHub repositories):
| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
| -- | ----------------- | ------------------- | ------------------- | --------------------- |
| 1 | 0.00031 | 0.00031 | **0.9497** | **0.9497** |
| 5 | 0.00086 | 0.00151 | **0.9716** | **0.9876** |
| 10 | 0.00118 | 0.00250 | **0.9734** | **0.9929** |
**CoSQA** (natural language queries over code — closest proxy to AVAP retrieval):
| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
| --- | ----------------- | ------------------- | ------------------- | --------------------- |
| 1 | 0.00000 | 0.00000 | **0.1740** | **0.1740** |
| 10 | 0.00000 | 0.00000 | **0.3909** | **0.6700** |
| 100 | 0.00210 | 0.01000 | **0.4510** | **0.9520** |
**SciFact** (scientific prose — out-of-domain control):
| k | Qwen2.5-1.5B NDCG | Qwen2.5-1.5B Recall | Qwen3-Emb-0.6B NDCG | Qwen3-Emb-0.6B Recall |
| --- | ----------------- | ------------------- | ------------------- | --------------------- |
| 1 | 0.02333 | 0.02083 | **0.5633** | **0.5299** |
| 10 | 0.04619 | 0.07417 | **0.6855** | **0.8161** |
| 100 | 0.07768 | 0.23144 | **0.7129** | **0.9400** |
Qwen2.5-1.5B is eliminated. **Qwen3-Embedding-0.6B is the validated baseline.**
### Why a comparative evaluation was required before adopting Qwen3
Qwen3-Embedding-0.6B's benchmark results were obtained on English-only datasets. They eliminated Qwen2.5-1.5B decisively but did not characterise Qwen3's behaviour on the multilingual mixed corpus that AVAP represents. A second candidate — **BGE-M3** — presented theoretical advantages for this specific corpus that could not be assessed without empirical comparison.
The index rebuild required to adopt any model is destructive and must be done once. Given that the embedding model directly determines the quality of all RAG retrieval in production, adopting a model without a direct comparison between the two viable candidates would not have met the due diligence required for a decision of this impact.
---
## Decision
A **head-to-head comparative evaluation** of BGE-M3 and Qwen3-Embedding-0.6B is being conducted under identical conditions before either is adopted as the production embedding model.
The model that demonstrates superior performance under the evaluation criteria defined below is adopted. This ADR moves to Accepted upon completion of that evaluation, with the selected model documented as the outcome.
---
## Candidate Analysis
### Qwen3-Embedding-0.6B
**Strengths:**
- Already benchmarked on CodeXGLUE, CoSQA and SciFact — strong results documented
- 32,768 token context window — exceeds corpus requirements with large margin
- Same model family as the generation model (Qwen) — shared tokenizer vocabulary
- Lowest integration risk — already validated in the pipeline
**Limitations:**
- Benchmarks are English-only — multilingual performance on AVAP corpus unvalidated
- Not a dedicated multilingual model — training distribution weighted towards English and Chinese
- No native sparse retrieval support
**Corpus fit assessment:** The maximum chunk in the AVAP corpus is 833 tokens — well within both candidates' limits. Qwen3's 32,768 token context window provides no practical advantage over BGE-M3's 8,192 tokens for this corpus. Context window is not a differentiating criterion.
### BGE-M3
**Strengths:**
- Explicit multilingual contrastive training across 100+ languages including programming languages — direct architectural fit for the intra-chunk Spanish/English/DSL mixing observed in the corpus
- Supports dense, sparse and multi-vector ColBERT retrieval from a single model inference — future path to consolidating the current BM25+kNN dual-system architecture (ADR-0003)
- Higher MTEB retrieval score than Qwen3-Embedding-0.6B in the programming domain
**Limitations:**
- Not yet benchmarked on CodeXGLUE, CoSQA or SciFact at the time of candidate selection — no prior empirical results for this corpus
- 8,192 token context window — sufficient for current corpus (max chunk: 833 tokens, 10.2% utilization) but lower headroom for future corpus growth
- Requires tokenizer alignment: `HF_EMB_MODEL_NAME` must be updated to `BAAI/bge-m3` alongside `OLLAMA_EMB_MODEL_NAME` to keep chunk token counting consistent
**Corpus fit assessment:** The intra-chunk multilingual mixing (18.9% of chunks) and the Spanish prose component (79 narrative chunks) are the corpus characteristics most likely to differentiate BGE-M3 from Qwen3. The BEIR and EvaluateRAG evaluations determine whether this theoretical advantage translates to measurable retrieval improvement.
### VRAM
Both candidates require approximately 1.13 GiB at FP16 (BGE-M3: 567M parameters; Qwen3: 596M parameters). Combined with a quantized generation model and KV cache, total VRAM remains within the 4 GiB hardware constraint for both. VRAM is not a selection criterion.
### Embedding dimension
Both candidates output 1024-dimensional vectors. The Elasticsearch index mapping (`int8_hnsw`, `dims: 1024`, cosine similarity) is identical for both candidates. No mapping changes are required between them.
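A sketch of what the shared mapping might look like, assuming Elasticsearch 8.x `dense_vector` syntax; field names other than the parameters quoted above (`int8_hnsw`, `dims: 1024`, cosine) are illustrative:

```json
{
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "int8_hnsw" }
      }
    }
  }
}
```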
---
## Evaluation Protocol
Both models are evaluated under identical conditions. All results are documented in `research/embeddings/`.
**Step 1 — BEIR benchmarks**
CodeXGLUE, CoSQA and SciFact were run with **BGE-M3** using the same BEIR evaluation scripts and configuration used for Qwen3-Embedding-0.6B. Qwen3-Embedding-0.6B results already existed in `research/embeddings/` and served as the baseline. Reported metrics: NDCG@k, MAP@k, Recall@k and Precision@k at k = 1, 3, 5, 10, 100.
**Step 2 — EvaluateRAG on AVAP corpus**
The Elasticsearch index is rebuilt twice — once with each model — and `EvaluateRAG` is run against the production AVAP golden dataset for both. Reported RAGAS scores: faithfulness, answer_relevancy, context_recall, context_precision, and global score with verdict.
**Selection criterion**
EvaluateRAG is the primary decision signal. It directly measures retrieval quality on the actual AVAP production corpus — including its intra-chunk multilingual mixing (18.9% of chunks) and domain-specific DSL syntax — and is therefore more representative than any external benchmark. The model with the higher global EvaluateRAG score is adopted.
BEIR results are the secondary signal. The primary BEIR metric is NDCG@10. Among the three datasets, **CoSQA is the most representative proxy** for the AVAP retrieval use case — it pairs natural language queries with code snippets, mirroring the Spanish prose query / AVAP DSL code retrieval pattern. CoSQA results are weighted accordingly in the comparison.
All margin comparisons use **absolute percentage points** in NDCG@10 (e.g., 0.39 vs 0.41 is a 2 absolute percentage point difference, not a 5.1% relative difference).
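The distinction can be verified with a two-line computation using the example values above:

```python
# Illustrative values from the example above: NDCG@10 of 0.39 vs 0.41.
a, b = 0.41, 0.39
absolute_pp = (a - b) * 100        # absolute percentage points
relative_pct = (a - b) / b * 100   # relative difference
```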
**Tiebreaker**
If the EvaluateRAG global scores are within 5 absolute percentage points of each other, the BEIR results determine the outcome under the following conditions:
- BGE-M3 exceeds Qwen3-Embedding-0.6B by more than 2 absolute percentage points on mean NDCG@10 across all three BEIR datasets, AND
- BGE-M3 does not underperform Qwen3-Embedding-0.6B by more than 2 absolute percentage points on CoSQA NDCG@10 specifically.
If neither condition is met — that is, if EvaluateRAG scores are within 5 points and BGE-M3 does not clear both BEIR thresholds — Qwen3-Embedding-0.6B is adopted. It carries lower integration risk, its benchmarks are already documented, and it is the validated baseline for the system.
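The protocol above can be sketched as a decision function (the function and its signature are illustrative; the 5 pp and 2 pp thresholds come from the protocol):

```python
def select_model(evalrag_bge: float, evalrag_qwen: float,
                 mean_ndcg10_bge: float, mean_ndcg10_qwen: float,
                 cosqa_ndcg10_bge: float, cosqa_ndcg10_qwen: float) -> str:
    """All scores in [0, 1]; margins compared in absolute percentage points."""
    def pp(x: float, y: float) -> float:
        return (x - y) * 100
    # Primary signal: a clear EvaluateRAG winner (gap above 5 pp) decides.
    if abs(pp(evalrag_bge, evalrag_qwen)) > 5:
        return "BAAI/bge-m3" if evalrag_bge > evalrag_qwen else "Qwen3-Embedding-0.6B"
    # Tie: BGE-M3 must clear BOTH BEIR thresholds to be adopted.
    if (pp(mean_ndcg10_bge, mean_ndcg10_qwen) > 2
            and pp(cosqa_ndcg10_bge, cosqa_ndcg10_qwen) >= -2):
        return "BAAI/bge-m3"
    # Default: the validated baseline with lower integration risk.
    return "Qwen3-Embedding-0.6B"
```

Applying the measured BEIR means (0.6353 vs 0.6809) under a hypothetical EvaluateRAG tie reproduces the default to Qwen3 described below.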
---
## Rationale
### Step 1 results — BEIR head-to-head comparison
BGE-M3 benchmarks were completed on the same three BEIR datasets using identical evaluation scripts and configuration. Full results are stored in `research/embeddings/embedding_eval_results/emb_models_result.json`. The following tables compare both candidates side by side.
**CodeXGLUE** (code retrieval from GitHub repositories):
| Metric | k   | BGE-M3     | Qwen3-Emb-0.6B | Delta (BGE-M3 − Qwen3) |
| ------ | --- | ---------- | -------------- | ---------------------- |
| NDCG   | 1   | **0.9520** | 0.9497         | +0.23 pp               |
| NDCG   | 5   | **0.9738** | 0.9717         | +0.21 pp               |
| NDCG   | 10  | **0.9749** | 0.9734         | +0.15 pp               |
| NDCG   | 100 | **0.9763** | 0.9745         | +0.18 pp               |
| Recall | 1   | **0.9520** | 0.9497         | +0.23 pp               |
| Recall | 5   | **0.9892** | 0.9876         | +0.16 pp               |
| Recall | 10  | 0.9928     | **0.9930**     | −0.02 pp               |
| Recall | 100 | **0.9989** | 0.9981         | +0.08 pp               |
Both models perform near-identically on CodeXGLUE. All deltas are below 0.25 absolute percentage points. This dataset does not differentiate the candidates.
**CoSQA** (natural language queries over code — most representative proxy for AVAP retrieval):
| Metric | k   | BGE-M3 | Qwen3-Emb-0.6B | Delta (BGE-M3 − Qwen3) |
| ------ | --- | ------ | -------------- | ---------------------- |
| NDCG   | 1   | 0.1160 | **0.1740**     | −5.80 pp               |
| NDCG   | 5   | 0.2383 | **0.3351**     | −9.68 pp               |
| NDCG   | 10  | 0.2878 | **0.3909**     | −10.31 pp              |
| NDCG   | 100 | 0.3631 | **0.4510**     | −8.79 pp               |
| Recall | 1   | 0.1160 | **0.1740**     | −5.80 pp               |
| Recall | 5   | 0.3660 | **0.5020**     | −13.60 pp              |
| Recall | 10  | 0.5160 | **0.6700**     | −15.40 pp              |
| Recall | 100 | 0.8740 | **0.9520**     | −7.80 pp               |
Qwen3-Embedding-0.6B outperforms BGE-M3 on CoSQA by a wide margin at every k. The NDCG@10 gap is 10.31 absolute percentage points. CoSQA is the most representative proxy for the AVAP retrieval use case — it pairs natural language queries with code snippets — making this the most significant BEIR result.
**SciFact** (scientific prose — out-of-domain control):
| Metric | k   | BGE-M3 | Qwen3-Emb-0.6B | Delta (BGE-M3 − Qwen3) |
| ------ | --- | ------ | -------------- | ---------------------- |
| NDCG   | 1   | 0.5100 | **0.5533**     | −4.33 pp               |
| NDCG   | 5   | 0.6190 | **0.6593**     | −4.03 pp               |
| NDCG   | 10  | 0.6431 | **0.6785**     | −3.54 pp               |
| NDCG   | 100 | 0.6705 | **0.7056**     | −3.51 pp               |
| Recall | 1   | 0.4818 | **0.5243**     | −4.25 pp               |
| Recall | 5   | 0.7149 | **0.7587**     | −4.38 pp               |
| Recall | 10  | 0.7834 | **0.8144**     | −3.10 pp               |
| Recall | 100 | 0.9037 | **0.9367**     | −3.30 pp               |
Qwen3-Embedding-0.6B leads BGE-M3 on SciFact by 3–4 absolute percentage points across all metrics. The gap is consistent but narrower than on CoSQA.
### BEIR summary — NDCG@10 comparison
| Dataset   | BGE-M3     | Qwen3-Emb-0.6B | Delta (BGE-M3 − Qwen3) | Leader            |
| --------- | ---------- | -------------- | ---------------------- | ----------------- |
| CodeXGLUE | 0.9749     | 0.9734         | +0.15 pp               | BGE-M3 (marginal) |
| CoSQA     | 0.2878     | **0.3909**     | −10.31 pp              | **Qwen3**         |
| SciFact   | 0.6431     | **0.6785**     | −3.54 pp               | **Qwen3**         |
| **Mean**  | **0.6353** | **0.6809**     | **−4.56 pp**           | **Qwen3**         |
Qwen3-Embedding-0.6B leads on mean NDCG@10 by 4.56 absolute percentage points, driven primarily by a 10.31 pp advantage on CoSQA.
### Application of tiebreaker criteria to BEIR results
Per the evaluation protocol, if EvaluateRAG global scores are within 5 absolute percentage points, the BEIR tiebreaker applies. The tiebreaker requires BGE-M3 to meet **both** conditions:
1. **BGE-M3 must exceed Qwen3 by more than 2 pp on mean NDCG@10.** Result: BGE-M3 trails by 4.56 pp. **Condition not met.**
2. **BGE-M3 must not underperform Qwen3 by more than 2 pp on CoSQA NDCG@10.** Result: BGE-M3 trails by 10.31 pp. **Condition not met.**
Neither tiebreaker condition is satisfied. Under the defined protocol, if the EvaluateRAG evaluation results in a tie (within 5 pp), the BEIR tiebreaker defaults to Qwen3-Embedding-0.6B.
### Step 2 results — EvaluateRAG on AVAP corpus
At this moment we are not in possession of the golden dataset, so Step 2 cannot proceed.
_Pending. Results will be documented here upon completion of the EvaluateRAG evaluation for both models._
### Preliminary assessment
The BEIR benchmarks — the secondary decision signal — favour Qwen3-Embedding-0.6B across both the most representative dataset (CoSQA, 10.31 pp) and the out-of-domain control (SciFact, 3.54 pp), with CodeXGLUE effectively tied. BGE-M3's theoretical advantage from multilingual contrastive training does not translate to superior performance on these English-only benchmarks.
The EvaluateRAG evaluation — the primary decision signal — remains pending. It is the only evaluation that directly measures retrieval quality on the actual AVAP corpus with its intra-chunk multilingual mixing. BGE-M3's architectural fit for multilingual content could still produce a measurable advantage on the production corpus that the English-only BEIR benchmarks cannot capture. No final model selection will be made until EvaluateRAG results are available for both candidates.
We have since found that Qwen3-Embedding is multilingual, with good scores on multilingual benchmarks according to its documentation; the definitive answer will be provided by the evaluation scores on the AVAP corpus.
---
## Consequences
- **Index rebuild required** regardless of which model is adopted. Vectors from Qwen2.5-1.5B are incompatible with either candidate. The existing index is deleted before re-ingestion.
- **Two index rebuilds required for the evaluation.** One per candidate for the EvaluateRAG step. Given the current corpus size (190 chunks, 11,498 tokens), rebuild time is not a meaningful constraint.
- **Tokenizer alignment for BGE-M3.** If BGE-M3 is selected, both `OLLAMA_EMB_MODEL_NAME` and `HF_EMB_MODEL_NAME` are updated. Updating only `OLLAMA_EMB_MODEL_NAME` causes the chunker to estimate token counts using the wrong vocabulary — a silent bug that produces inconsistent chunk sizes without raising any error.
- **Future model changes.** Any future replacement of the embedding model follows the same evaluation protocol — BEIR benchmarks on the same three datasets plus EvaluateRAG — before an ADR update is accepted. Results are documented in `research/embeddings/`.

View File

@@ -0,0 +1,363 @@
# ADR-0006: Reward Algorithm for Self-Improving Dataset Synthesis
**Date:** 2026-03-25
**Status:** Under Evaluation — Primary comparison: Candidate A vs Candidate E vs Candidate F
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering (AI Team)
**Research lead:** Ivar Zapata
---
## Context
The AVAP dataset synthesis pipeline (Track A) generates AVAP code examples using a large language model, filtered by a three-stage quality pipeline: parser validation (Stage 1), Execution Coverage Score (Stage 2), and semantic novelty (Stage 3). The current pipeline has two structural limitations that the reward mechanism must address.
### Limitation 1 — Static generation
Each batch is generated from the same static prompt (LRM + category description). The generator has no memory of what it has already produced and no model of what "good" looks like for the constructs it hasn't explored yet.
### Limitation 2 — Distribution bias (the fundamental problem)
The generator (Claude Sonnet) has its own internal distribution over what AVAP code "looks like", derived from its training on mainstream languages. It naturally gravitates toward the simplest patterns — linear code, basic conditionals, single-construct examples — because those are closest to what it knows. Any reward mechanism based on selecting the best from what the model spontaneously produces and feeding those back as few-shots **amplifies this bias**: the pool fills with what the model does easily, and the model never explores what it does poorly.
This is not model collapse in the classical sense (weights are not updated), but it is **cumulative distribution bias** — the effective generation distribution narrows toward the model's comfort zone with each iteration.
### The correct framing
The solution is not to reward what the model produces spontaneously. It is to **specify externally what must be produced** and evaluate quality relative to that specification. Coverage of the DSL's grammar space must be guaranteed by construction, not hoped for through probabilistic exploration.
---
## Decision
**Conduct a primary comparative evaluation of Candidate A (CW-Reward, reward-driven pool), Candidate E (MAP-Elites, externally-specified coverage cells), and Candidate F (MAP-Elites with ConstructPrior transfer from real production code)** before selecting the production algorithm. Candidates B, C, D are secondary alternatives evaluated only if none of A, E, or F meets quality thresholds.
The fundamental research question has two layers:
1. **Does forced external specification of construct combinations produce a less biased, higher-quality dataset than reward-driven spontaneous exploration?** (A vs E)
2. **Does seeding cell selection with real production code co-occurrence distributions further improve coverage quality and downstream RAG performance over blind MAP-Elites?** (E vs F)
---
## Candidate Analysis
### Candidate A — CW-Reward (Composite Weighted Reward)
**Algorithm class:** In-context reward — no parameter updates.
**Mechanism:** A composite reward is computed for each parser-valid example:
```
reward(e) = w_ecs · ECS(e) + w_novelty · Jaccard_novelty(e, Pool) + w_tests · test_quality(e)
```
High-reward examples enter a GoldPool (top-K). The pool is injected as few-shot context in subsequent generation calls. Coverage summary steers the prompt toward underrepresented constructs.
**Known bias risk:** The pool amplifies the model's natural generation distribution. Examples that are easy for the model (simple patterns, single constructs) tend to enter the pool first and persist. The Jaccard novelty metric penalises structural similarity but cannot detect semantic simplicity — two examples with different node type sets can both be trivially shallow.
**Appropriate when:** The base LLM has strong prior knowledge of the target language (mainstream languages). For AVAP, where the model has zero prior knowledge, the bias risk is materially higher.
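The composite reward and GoldPool update described above can be sketched as follows. This is an illustrative sketch, not the pipeline's actual code: the ECS, test-quality scorers, and node-type extraction are assumed inputs, and the pool size `POOL_K` is a hypothetical value.

```python
# Sketch of the CW-Reward update. ECS and test_quality are assumed to be
# computed upstream (Stage 2 and the test-quality scorer); an example is
# represented here only by its set of AST node types.
W_ECS, W_NOVELTY, W_TESTS = 0.50, 0.35, 0.15  # config A1 (baseline)
POOL_K = 10  # hypothetical GoldPool size

def jaccard_novelty(node_types, pool):
    """1 - max Jaccard similarity against pool members (1.0 for an empty pool)."""
    if not pool:
        return 1.0
    return 1.0 - max(
        len(node_types & p["types"]) / len(node_types | p["types"]) for p in pool
    )

def reward(ecs, node_types, test_quality, pool):
    return (W_ECS * ecs
            + W_NOVELTY * jaccard_novelty(node_types, pool)
            + W_TESTS * test_quality)

def update_gold_pool(pool, example, ecs, test_quality):
    """Score a parser-valid example and keep only the top-K by reward."""
    r = reward(ecs, example["types"], test_quality, pool)
    pool.append({**example, "reward": r})
    pool.sort(key=lambda p: p["reward"], reverse=True)
    del pool[POOL_K:]  # retain top-K only
    return r
```

Note how the sketch exposes the bias risk directly: a structurally identical example scores zero novelty on its second appearance, but a semantically shallow example with a fresh node-type set still scores full novelty.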
---
### Candidate E — MAP-Elites with Externally-Defined Coverage Cells (Proposed Primary)
**Algorithm class:** Quality-Diversity algorithm — no parameter updates, coverage guaranteed by construction.
**Core insight:** Instead of rewarding the best examples from spontaneous generation, define the coverage space externally from the grammar and direct the generator to fill specific cells. The model's distribution bias is neutralised because it is never asked to "explore freely" — it is always given a precise specification.
**Coverage space definition:**
The behavior space is defined over **pairs and trios of AVAP node types** drawn from the full grammar vocabulary. Each cell represents a construct combination that must be represented in the dataset:
```
Cell key = frozenset of 2 or 3 AVAP node types
Cell value = (best_example_so_far, quality_score)
Example cells:
{"startLoop", "ormAccessSelect"} → best example using both
{"try", "go", "RequestPost"} → best example using all three
{"function", "if_mode2", "encodeSHA256"} → best example using all three
```
**Space size:**
- Pairs: C(38, 2) = 703 cells
- Trios: C(38, 3) = 8,436 cells
- Total: 9,139 cells
With 5,000 examples targeted, average coverage is ~0.55 examples per cell — statistical coverage of pairwise and triadic construct combinations is achievable with a focused cell selection strategy. Full coverage of high-prior cells is expected within budget; tail cells are addressed in Phase 3.
**Generation protocol:**
```
1. SELECT target cell:
- Empty cells first (exploration phase)
- Then lowest-quality cells (exploitation phase)
- Interleave: every 10 calls, select a cell adjacent to a
recently improved cell (local neighborhood search)
2. SPECIFY in the prompt:
"Generate an AVAP example that MUST use ALL of these constructs:
{cell_constructs}. Use additional constructs where natural."
3. VALIDATE:
a. Parser: syntactically valid? (Stage 1)
b. Construct presence: all cell constructs in AST? (cell gate)
c. If both pass → compute cell quality score
4. UPDATE cell:
If quality > current cell quality → replace cell entry
```
**Cell quality score:**
```
cell_quality(e, cell) =
construct_fidelity(e, cell) # fraction of cell constructs actually present
+ α · bonus_constructs(e, cell) # extra constructs beyond cell specification
+ β · test_quality(e) # quality of test assertions
+ γ · code_length_norm(e) # normalised code length (longer = richer)
```
`construct_fidelity` is the primary gate: an example that does not contain all cell constructs scores 0 regardless of other criteria.
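A minimal sketch of this score, with the gate applied first. The weights `ALPHA`, `BETA`, `GAMMA` and the length cap `max_len` are hypothetical values chosen for illustration (the hyperparameter grid below only fixes α candidates):

```python
# Sketch of the cell quality score. ALPHA matches grid config E1/E2/E3;
# BETA, GAMMA, and max_len are assumed values, not from the ADR.
ALPHA, BETA, GAMMA = 0.2, 0.3, 0.1

def cell_quality(ast_types, cell, test_quality, code_len, max_len=200):
    """Score an example against its target cell; gate on full fidelity."""
    fidelity = len(cell & ast_types) / len(cell)
    if fidelity < 1.0:
        return 0.0  # primary gate: every cell construct must be present
    bonus = len(ast_types - cell) / max(len(ast_types), 1)  # extra constructs
    length_norm = min(code_len / max_len, 1.0)              # longer = richer
    return fidelity + ALPHA * bonus + BETA * test_quality + GAMMA * length_norm
```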
**Why this eliminates distribution bias:**
The model is never asked what it "wants" to generate. It receives a precise specification: "you must use these three constructs." If it produces something that satisfies the specification, it enters the map. If not, it is discarded and the cell remains available for the next attempt. The coverage trajectory is determined by the cell selection strategy, not by the model's natural distribution.
The only residual bias is the model's ability to satisfy arbitrary construct specifications — some cells may be harder to fill than others. This is empirically measurable (fill rate per cell) and is itself a research finding about the generator's capabilities.
**Appropriate when:** The target language is novel or partially unknown to the generator. The external specification mechanism compensates for the model's lack of prior knowledge.
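The Candidate E loop — build the cell map from the grammar, select a cell, gate on construct presence, keep the best example per cell — can be sketched end to end. This uses a toy node-type subset and assumes the quality score is computed elsewhere; it is a structural sketch, not the `generate_mbap_v2.py` implementation.

```python
from itertools import combinations
import random

NODE_TYPES = ["startLoop", "try", "go", "RequestPost", "function"]  # toy subset

def build_map(node_types, sizes=(2, 3)):
    """One empty cell per construct pair/trio, keyed by frozenset."""
    return {frozenset(c): None  # cell -> (example, quality) or None
            for k in sizes for c in combinations(node_types, k)}

def select_cell(cells):
    """Empty-first (exploration), then lowest-quality (exploitation)."""
    empty = [c for c, v in cells.items() if v is None]
    if empty:
        return random.choice(empty)
    return min(cells, key=lambda c: cells[c][1])

def try_fill(cells, cell, example_ast_types, example, quality):
    """Cell gate: all cell constructs must appear in the example's AST."""
    if not cell <= example_ast_types:
        return False  # discarded; cell remains available for the next attempt
    current = cells[cell]
    if current is None or quality > current[1]:
        cells[cell] = (example, quality)
        return True
    return False
```

The interleaved neighborhood search from the protocol (every 10 calls) is omitted for brevity; it would add a third branch to `select_cell`.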
---
### Candidate F — MAP-Elites with ConstructPrior Transfer (Proposed Disruptive Extension)
**Algorithm class:** Quality-Diversity algorithm with informed cell selection — no parameter updates, coverage guaranteed by construction.
**Core insight:** Candidate E specifies *which* constructs must appear but treats all cells as equally valuable. Real production code does not use constructs uniformly: some combinations (e.g., `ormAccessSelect` + `try`) appear in virtually every real API endpoint; others (e.g., `encodeSHA256` + `startLoop`) appear rarely. A golden dataset that mirrors production code distributions will retrieve more relevant examples for real developer queries. The ConstructPrior module transfers this knowledge from large public codebases to weight MAP-Elites cell selection.
**ConstructPrior design:**
```
ConstructPrior = weighted combination of 4 domain sources:
Source 1 — The Stack (BigCode, 50% weight)
Filter: paths matching /api/, /routes/, /handlers/, /endpoints/
Languages: Python, Go, JavaScript/TypeScript, Java
Process: extract function-level code blocks → map language constructs
to AVAP semantic equivalents → compute co-occurrence frequency
per (construct_a, construct_b) and (construct_a, construct_b, construct_c)
Rationale: real microservice API code; largest and most representative source
Source 2 — CodeSearchNet (30% weight)
Filter: semantic search for "api endpoint", "http handler", "database query"
Languages: Python, Go, Java, JavaScript
Process: same mapping pipeline as Source 1
Rationale: function-docstring pairs provide semantic context for mapping quality
Source 3 — HumanEval-X Go (10% weight)
Filter: problems using goroutines, channels, wait groups
Process: map Go concurrency primitives → AVAP {go, gather, startLoop}
Rationale: AVAP's concurrency model mirrors Go's; coverage of concurrent patterns
Source 4 — Spider SQL Dataset (10% weight)
Filter: multi-table joins, aggregations, nested queries
Process: map SQL operations → AVAP {ormAccessSelect, ormAccessInsert, ormAccessUpdate}
Rationale: AVAP ORM constructs semantically equivalent to SQL clauses
```
**Construct mapping table (AVAP ← source constructs):**
| AVAP construct | Python equivalent | Go equivalent | SQL equivalent |
|---|---|---|---|
| `ormAccessSelect` | `cursor.fetchall()`, `session.query()` | `db.Query()`, `rows.Scan()` | `SELECT` |
| `ormAccessInsert` | `session.add()`, `cursor.execute(INSERT)` | `db.Exec(INSERT)` | `INSERT INTO` |
| `ormAccessUpdate` | `session.merge()`, `cursor.execute(UPDATE)` | `db.Exec(UPDATE)` | `UPDATE` |
| `RequestGet` | `requests.get()`, `httpx.get()` | `http.Get()`, `client.Get()` | — |
| `RequestPost` | `requests.post()`, `httpx.post()` | `http.Post()`, `client.Post()` | — |
| `startLoop` | `for item in list:` | `for _, v := range` | `CURSOR LOOP` |
| `go` + `gather` | `asyncio.gather()`, `ThreadPoolExecutor` | `go func()`, `sync.WaitGroup` | — |
| `try` + `exception` | `try: except:` | `if err != nil` | — |
| `encodeSHA256` | `hashlib.sha256()` | `sha256.New()` | — |
| `function` | `def func():` | `func name()` | `CREATE FUNCTION` |
**Cell weighting formula:**
```
cell_prior_weight(cell) =
Σ_{s ∈ Sources} weight_s · freq_s(cell_constructs)
where freq_s(cell) = co-occurrence frequency of the construct set in source s,
normalized to [0, 1] within each source.
Cells with prior_weight = 0 (no source coverage) receive a minimum weight ε = 0.05
to ensure all cells remain reachable.
```
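A direct transcription of the weighting formula, assuming the per-source co-occurrence frequencies have already been normalized to [0, 1]:

```python
# Source weights and epsilon floor from the ConstructPrior design above.
SOURCE_WEIGHTS = {"the_stack": 0.50, "codesearchnet": 0.30,
                  "humaneval_x_go": 0.10, "spider": 0.10}
EPSILON = 0.05  # minimum weight so every cell stays reachable

def cell_prior_weight(cell, freqs):
    """freqs: source -> {cell: normalized co-occurrence frequency in [0, 1]}."""
    w = sum(SOURCE_WEIGHTS[s] * freqs.get(s, {}).get(cell, 0.0)
            for s in SOURCE_WEIGHTS)
    return max(w, EPSILON)
```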
**Modified cell selection with ConstructPrior:**
```
PHASE 1 (exploration):
Select empty cells, weighted by cell_prior_weight.
High-prior cells filled first — these are patterns real developers use.
PHASE 2 (exploitation):
Select lowest-quality filled cells, UCB-weighted,
also weighted by cell_prior_weight.
  High-prior, low-quality cells prioritized for improvement.
PHASE 3 (tail coverage):
Cells with prior_weight = ε are visited last, after all
production-relevant cells reach quality > 0.7.
Ensures complete mathematical coverage without wasting
early generation budget on rare combinations.
```
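A simplified sketch of the three-phase selection. The UCB term from Phase 2 is omitted for brevity (low-quality cells are simply ranked by prior weight), so treat this as a structural outline rather than the production selection policy:

```python
import random

def select_cell_with_prior(cells, priors, q_threshold=0.7):
    """cells: cell -> (example, quality) or None; priors: cell -> weight."""
    # PHASE 1: empty cells, sampled proportionally to their prior weight.
    empty = [c for c, v in cells.items() if v is None]
    if empty:
        return random.choices(empty, weights=[priors[c] for c in empty], k=1)[0]
    # PHASE 2: filled cells below the quality threshold, high-prior first.
    # (The ADR specifies UCB weighting here; omitted in this sketch.)
    below = [c for c, v in cells.items() if v[1] <= q_threshold]
    if below:
        return max(below, key=lambda c: priors[c])
    # PHASE 3: tail coverage — lowest-prior (epsilon) cells last.
    return min(cells, key=lambda c: priors[c])
```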
**Why this is disruptive:**
1. **First formal connection between DSL dataset synthesis and production code distributions.** Prior dataset synthesis work (MBPP, HumanEval, APPS) uses human-authored problems or scrapes competitive programming sites. For novel DSLs with no prior human authors, this approach provides the first principled method to bootstrap coverage from semantically equivalent languages.
2. **Eliminates the uniform sampling assumption.** Standard Quality-Diversity algorithms treat all niches as equally valuable. The ConstructPrior breaks this assumption: cells that correspond to real production patterns are assigned higher value, producing a dataset whose distribution mirrors real developer usage rather than mathematical combinatorial completeness.
3. **Zero human annotation required.** The prior is derived automatically from public datasets under permissive licenses (The Stack: Apache 2.0; CodeSearchNet: MIT; HumanEval-X: MIT; Spider: CC BY-SA 4.0).
4. **Residual bias is semantic, not structural.** Candidate E's residual bias is the model's ability to satisfy arbitrary construct specifications (some cells may be hard to fill). Candidate F's residual bias is the construct mapping quality (how faithfully Python/Go/SQL constructs map to AVAP equivalents). The latter is measurable, improvable, and fully transparent.
**Expected improvement over Candidate E:**
- RAGAS Composite: +0.03 to +0.08 (hypothesis: production-weighted cells retrieve more relevant examples for real queries)
- Distribution entropy: similar or slightly lower than E (intentionally non-uniform — mirrors production distribution)
- Downstream task success: +5 to +15% on held-out real developer queries (hypothesis: high-prior cells produce examples that match actual query patterns)
**Appropriate when:** Target DSL has identifiable semantic equivalents in mainstream languages, and a production-weighted dataset is preferred over a mathematically uniform one.
---
### Out of Scope — Fine-tuning Approaches (GRPO, DPO)
Gradient-based approaches (GRPO, DPO) address a **different problem**: fine-tuning the inference model after the dataset is built. This ADR concerns dataset synthesis algorithm design. Fine-tuning the inference model is a separate architectural decision, tracked separately, and is not evaluated here.
Per-iteration fine-tuning of the generator (training the generator on its own outputs between batches) is explicitly rejected as a design choice. Iteratively training a model on its own outputs produces cumulative distribution narrowing. The generator (Claude API) and any future inference model must be trained on separate, independently validated datasets.
---
### Candidate D — UCB Bandit over Coverage Regions
**Algorithm class:** Multi-armed bandit.
Coverage regions are arms. UCB selects which region to target via exploration-exploitation tradeoff. Theoretically well-understood convergence guarantees but does not provide construct-level specification — it targets regions, not specific combinations. Less precise than Candidate E.
**Superseded by Candidate E** for the same computational cost with stronger guarantees.
---
## Comparative Summary
| Property | A: CW-Reward | E: MAP-Elites | F: MAP-Elites+Prior |
|---|---|---|---|
| Distribution bias risk | **High** | **None** | **None** |
| Coverage guarantee | Probabilistic | **By construction** | **By construction** |
| Production code alignment | None | None | **Yes (weighted)** |
| LLM parameter updates | No | No | No |
| GPU requirement | None | None | None |
| Works with API-only LLM | Yes | Yes | Yes |
| Interpretability | High | **Very high** | **Very high** |
| Implementation complexity | Low | Medium | **Medium-High** |
| Convergence guarantee | No | **Yes (fill rate)** | **Yes (fill rate)** |
| Residual bias | Model distribution | Cell fill difficulty | Mapping quality |
| External data required | No | No | Yes (public, free) |
| Novel contribution | Low | Medium | **High** |
---
## Evaluation Protocol
### Phase 1 — Candidate A vs Candidate E vs Candidate F
Run all three candidates for 500 generated examples each, same LRM, same parser, same Stage 1 filter. Fixed random seed for reproducibility.
**Primary metrics:**
| Metric | Definition | Expected winner |
|---|---|---|
| Cell fill rate | Fraction of 9,139 cells with ≥1 example (E/F only) | E≈F by construction |
| Coverage breadth | Distinct node types covered / total | E≈F |
| Distribution uniformity | Entropy of node type frequency distribution | E (flatter = better) |
| Production alignment | KL divergence between dataset and ConstructPrior distribution | **F** (by design) |
| Mean cell quality | Average quality score across filled cells | TBD empirically |
| Parser pass rate trend | Pass rate across iterations | A (if few-shots help) |
| Downstream RAGAS | RAGAS Composite on 50 held-out AVAP queries | **Primary decision signal** |
**Distribution uniformity** is the key metric for bias detection (A vs E). Plot node type frequency as a histogram. Candidate A will show a long-tail distribution. Candidate E should show a near-uniform distribution. Candidate F will show a production-weighted distribution (intentionally non-uniform — this is a feature, not a bug).
**Production alignment** is the key metric for F vs E. A dataset with low KL divergence from ConstructPrior produces examples that match real developer usage patterns. If RAGAS(F) > RAGAS(E), this validates the transfer prior hypothesis.
**Selection criterion:**
- A vs E: Candidate E wins if entropy > 3.0 bits AND RAGAS(E) ≥ RAGAS(A).
- E vs F: Candidate F wins if RAGAS(F) > RAGAS(E) by margin ≥ 0.02.
- If F wins both comparisons, F is the production algorithm.
- Fallback: if RAGAS margin F vs E < 0.02, use E (simpler, no external data dependency).
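The two distribution metrics behind these criteria — entropy in bits (A vs E) and KL divergence against the ConstructPrior (E vs F) — are straightforward to compute. A rough sketch, with additive smoothing (an assumption, not specified in the ADR) to keep the KL term finite when a cell is absent from one distribution:

```python
import math

def entropy_bits(node_type_counts):
    """Shannon entropy (bits) of the node-type frequency distribution."""
    total = sum(node_type_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in node_type_counts.values() if c > 0)

def kl_divergence(p_counts, q_weights, eps=1e-9):
    """KL(dataset || ConstructPrior), with smoothing on both sides."""
    keys = set(p_counts) | set(q_weights)
    pt, qt = sum(p_counts.values()), sum(q_weights.values())
    return sum((p_counts.get(k, 0) / pt + eps)
               * math.log2((p_counts.get(k, 0) / pt + eps)
                           / (q_weights.get(k, 0) / qt + eps))
               for k in keys)
```

For reference, a perfectly uniform distribution over 8 node types has entropy of exactly 3.0 bits, which is where the `entropy > 3.0 bits` threshold for Candidate E sits.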
---
## Weight and Hyperparameter Grids
### Candidate A weight grid
| Config | w_ecs | w_novelty | w_tests | Hypothesis |
|---|---|---|---|---|
| A1 | 0.50 | 0.35 | 0.15 | Balanced (baseline) |
| A2 | 0.70 | 0.20 | 0.10 | Coverage-heavy |
| A3 | 0.30 | 0.60 | 0.10 | Novelty-heavy |
| A4 | 0.85 | 0.00 | 0.15 | No novelty (ablation) |
A4 is the critical ablation: does novelty weighting reduce distribution bias, or is ECS alone sufficient?
### Candidate E hyperparameter grid
| Config | Cell size | Selection strategy | α (bonus constructs) |
|---|---|---|---|
| E1 | Pairs only | Empty-first | 0.2 |
| E2 | Pairs + Trios | Empty-first | 0.2 |
| E3 | Pairs + Trios | UCB-weighted | 0.2 |
| E4 | Pairs + Trios | Empty-first | 0.5 |
E2 is the baseline. E3 tests whether UCB cell selection improves quality over simple empty-first ordering. E4 tests whether a higher bonus for extra constructs produces richer examples.
### Candidate F hyperparameter grid
| Config | Prior sources | Phase 3 threshold | ε (tail minimum) | Mapping strictness |
|---|---|---|---|---|
| F1 | All 4 sources (50/30/10/10) | q > 0.7 | 0.05 | Lenient (keyword match) |
| F2 | All 4 sources (50/30/10/10) | q > 0.7 | 0.05 | Strict (AST-level match) |
| F3 | Stack only (100%) | q > 0.7 | 0.05 | Lenient |
| F4 | All 4 sources (50/30/10/10) | q > 0.5 | 0.10 | Lenient |
F1 is the baseline. F2 tests whether strict construct mapping (requiring AST-level evidence vs keyword presence) improves prior quality. F3 is the ablation: does the multi-source mixture add value over The Stack alone? F4 tests earlier phase transition and higher minimum tail weight.
---
## Open Questions for the Scientific Team
1. **Cell selection with difficulty weighting:** Some cells may be intrinsically hard to fill (e.g., combining `go` + `avapConnector` + `ormAccessSelect` in a single coherent example). Should the cell selection strategy account for historical fill difficulty, or treat all cells equally?
2. **Cross-cell quality:** An example generated for cell {A, B} may also be a high-quality example for cell {A, C} if it happens to use C as well. Should examples be indexed against all cells they satisfy, or only the cell they were generated for?
3. **Minimum example length per cell:** Short examples (3–5 lines) can technically satisfy a cell specification with minimal semantic content. Should a minimum code complexity threshold (e.g., minimum AST depth, minimum number of statements) be required for cell admission?
4. **Cell retirement:** Once a cell reaches quality score > 0.90, should it be retired from the selection pool to focus generation effort on harder cells?
5. **Generalisation to KCL:** The KCL grammar has different node types. Does the MAP-Elites cell space need to be redefined per language, or can a universal cell structure be derived from shared construct categories (type_definition, validation, control_flow, io)?
6. **ConstructPrior mapping quality:** The construct mapping (e.g., Python `session.query()` → AVAP `ormAccessSelect`) is heuristic. Should mapping quality be validated against a small manually annotated equivalence set before running the full generation pipeline? If the mapping is noisy, the prior weights may be misleading — a high-frequency Python pattern that maps incorrectly to a rare AVAP pattern would over-weight a non-representative cell.
7. **Prior refresh cadence:** The Stack and CodeSearchNet are static snapshots. If AVAP adoption grows and native AVAP code becomes available, should the ConstructPrior be retrained on AVAP-native data, effectively transitioning from transfer learning to self-supervised learning? Define the minimum corpus size threshold at which native data supersedes the cross-language prior.
---
## Consequences
- `generate_mbap_v2.py` is rewritten to implement Candidate F (MAP-Elites + ConstructPrior) as the primary algorithm. Candidate E (MAP-Elites without prior) is available via `--mode map-elites`. Candidate A (CW-Reward) is available via `--mode reward`. All three modes use identical parser, stage filters, and cell definitions to ensure fair comparison.
- A `ConstructPrior` module (`construct_prior.py`) handles multi-source data download, construct extraction, language-to-AVAP mapping, and co-occurrence matrix construction. This module is isolated from the core MAP-Elites loop and can be updated independently.
- The construct mapping table (language construct → AVAP equivalent) is maintained as a versioned configuration file (`construct_map.yaml`) and must not be modified after generation begins for a given dataset version.
- Results must be documented in `research/reward/` before this ADR is closed. Required artefacts: entropy histograms for A/E/F, KL divergence plots, RAGAS Composite comparison table, cell fill rate heatmaps.
- Any change to cell definitions, quality metrics, or the construct mapping table requires full dataset regeneration.
- Per-iteration fine-tuning of the generator is rejected and will not be re-evaluated without new evidence addressing the distribution narrowing risk.

View File

@ -0,0 +1,78 @@
# ADR-0006: Code Indexing Improvements — Comparative Evaluation of code chunking strategies
**Date:** 2026-03-24
**Status:** Proposed
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
---
## Context
Efficient code indexing is a critical component for enabling high-quality code search, retrieval-augmented generation (RAG), and semantic understanding in developer tooling. The main challenge lies in representing source code in a way that preserves its syntactic and semantic structure while remaining suitable for embedding-based retrieval systems.
In this context, we explored different strategies to improve the indexing of .avap code files, starting from a naïve approach and progressively moving toward more structured representations based on parsing techniques.
### Alternatives
- File-level chunking (baseline):
Each .avap file is treated as a single chunk and indexed directly. This approach is simple and fast but ignores internal structure (functions, classes, blocks).
- EBNF chunking as metadata:
  Each .avap file is still treated as a single chunk and indexed directly. However, using the AVAP EBNF grammar, we extract the AST structure and inject it into the chunk metadata.
- Full EBNF chunking:
  Each .avap file is still treated as a single chunk and indexed directly. The difference from the previous two approaches is that the AST is indexed instead of the code.
- Grammar definition chunking:
Code is segmented using a language-specific configuration (`avap_config.json`) instead of one-file chunks. The chunker applies a lexer (comments/strings), identifies multi-line blocks (`function`, `if`, `startLoop`, `try`), classifies single-line statements (`registerEndpoint`, `orm_command`, `http_command`, etc.), and enriches every chunk with semantic tags (`uses_orm`, `uses_http`, `uses_async`, `returns_result`, among others).
This strategy also extracts function signatures as dedicated lightweight chunks and propagates local context between nearby chunks (semantic overlap), improving retrieval precision for both API-level and implementation-level queries.
### Indexed docs
For each strategy, we created a separate Elasticsearch index with its own characteristics. The first three approaches have 33 chunks (one chunk per file), whereas the last approach has 89 chunks.
### How can we evaluate each strategy?
**Evaluation Protocol:**
1. **Golden Dataset**
- Generate a set of natural language queries paired with their ground-truth context (filename).
- Each query should be answerable by examining one or more code samples.
- Example: Query="How do you handle errors in AVAP?" → Context="try_catch_request.avap"
2. **Test Each Strategy**
- For each of the 4 chunking strategies, run the same set of queries against the respective Elasticsearch index.
- Record the top-10 retrieved chunks for each query.
3. **Metrics**
- `NDCG@10`: Normalized discounted cumulative gain at rank 10 (measures ranking quality).
- `Recall@10`: Fraction of relevant chunks retrieved in top 10.
- `MRR@10`: Mean reciprocal rank (position of first relevant result).
4. **Relevance Judgment**
- A chunk is considered relevant if it contains code directly answering the query.
- For file-level strategies: entire file is relevant or irrelevant.
- For grammar-definition: specific block/statement chunks are relevant even if the full file is not.
5. **Acceptance Criteria**
- **Grammar definition must achieve at least a 10% improvement in NDCG@10 over file-level baseline.**
- **Recall@10 must not drop by more than 5 absolute percentage points vs file-level.**
- **Index size increase must remain below 50% of baseline.**
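The three ranking metrics in the protocol can be computed with a few lines per query. A minimal sketch using binary relevance (a chunk is either relevant or not, matching the relevance judgment above); `retrieved` is the ranked list of chunk IDs from Elasticsearch and `relevant` the ground-truth set from the golden dataset:

```python
import math

def ndcg_at_k(retrieved, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of hits divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant chunks found in the top k."""
    return len(set(retrieved[:k]) & relevant) / len(relevant) if relevant else 0.0

def mrr_at_k(retrieved, relevant, k=10):
    """Reciprocal rank of the first relevant result, 0 if none in top k."""
    for i, doc in enumerate(retrieved[:k]):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0
```

Averaging each metric over all golden-dataset queries per index gives the per-strategy numbers to compare against the acceptance criteria.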
## Decision
## Rationale
## Consequences

View File

@ -45,16 +45,7 @@ Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentRe
**Use case:** Clients that do not support streaming or need a single atomic response.
**Request:**
```protobuf
message AgentRequest {
string query = 1; // The user's question. Required. Max recommended: 4096 chars.
string session_id = 2; // Conversation session identifier. Optional.
// If empty, defaults to "default" (shared session).
// Use a UUID per user/conversation for isolation.
}
```
**Request:** See [`AgentRequest`](#agentrequest) in §3.
**Response stream:**
@ -70,7 +61,7 @@ message AgentRequest {
**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.
**Use case:** Interactive clients (chat UIs, terminal tools) that need progressive rendering.
**Use case:** Interactive clients (chat UIs, VS Code extension) that need progressive rendering.
**Request:** Same `AgentRequest` as `AskAgent`.
@ -152,10 +143,40 @@ message QuestionDetail {
### `AgentRequest`
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | `string` | Yes | User's natural language question |
| `session_id` | `string` | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |
```protobuf
message AgentRequest {
string query = 1;
string session_id = 2;
string editor_content = 3;
string selected_text = 4;
string extra_context = 5;
string user_info = 6;
}
```
| Field | Type | Required | Encoding | Description |
|---|---|---|---|---|
| `query` | `string` | Yes | Plain text | User's natural language question. Max recommended: 4096 chars. |
| `session_id` | `string` | No | Plain text | Conversation identifier for multi-turn context. Use a stable UUID per user session. Defaults to `"default"` if empty. |
| `editor_content` | `string` | No | Base64 | Full content of the active file open in the editor at query time. Decoded server-side before entering the graph. |
| `selected_text` | `string` | No | Base64 | Text currently selected in the editor. Primary anchor for query reformulation and generation when the classifier detects an explicit editor reference. |
| `extra_context` | `string` | No | Base64 | Free-form additional context (e.g. file path, language identifier, open diagnostic errors). |
| `user_info` | `string` | No | JSON string | Client identity metadata. Expected format: `{"dev_id": <int>, "project_id": <int>, "org_id": <int>}`. Available in graph state for future routing or personalisation — not yet consumed by the graph. |
**Editor context behaviour:**
Fields 3–6 are all optional. If none are provided the assistant behaves exactly as without them — full backward compatibility. When `editor_content` or `selected_text` are provided, the graph classifier determines whether the user's question explicitly refers to that code. Only if the classifier returns `EDITOR` are the context fields injected into the generation prompt. This prevents the model from referencing editor code when the question is unrelated to it.
**Base64 encoding:**
`editor_content`, `selected_text` and `extra_context` must be Base64-encoded before sending. The server decodes them with UTF-8. Malformed Base64 is silently treated as empty string — no error is raised.
```python
import base64
encoded = base64.b64encode(content.encode("utf-8")).decode("utf-8")
```
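The server-side counterpart can be sketched as well. This is an illustrative sketch of the documented behaviour (malformed Base64 silently becomes an empty string), not the engine's actual decode routine; the function name is hypothetical:

```python
import base64
import binascii

def decode_field(value: str) -> str:
    """Decode a Base64 context field; malformed input becomes "" silently."""
    if not value:
        return ""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return ""  # per the contract above, no error is raised
```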
---
### `AgentResponse`
@ -165,6 +186,8 @@ message QuestionDetail {
| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |
---
### `EvalRequest`
| Field | Type | Required | Default | Description |
@ -173,6 +196,8 @@ message QuestionDetail {
| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
---
### `EvalResponse`
See full definition in [§2.3](#23-evaluaterag).
@ -192,7 +217,7 @@ The engine catches all exceptions and returns them as terminal `AgentResponse` m
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
```
**`EvaluateRAG` error response:**
Returned as a single `EvalResponse` with `status` set to the error description:
```json
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
@ -211,11 +236,11 @@ grpcurl -plaintext localhost:50052 list
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```
### `AskAgent` — full response
### `AskAgent` — basic query
```bash
grpcurl -plaintext \
-d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
-d '{"query": "Que significa AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
```
@ -223,12 +248,47 @@ grpcurl -plaintext \
Expected response:
```json
{
"text": "addVar is an AVAP command that declares a new variable...",
"text": "AVAP (Advanced Virtual API Programming) es un DSL Turing Completo...",
"avap_code": "AVAP-2026",
"is_final": true
}
```
### `AskAgent` — with editor context
```python
import base64, json, grpc
import brunix_pb2, brunix_pb2_grpc
def encode(text: str) -> str:
return base64.b64encode(text.encode("utf-8")).decode("utf-8")
channel = grpc.insecure_channel("localhost:50052")
stub = brunix_pb2_grpc.AssistanceEngineStub(channel)
editor_code = """
try()
ormDirect("UPDATE users SET active=1", res)
exception(e)
addVar(_status, 500)
addResult("Error")
end()
"""
request = brunix_pb2.AgentRequest(
query = "why is this not catching the error?",
session_id = "dev-001",
editor_content = encode(editor_code),
selected_text = encode(editor_code), # same block selected
extra_context = encode("file: handler.avap"),
user_info = json.dumps({"dev_id": 1, "project_id": 2, "org_id": 3}),
)
for response in stub.AskAgent(request):
if response.is_final:
print(response.text)
```
### `AskAgentStream` — token streaming
```bash
@ -250,7 +310,6 @@ Expected response (truncated):
### `EvaluateRAG` — run evaluation
```bash
# Evaluate first 10 questions from the "core_syntax" category
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10}' \
localhost:50052 \
@ -264,7 +323,7 @@ Expected response:
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"index": "avap-docs-test",
"index": "avap-knowledge-v1",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
@ -275,7 +334,7 @@ Expected response:
}
```
### Multi-turn conversation example
### Multi-turn conversation
```bash
# Turn 1
@ -283,7 +342,7 @@ grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — the engine has history from Turn 1
# Turn 2 — engine has history from Turn 1
grpcurl -plaintext \
-d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
@ -303,37 +362,90 @@ python -m grpc_tools.protoc \
## 6. OpenAI-Compatible Proxy
The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps `AskAgentStream` under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.
The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps the gRPC interface under an OpenAI-compatible API. This allows integration with any tool that supports the OpenAI Chat Completions API — `continue.dev`, LiteLLM, Open WebUI, or any custom client.
**Base URL:** `http://localhost:8000`
### Available endpoints
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI Chat Completions. Routes to `AskAgent` or `AskAgentStream`. |
| `POST` | `/v1/completions` | OpenAI Completions (legacy). |
| `GET` | `/v1/models` | Lists available models. Returns `brunix`. |
| `POST` | `/api/chat` | Ollama chat format (NDJSON streaming). |
| `POST` | `/api/generate` | Ollama generate format (NDJSON streaming). |
| `GET` | `/api/tags` | Ollama model list. |
| `GET` | `/health` | Health check. Returns `{"status": "ok"}`. |
### `POST /v1/chat/completions`
**Routing:** `stream: false` → `AskAgent` (single response). `stream: true` → `AskAgentStream` (SSE token stream).
**Request body:**
```json
{
"model": "brunix",
"messages": [
{"role": "user", "content": "Que significa AVAP?"}
],
"stream": false,
"session_id": "uuid-per-conversation",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```
**The `user` field (editor context transport):**
The standard OpenAI `user` field is used to transport editor context as a JSON string. This allows the VS Code extension to send context without requiring API changes. Non-Brunix clients can omit `user` or set it to a plain string — both are handled gracefully.
| Key in `user` JSON | Encoding | Description |
|---|---|---|
| `editor_content` | Base64 | Full content of the active editor file |
| `selected_text` | Base64 | Currently selected text in the editor |
| `extra_context` | Base64 | Free-form additional context |
| `user_info` | JSON object | `{"dev_id": int, "project_id": int, "org_id": int}` |
**Important:** `session_id` must be sent as a top-level field — never inside the `user` JSON. The proxy reads `session_id` exclusively from the dedicated field.
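A defensive parser for this contract might look like the following sketch (hypothetical helper, not the actual `openai_proxy.py` code; it assumes only the four keys documented above):

```python
import json

# Default editor context used when `user` is absent or not Brunix-shaped.
DEFAULT_CTX = {"editor_content": "", "selected_text": "", "extra_context": "", "user_info": {}}

def parse_user_field(user):
    """Parse the OpenAI `user` field into editor-context keys.

    Non-Brunix clients may omit the field or send a plain string;
    both degrade to an empty context instead of raising.
    """
    if not user:
        return dict(DEFAULT_CTX)
    try:
        data = json.loads(user)
    except (TypeError, json.JSONDecodeError):
        return dict(DEFAULT_CTX)  # plain string such as "user-123"
    if not isinstance(data, dict):
        return dict(DEFAULT_CTX)
    return {key: data.get(key, default) for key, default in DEFAULT_CTX.items()}
```

Missing keys fall back to empty values, so a partially filled `user` object is still accepted.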
**Example — general query (no editor context):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Que significa AVAP?"}],
"stream": false,
"session_id": "test-001"
}'
```
**Example — query with editor context (VS Code extension):**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este codigo?"}],
"stream": true,
"session_id": "test-001",
"user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}'
```
**Example — empty editor context fields:**
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "como funciona addVar?"}],
"stream": false,
"session_id": "test-002",
"user": "{\"editor_content\":\"\",\"selected_text\":\"\",\"extra_context\":\"\",\"user_info\":{}}"
}'
```
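For programmatic clients, the same request body can be assembled in code. A sketch (`build_payload` is a hypothetical helper; any HTTP client can POST the result to `/v1/chat/completions`):

```python
import base64
import json
import uuid

def b64(text: str) -> str:
    """UTF-8 text to Base64 string, as expected by the editor-context fields."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def build_payload(query: str, code: str = "", selection: str = "") -> dict:
    return {
        "model": "brunix",                     # ignored by the engine
        "messages": [{"role": "user", "content": query}],
        "stream": False,
        "session_id": str(uuid.uuid4()),       # top-level, never inside `user`
        "user": json.dumps({
            "editor_content": b64(code),
            "selected_text": b64(selection),
            "extra_context": "",
            "user_info": {"dev_id": 1, "project_id": 2, "org_id": 3},
        }),
    }
```

Reusing the same `session_id` across calls is what gives the engine conversation history.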
@ -1,8 +1,8 @@
# Brunix Assistance Engine — Architecture Reference
> **Audience:** Engineers contributing to this repository, architects reviewing the system design, and operators responsible for its deployment.
> **Last updated:** 2026-03-20
> **Version:** 1.6.x
---
@ -13,14 +13,15 @@
3. [Request Lifecycle](#3-request-lifecycle)
4. [LangGraph Workflow](#4-langgraph-workflow)
5. [RAG Pipeline — Hybrid Search](#5-rag-pipeline--hybrid-search)
6. [Editor Context Pipeline](#6-editor-context-pipeline)
7. [Streaming Architecture (AskAgentStream)](#7-streaming-architecture-askagentstream)
8. [Evaluation Pipeline (EvaluateRAG)](#8-evaluation-pipeline-evaluaterag)
9. [Data Ingestion Pipeline](#9-data-ingestion-pipeline)
10. [Infrastructure Layout](#10-infrastructure-layout)
11. [Session State & Conversation Memory](#11-session-state--conversation-memory)
12. [Observability Stack](#12-observability-stack)
13. [Security Boundaries](#13-security-boundaries)
14. [Known Limitations & Future Work](#14-known-limitations--future-work)
---
@ -33,6 +34,7 @@ The **Brunix Assistance Engine** is a stateful, streaming-capable AI service tha
- **Hybrid RAG** (BM25 + kNN with RRF fusion) over an Elasticsearch vector index
- **Ollama** as the local LLM and embedding backend
- **RAGAS + Claude** as the automated evaluation judge
- **Editor context injection** — the VS Code extension can send active file content and selected code alongside each query; the engine decides whether to use it based on the user's intent
A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI/Uvicorn, enabling integration with tools that expect the OpenAI API format.
@ -40,6 +42,7 @@ A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI
┌─────────────────────────────────────────────────────────────┐
│ External Clients │
│ grpcurl / App SDK │ OpenAI-compatible client │
│ VS Code extension │ (continue.dev, LiteLLM) │
└────────────┬────────────────┴──────────────┬────────────────┘
│ gRPC :50052 │ HTTP :8000
▼ ▼
@ -74,19 +77,19 @@ A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI
| Component | File / Service | Responsibility |
|---|---|---|
| **gRPC Server** | `Docker/src/server.py` | Entry point. Implements the `AssistanceEngine` servicer. Initializes LLM, embeddings, ES client, and both graphs. Decodes Base64 editor context fields from incoming requests. |
| **Full Graph** | `Docker/src/graph.py``build_graph()` | Complete workflow: classify → reformulate → retrieve → generate. Used by `AskAgent` and `EvaluateRAG`. |
| **Prepare Graph** | `Docker/src/graph.py``build_prepare_graph()` | Partial workflow: classify → reformulate → retrieve. Does **not** call the LLM for generation. Used by `AskAgentStream` to enable manual token streaming. |
| **Message Builder** | `Docker/src/graph.py``build_final_messages()` | Reconstructs the final prompt list from prepared state for `llm.stream()`. Injects editor context when `use_editor_context` is `True`. |
| **Prompt Library** | `Docker/src/prompts.py` | Centralized definitions for `CLASSIFY`, `REFORMULATE`, `GENERATE`, `CODE_GENERATION`, and `CONVERSATIONAL` prompts. |
| **Agent State** | `Docker/src/state.py` | `AgentState` TypedDict shared across all graph nodes. Includes editor context fields and `use_editor_context` flag. |
| **Evaluation Suite** | `Docker/src/evaluate.py` | RAGAS-based pipeline. Uses the production retriever + Ollama LLM for generation, and Claude as the impartial judge. |
| **OpenAI Proxy** | `Docker/src/openai_proxy.py` | FastAPI application that wraps `AskAgent` / `AskAgentStream` under OpenAI and Ollama compatible endpoints. Parses editor context from the `user` field. |
| **LLM Factory** | `Docker/src/utils/llm_factory.py` | Provider-agnostic factory for chat models (Ollama, AWS Bedrock). |
| **Embedding Factory** | `Docker/src/utils/emb_factory.py` | Provider-agnostic factory for embedding models (Ollama, HuggingFace). |
| **Ingestion Pipeline** | `scripts/pipelines/flows/elasticsearch_ingestion.py` | Chunks and ingests AVAP documents into Elasticsearch with embeddings. |
| **Dataset Generator** | `scripts/pipelines/flows/generate_mbap.py` | Generates synthetic MBPP-style AVAP problems using Claude. |
| **MBPP Translator** | `scripts/pipelines/flows/translate_mbpp.py` | Translates MBPP Python dataset into AVAP equivalents. |
| **AVAP Chunker** | `scripts/pipelines/ingestion/avap_chunker.py` | Semantic chunker for `.avap` source files using `avap_config.json` as grammar. |
| **Unit Tests** | `Docker/tests/test_prd_0002.py` | 40 unit tests covering editor context parsing, Base64 decoding, classifier output, reformulate anchor, and injection logic. |
---
@ -95,16 +98,21 @@ A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI
### 3.1 `AskAgent` (non-streaming)
```
Client → gRPC AgentRequest{query, session_id, editor_content*, selected_text*, extra_context*, user_info*}
│ (* Base64-encoded; user_info is JSON string)
├─ Decode Base64 fields (editor_content, selected_text, extra_context)
├─ Load conversation history from session_store[session_id]
├─ Build initial_state = {messages, session_id, editor_content, selected_text, extra_context, user_info}
└─ graph.invoke(initial_state)
├─ classify → query_type ∈ {RETRIEVAL, CODE_GENERATION, CONVERSATIONAL}
│ use_editor_context ∈ {True, False}
├─ reformulate → reformulated_query
│ (anchored to selected_text if use_editor_context=True)
├─ retrieve → context (top-8 hybrid RRF chunks)
└─ generate → final AIMessage
(editor context injected only if use_editor_context=True)
├─ Persist updated history to session_store[session_id]
└─ yield AgentResponse{text, avap_code="AVAP-2026", is_final=True}
@ -113,17 +121,18 @@ Client → gRPC AgentRequest{query, session_id}
### 3.2 `AskAgentStream` (token streaming)
```
Client → gRPC AgentRequest{query, session_id, editor_content*, selected_text*, extra_context*, user_info*}
├─ Decode Base64 fields
├─ Load history from session_store[session_id]
├─ Build initial_state
├─ prepare_graph.invoke(initial_state) ← Phase 1: no LLM generation
│ ├─ classify → query_type + use_editor_context
│ ├─ reformulate
│ └─ retrieve (or skip_retrieve if CONVERSATIONAL)
├─ build_final_messages(prepared_state) ← Reconstruct prompt with editor context if flagged
└─ for chunk in llm.stream(final_messages):
└─ yield AgentResponse{text=token, is_final=False}
@ -132,7 +141,20 @@ Client → gRPC AgentRequest{query, session_id}
└─ yield AgentResponse{text="", is_final=True}
```
### 3.3 HTTP Proxy → gRPC
```
Client → POST /v1/chat/completions {messages, stream, session_id, user}
├─ Extract query from last user message in messages[]
├─ Read session_id from dedicated field (NOT from user)
├─ Parse user field as JSON → {editor_content, selected_text, extra_context, user_info}
├─ stream=false → _invoke_blocking() → AskAgent gRPC call
└─ stream=true → _iter_stream() → AskAgentStream gRPC call → SSE token stream
```
### 3.4 `EvaluateRAG`
```
Client → gRPC EvalRequest{category?, limit?, index?}
@ -144,9 +166,8 @@ Client → gRPC EvalRequest{category?, limit?, index?}
│ ├─ retrieve_context (hybrid BM25+kNN, same as production)
│ └─ generate_answer (Ollama LLM + GENERATE_PROMPT)
├─ Build RAGAS Dataset
├─ Run RAGAS metrics with Claude as judge
└─ Compute global_score + verdict
└─ return EvalResponse{scores, global_score, verdict, details[]}
```
@ -155,11 +176,28 @@ Client → gRPC EvalRequest{category?, limit?, index?}
## 4. LangGraph Workflow
### 4.1 Agent State
```python
class AgentState(TypedDict):
messages: Annotated[list, add_messages] # conversation history
session_id: str
query_type: str # RETRIEVAL | CODE_GENERATION | CONVERSATIONAL
reformulated_query: str
context: str # formatted RAG context string
editor_content: str # decoded from Base64
selected_text: str # decoded from Base64
extra_context: str # decoded from Base64
user_info: str # JSON string: {"dev_id", "project_id", "org_id"}
use_editor_context: bool # set by classifier — True only if query explicitly refers to editor
```
### 4.2 Full Graph (`build_graph`)
```
┌─────────────┐
│ classify │ ← sees: query + history + selected_text (if present)
│ │ outputs: query_type + use_editor_context
└──────┬──────┘
┌────────────────┼──────────────────┐
@ -170,8 +208,12 @@ Client → gRPC EvalRequest{category?, limit?, index?}
▼ ▼
┌──────────────┐ ┌────────────────────────┐
│ reformulate │ │ respond_conversational │
│ │ └───────────┬────────────┘
│ if use_editor│ │
│ anchor query │ │
│ to selected │ │
└──────┬───────┘ │
▼ │
┌──────────────┐ │
│ retrieve │ │
└──────┬───────┘ │
@ -180,24 +222,54 @@ Client → gRPC EvalRequest{category?, limit?, index?}
▼ ▼ │
┌──────────┐ ┌───────────────┐ │
│ generate │ │ generate_code │ │
│ │ │ │ │
│ injects │ │ injects editor│ │
│ editor │ │ context only │ │
│ context │ │ if flag=True │ │
│ if flag │ └───────┬───────┘ │
└────┬─────┘ │ │
│ │ │
└────────────────────┴────────────────┘
END
```
### 4.3 Prepare Graph (`build_prepare_graph`)
Identical routing for classify, but generation nodes are replaced by `END`. The `CONVERSATIONAL` branch uses `skip_retrieve` (returns empty context). The `use_editor_context` flag is set here and carried forward into `build_final_messages`.
### 4.4 Classifier — Two-Token Output
The classifier outputs exactly two tokens separated by a space:
```
<query_type> <editor_signal>
Examples:
RETRIEVAL NO_EDITOR
CODE_GENERATION EDITOR
CONVERSATIONAL NO_EDITOR
```
`EDITOR` is set only when the user message explicitly refers to editor code using expressions like "this code", "este codigo", "fix this", "que hace esto", "explain this", etc. General AVAP questions, code generation requests, and conversational follow-ups always return `NO_EDITOR`.
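A tolerant parser for this two-token contract, sketched below (hypothetical helper; the production logic lives in `graph.py`), falls back to safe defaults when the LLM deviates from the format:

```python
VALID_TYPES = {"RETRIEVAL", "CODE_GENERATION", "CONVERSATIONAL"}

def parse_classifier_output(raw: str) -> tuple[str, bool]:
    """Parse '<query_type> <editor_signal>' into (query_type, use_editor_context)."""
    tokens = raw.strip().split()
    # Unknown or missing type degrades to RETRIEVAL, the safest default.
    query_type = tokens[0] if tokens and tokens[0] in VALID_TYPES else "RETRIEVAL"
    # Editor context is opt-in: anything other than a literal EDITOR means False.
    use_editor_context = len(tokens) > 1 and tokens[1] == "EDITOR"
    return query_type, use_editor_context
```

Defaulting the second token to `False` keeps the opt-in semantics: editor context is only ever injected on an explicit `EDITOR` signal.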
### 4.5 Query Type Routing
| `query_type` | Triggers retrieve? | Generation prompt | Editor context injected? |
|---|---|---|---|
| `RETRIEVAL` | Yes | `GENERATE_PROMPT` | Only if `use_editor_context=True` |
| `CODE_GENERATION` | Yes | `CODE_GENERATION_PROMPT` | Only if `use_editor_context=True` |
| `CONVERSATIONAL` | No | `CONVERSATIONAL_PROMPT` | Never |
### 4.6 Reformulator — Mode-Aware & Language-Preserving
The reformulator receives `[MODE: <query_type>]` prepended to the query:
- **MODE RETRIEVAL:** Compresses the query into compact keywords. Does NOT expand with AVAP commands. Preserves original language — Spanish queries stay in Spanish, English queries stay in English.
- **MODE CODE_GENERATION:** Applies the AVAP command expansion mapping (registerEndpoint, addParam, ormAccessSelect, etc.).
- **MODE CONVERSATIONAL:** Returns the query as-is.
Language preservation is critical for BM25 retrieval — the AVAP LRM is written in Spanish, so a Spanish query must reach the retriever in Spanish for lexical matching to work correctly.
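How the mode tag and the selected-text anchor could be combined before calling the reformulation LLM, as a sketch (hypothetical helper; names are illustrative):

```python
def build_reformulate_input(query_type: str, query: str,
                            selected_text: str = "",
                            use_editor_context: bool = False) -> str:
    """Prepend the [MODE: ...] tag; anchor to selected code when flagged."""
    if use_editor_context and selected_text:
        # The selected code becomes the primary retrieval signal.
        query = selected_text + "\n\nUser question: " + query
    return f"[MODE: {query_type}] {query}"
```

The tag lets a single reformulation prompt branch its behavior per mode without separate prompt templates.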
---
@ -206,12 +278,14 @@ Identical routing for classify, but generation nodes are replaced by `END`. The
The retrieval system (`hybrid_search_native`) fuses BM25 lexical search and kNN dense vector search using **Reciprocal Rank Fusion (RRF)**.
```
User query (reformulated, language-preserved)
├─ embeddings.embed_query(query) → query_vector [1024-dim]
├─ ES bool query:
│ ├─ must: multi_match (BM25) on [content^2, text^2]
│ └─ should: boost spec/narrative doc_types (2.0x / 1.5x)
│ └─ top-k BM25 hits
└─ ES knn on field [embedding], num_candidates = k×5
└─ top-k kNN hits
@ -221,7 +295,9 @@ User query
└─ Top-8 documents → format_context() → context string
```
**RRF constant:** `60` (standard value).
**doc_type boost:** `spec` and `narrative` chunks receive a score boost in the BM25 query to prioritize definitional and explanatory content over raw code examples when the query is about meaning or documentation.
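The fusion step itself reduces to a few lines. A sketch over two ranked ID lists (illustrative only; the production query runs natively inside Elasticsearch):

```python
def rrf_fuse(bm25_ids: list, knn_ids: list, k: int = 60, top_n: int = 8) -> list:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict = {}
    for ranking in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents appearing in both lists accumulate score from each ranking.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With `k = 60`, a document ranked first in one list scores 1/61, so consensus between both retrievers outweighs a single high rank.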
**Chunk metadata** attached to each retrieved document:
@ -229,36 +305,96 @@ User query
|---|---|
| `chunk_id` | Unique identifier within the index |
| `source_file` | Origin document filename |
| `doc_type` | `spec`, `code`, `code_example`, `bnf` |
| `block_type` | AVAP block type: `narrative`, `function`, `if`, `startLoop`, `try`, etc. |
| `section` | Document section/chapter heading |
Documents of type `code`, `code_example`, `bnf`, or block type `function / if / startLoop / try` are tagged as `[AVAP CODE]` in the formatted context, signaling the LLM to treat them as executable syntax rather than prose.
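A sketch of that tagging rule (hypothetical `format_chunk` helper, assuming the metadata keys listed above):

```python
CODE_DOC_TYPES = {"code", "code_example", "bnf"}
CODE_BLOCK_TYPES = {"function", "if", "startLoop", "try"}

def format_chunk(meta: dict, content: str) -> str:
    """Prefix executable chunks with [AVAP CODE] so the LLM treats them as syntax."""
    is_code = (meta.get("doc_type") in CODE_DOC_TYPES
               or meta.get("block_type") in CODE_BLOCK_TYPES)
    tag = "[AVAP CODE] " if is_code else ""
    header = f"{tag}({meta.get('source_file', '?')}, {meta.get('section', '?')})"
    return header + "\n" + content
```

Prose chunks pass through untagged, so the model sees a clean split between documentation and executable examples.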
---
## 6. Editor Context Pipeline
The editor context pipeline (PRD-0002) allows the VS Code extension to send the user's active editor state alongside every query. The engine uses this context only when the user explicitly refers to their code.
### Transport
Editor context travels differently depending on the client protocol:
**Via gRPC directly (`AgentRequest` fields 3-6):**
- `editor_content` (field 3) — Base64-encoded full file content
- `selected_text` (field 4) — Base64-encoded selected text
- `extra_context` (field 5) — Base64-encoded free-form context
- `user_info` (field 6) — JSON string `{"dev_id":…,"project_id":…,"org_id":…}`
**Via HTTP proxy (OpenAI `/v1/chat/completions`):**
- Transported in the standard `user` field as a JSON string
- Same four keys, same encodings
- The proxy parses, extracts, and forwards to gRPC
### Pipeline
```
AgentRequest arrives
├─ server.py: Base64 decode editor_content, selected_text, extra_context
├─ user_info passed as-is (JSON string)
└─ initial_state populated with all four fields
classify node:
├─ If selected_text present → injected into classify prompt as <editor_selection>
├─ LLM outputs: RETRIEVAL EDITOR or RETRIEVAL NO_EDITOR (etc.)
└─ use_editor_context = True if second token == EDITOR
reformulate node:
├─ If use_editor_context=True AND selected_text present:
│ anchor = selected_text + "\n\nUser question: " + query
│ → LLM reformulates using selected code as primary signal
└─ Else: reformulate query as normal
retrieve node: (unchanged — uses reformulated_query)
generate / generate_code node:
├─ If use_editor_context=True:
│ prompt = <selected_code> + <editor_file> + <extra_context> + RAG_prompt
│ Priority: selected_text > editor_content > RAG context > extra_context
└─ Else: standard RAG prompt — no editor content injected
```
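The decode step at the top of the pipeline has to degrade gracefully: per the `[base64]` log marker, an undecodable field is logged and treated as empty rather than aborting the request. A sketch (hypothetical helper, not the actual `server.py` code):

```python
import base64
import logging

logger = logging.getLogger(__name__)

def decode_b64_field(value: str, field: str) -> str:
    """Decode one Base64 editor-context field; empty or invalid input yields ''."""
    if not value:
        return ""
    try:
        return base64.b64decode(value).decode("utf-8")
    except Exception:
        # Matches the documented [base64] warning marker.
        logger.warning("[base64] failed to decode field %s", field)
        return ""
```

An empty string downstream simply means "no editor context for this field", so the rest of the pipeline needs no special casing.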
### Intent detection examples
| User message | `use_editor_context` | Reason |
|---|---|---|
| "Que significa AVAP?" | `False` | General definition question |
| "dame un API de hello world" | `False` | Code generation, no editor reference |
| "que hace este codigo?" | `True` | Explicit reference to "this code" |
| "fix this" | `True` | Explicit reference to current selection |
| "como mejoro esto?" | `True` | Explicit reference to current context |
| "how does addVar work?" | `False` | Documentation question, no editor reference |
---
## 7. Streaming Architecture (AskAgentStream)
The two-phase streaming design is critical to understand:
**Why not stream through LangGraph?**
LangGraph's `stream()` method yields full state snapshots per node, not individual tokens. To achieve true per-token streaming to the gRPC client, the generation step is deliberately extracted from the graph and called directly via `llm.stream()`.
**Phase 1 — Deterministic preparation (graph-managed):**
Classification, query reformulation, and retrieval run through `prepare_graph.invoke()`. This phase runs synchronously and produces the complete context before any token is emitted to the client. Editor context classification also happens here — `use_editor_context` is set in the prepared state.
**Phase 2 — Token streaming (manual):**
`build_final_messages()` reconstructs the exact prompt, injecting editor context if `use_editor_context` is `True`. `llm.stream(final_messages)` yields one `AIMessageChunk` per token from Ollama. Each token is immediately forwarded as `AgentResponse{text=token, is_final=False}`.
**Backpressure:** gRPC streaming is flow-controlled by the client. If the client stops reading, the Ollama token stream will block at the `yield` point.
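Putting the two phases together, the handler can be sketched as a generator (illustrative signatures; the real servicer method lives in `server.py` and yields protobuf messages):

```python
def ask_agent_stream(query, session_id, prepare_graph, llm,
                     build_final_messages, session_store):
    """Two-phase streaming: graph prepares state, llm.stream() emits tokens."""
    history = session_store[session_id]
    # Phase 1: deterministic preparation, no tokens emitted yet.
    state = prepare_graph.invoke({"messages": history + [("user", query)]})
    final_messages = build_final_messages(state)
    assembled = []
    # Phase 2: manual token streaming outside the graph.
    for chunk in llm.stream(final_messages):
        assembled.append(chunk.content)
        yield {"text": chunk.content, "is_final": False}
    # Persist the full assembled answer only after the stream completes.
    session_store[session_id] = history + [("user", query),
                                           ("assistant", "".join(assembled))]
    yield {"text": "", "is_final": True}
```

Because the generator only advances when the consumer reads, client backpressure propagates naturally to the Ollama stream.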
---
## 8. Evaluation Pipeline (EvaluateRAG)
The evaluation suite implements an **offline RAG evaluation** pattern using RAGAS metrics.
@ -288,7 +424,7 @@ verdict:
### Golden dataset
Located at `Docker/src/golden_dataset.json`. Each entry:
```json
{
@ -299,9 +435,11 @@ Located at `Docker/src/golden_dataset.json`. Each entry follows this schema:
}
```
> **Note:** The golden dataset does not include editor-context queries. EvaluateRAG measures the RAG pipeline in isolation. A separate editor-context golden dataset is planned as future work once the VS Code extension is validated.
---
## 9. Data Ingestion Pipeline
Documents flow into the Elasticsearch index through two paths:
@ -317,31 +455,29 @@ scripts/pipelines/flows/elasticsearch_ingestion.py
├─ Load markdown files
├─ Chunk using scripts/pipelines/tasks/chunk.py
│ (semantic chunking via Chonkie library)
├─ Generate embeddings via scripts/pipelines/tasks/embeddings.py
│ (Ollama or HuggingFace embedding model)
└─ Bulk index into Elasticsearch
index: avap-docs-* (configurable via ELASTICSEARCH_INDEX)
mapping: {content, embedding, source_file, doc_type, section, ...}
```
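In outline, Path A reduces to building bulk-index actions. A sketch with hypothetical `chunker` and `embedder` callables (the real flow lives in `elasticsearch_ingestion.py`):

```python
def build_bulk_actions(docs, chunker, embedder, index="avap-knowledge-v1"):
    """Chunk each document, embed each chunk, and emit Elasticsearch bulk actions."""
    actions = []
    for doc in docs:
        for chunk in chunker(doc["content"]):
            actions.append({
                "_index": index,
                "_source": {
                    "content": chunk,
                    "embedding": embedder(chunk),  # dense vector, 1024-dim in production
                    "source_file": doc["source_file"],
                },
            })
    return actions  # hand off to elasticsearch.helpers.bulk(es_client, actions)
```

Keeping action construction separate from the bulk call makes the chunking and embedding steps testable without a live cluster.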
### Path B — AVAP native code chunker
```
docs/samples/*.avap
scripts/pipelines/ingestion/avap_chunker.py
│ (grammar: scripts/pipelines/ingestion/avap_config.json v2.0)
├─ Lexer strips comments and string contents
├─ Block detection (function, if, startLoop, try)
├─ Statement classification (30 types + catch-all)
├─ Semantic tag assignment (18 boolean tags)
└─ Output: JSONL chunks → avap_ingestor.py → Elasticsearch
```
---
## 10. Infrastructure Layout
### Devaron Cluster (Vultr Kubernetes)
@ -352,22 +488,6 @@ scripts/pipelines/flows/generate_mbap.py
| Observability DB | `brunix-postgres` | `5432` | PostgreSQL for Langfuse |
| Langfuse UI | — | `80` | `http://45.77.119.180` |
### Port map summary
| Port | Protocol | Service | Scope |
@ -381,7 +501,7 @@ kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 \
---
## 11. Session State & Conversation Memory
Conversation history is managed via an in-process dictionary:
@ -395,69 +515,63 @@ session_store: dict[str, list] = defaultdict(list)
- **In-memory only.** History is lost on container restart.
- **No TTL or eviction.** Sessions grow unbounded for the lifetime of the process.
- **Thread safety:** Python's GIL provides basic safety for the `ThreadPoolExecutor(max_workers=10)` gRPC server, but concurrent writes to the same `session_id` from two simultaneous requests are not explicitly protected.
- **History window:** `format_history_for_classify()` uses only the last 6 messages for query classification.
> **Future work:** Replace `session_store` with a Redis-backed persistent store to survive restarts and support horizontal scaling.
---
## 12. Observability Stack
### Langfuse tracing
Every `AskAgent` / `AskAgentStream` request creates a trace capturing input query, session ID, each LangGraph node execution, LLM token counts, latency, and final response.
**Access:** `http://45.77.119.180`
### Logging
Structured logging via Python's `logging` module, configured at `INFO` level. Log format:
```
[MODULE] context_info — key=value key=value
```
Key log markers:
| Marker | Module | Meaning |
|---|---|---|
| `[ESEARCH]` | `server.py` | Elasticsearch connection status |
| `[classify]` | `graph.py` | Query type + `use_editor_context` flag + raw LLM output |
| `[reformulate]` | `graph.py` | Reformulated query string + whether selected_text was used as anchor |
| `[hybrid]` | `graph.py` | BM25 / kNN hit counts and RRF result count |
| `[retrieve]` | `graph.py` | Number of docs retrieved and context length |
| `[generate]` | `graph.py` | Response character count |
| `[AskAgent]` | `server.py` | editor and selected flags, query preview |
| `[AskAgentStream]` | `server.py` | Token count and total chars per stream |
| `[eval]` | `evaluate.py` | Per-question retrieval and generation status |
| `[base64]` | `server.py` | Warning when a Base64 field fails to decode |
---
## 13. Security Boundaries
| Boundary | Current state | Risk |
|---|---|---|
| gRPC transport | **Insecure** (`add_insecure_port`) | Network interception possible. Acceptable in dev/tunnel setup; requires mTLS for production. |
| Elasticsearch auth | Optional (user/pass or API key via env vars) | Index is accessible without auth if vars are unset. |
| Editor context | Transmitted in plaintext (Base64 is encoding, not encryption) | File contents visible to anyone intercepting gRPC traffic. Requires TLS for production. |
| Container user | Non-root (`python:3.11-slim` default) | Low risk. Do not override with `root`. |
| Secrets in env | Via `.env` / `docker-compose` env injection | Never commit real values. |
| Session store | In-memory, no auth | Any caller with gRPC access can read/write any session by guessing its ID. |
| `user_info` | JSON string, no validation | `dev_id`, `project_id`, `org_id` are not authenticated — passed as metadata only. |
---
## 14. Known Limitations & Future Work
| Area | Limitation | Proposed solution |
|---|---|---|
| Session persistence | In-memory, lost on restart | Redis-backed `session_store` |
| Horizontal scaling | `session_store` is per-process | Sticky sessions or external session store |
| gRPC security | Insecure port | Add TLS + optional mTLS |
| Editor context security | Base64 is not encryption | TLS required before sending real file contents |
| `user_info` auth | Not validated or authenticated | JWT or API key validation on `user_info` fields |
| Elasticsearch auth | Not enforced if vars unset | Make auth required; fail-fast on startup |
| Context window | Full history passed to generate; no truncation | Sliding window or summarization for long sessions |
| Evaluation | Golden dataset has no editor-context queries | Build dedicated editor-context golden dataset after VS Code validation |
| Rate limiting | None on gRPC server | Add interceptor-based rate limiter |
| Health check | No gRPC health protocol | Implement `grpc.health.v1` |
docs/BNF/avap.lark (new file, 228 lines)
@ -0,0 +1,228 @@
start: program
program: separator* line_or_comment (separator+ line_or_comment)* separator*
?line_or_comment: simple_stmt comment?
| compound_stmt
| comment
| BLOCK_COMMENT
?separator: EOL+
comment: DOC_COMMENT | LINE_COMMENT
EOL: /\r?\n/
DOC_COMMENT.2: /\/\/\/[^\r\n]*/
LINE_COMMENT.1: /\/\/[^\r\n]*/
BLOCK_COMMENT: /\/\*[\s\S]*?\*\//
?simple_stmt: assignment
| return_stmt
| system_command
| io_command
| async_command
| connector_cmd
| db_command
| http_command
| util_command
| modularity_cmd
| call_stmt
?compound_stmt: function_decl
| if_stmt
| loop_stmt
| try_stmt
assignment: identifier "=" expression
call_stmt: identifier "(" argument_list? ")"
| identifier "=" identifier "." identifier "(" argument_list? ")"
| identifier "." identifier "(" argument_list? ")"
system_command: register_cmd
| addvar_cmd
register_cmd: "registerEndpoint" "(" stringliteral "," stringliteral "," list_display "," stringliteral "," identifier "," identifier ")"
addvar_cmd: "addVar" "(" addvar_arg "," addvar_arg ")"
addvar_arg: identifier
| literal
| "$" identifier
identifier: IDENTIFIER
system_variable: "_status"
io_command: addparam_cmd
| getlistlen_cmd
| addresult_cmd
| getparamlist_cmd
addparam_cmd: "addParam" "(" stringliteral "," identifier ")"
getlistlen_cmd: "getListLen" "(" identifier "," identifier ")"
getparamlist_cmd: "getQueryParamList" "(" stringliteral "," identifier ")"
addresult_cmd: "addResult" "(" identifier ")"
if_stmt: "if" "(" if_condition ")" separator block ("else" "(" ")" separator block)? "end" "(" ")"
if_condition: if_atom "," if_atom "," stringliteral
| "None" "," "None" "," stringliteral
if_atom: identifier
| literal
loop_stmt: "startLoop" "(" identifier "," expression "," expression ")" separator block "endLoop" "(" ")"
try_stmt: "try" "(" ")" separator block "exception" "(" identifier ")" separator block "end" "(" ")"
block: separator* line_or_comment (separator+ line_or_comment)* separator*
async_command: go_stmt
| gather_stmt
go_stmt: identifier "=" "go" identifier "(" argument_list? ")"
gather_stmt: identifier "=" "gather" "(" identifier ("," expression)? ")"
connector_cmd: connector_instantiation
connector_instantiation: identifier "=" "avapConnector" "(" stringliteral ")"
http_command: req_post_cmd
| req_get_cmd
req_post_cmd: "RequestPost" "(" expression "," expression "," expression "," expression "," identifier "," expression ")"
req_get_cmd: "RequestGet" "(" expression "," expression "," expression "," identifier "," expression ")"
db_command: orm_direct
| orm_check
| orm_create
| orm_select
| orm_insert
| orm_update
orm_direct: "ormDirect" "(" expression "," identifier ")"
orm_check: "ormCheckTable" "(" expression "," identifier ")"
orm_create: "ormCreateTable" "(" expression "," expression "," expression "," identifier ")"
orm_select: "ormAccessSelect" "(" orm_fields "," expression ("," expression)? "," identifier ")"
orm_fields: "*"
| expression
orm_insert: "ormAccessInsert" "(" expression "," expression "," identifier ")"
orm_update: "ormAccessUpdate" "(" expression "," expression "," expression "," expression "," identifier ")"
util_command: json_list_cmd
| crypto_cmd
| regex_cmd
| datetime_cmd
| stamp_cmd
| string_cmd
| replace_cmd
json_list_cmd: "variableToList" "(" expression "," identifier ")"
| "itemFromList" "(" identifier "," expression "," identifier ")"
| "variableFromJSON" "(" identifier "," expression "," identifier ")"
| "AddVariableToJSON" "(" expression "," expression "," identifier ")"
crypto_cmd: "encodeSHA256" "(" identifier_or_string "," identifier ")"
| "encodeMD5" "(" identifier_or_string "," identifier ")"
regex_cmd: "getRegex" "(" identifier "," stringliteral "," identifier ")"
datetime_cmd: "getDateTime" "(" stringliteral "," expression "," stringliteral "," identifier ")"
stamp_cmd: "stampToDatetime" "(" expression "," stringliteral "," expression "," identifier ")"
| "getTimeStamp" "(" stringliteral "," stringliteral "," expression "," identifier ")"
string_cmd: "randomString" "(" expression "," expression "," identifier ")"
replace_cmd: "replace" "(" identifier_or_string "," stringliteral "," stringliteral "," identifier ")"
function_decl: "function" identifier "(" param_list? ")" "{" separator block "}"
param_list: identifier ("," identifier)*
return_stmt: "return" "(" expression? ")"
modularity_cmd: include_stmt
| import_stmt
include_stmt: "include" stringliteral
import_stmt: "import" ("<" identifier ">" | stringliteral)
?expression: logical_or
?logical_or: logical_and ("or" logical_and)*
?logical_and: logical_not ("and" logical_not)*
?logical_not: "not" logical_not
| comparison
?comparison: arithmetic (comp_op arithmetic)*
comp_op: "==" | "!=" | "<" | ">" | "<=" | ">=" | "in" | "is"
?arithmetic: term (("+" | "-") term)*
?term: factor (("*" | "/" | "%") factor)*
?factor: ("+" | "-") factor
| power
?power: primary ("**" factor)?
?primary: atom postfix*
postfix: "." identifier
| "[" expression "]"
| "[" expression? ":" expression? (":" expression?)? "]"
| "(" argument_list? ")"
?atom: identifier
| "$" identifier
| literal
| "(" expression ")"
| list_display
| dict_display
list_display: "[" argument_list? "]"
| "[" expression "for" identifier "in" expression if_clause? "]"
if_clause: "if" expression
dict_display: "{" key_datum_list? "}"
key_datum_list: key_datum ("," key_datum)*
key_datum: expression ":" expression
argument_list: expression ("," expression)*
number: FLOATNUMBER
| INTEGER
literal: stringliteral
| number
| boolean
| "None"
boolean: "True" | "False"
INTEGER: /[0-9]+/
FLOATNUMBER: /(?:[0-9]+\.[0-9]*|\.[0-9]+)/
stringliteral: STRING_DOUBLE
| STRING_SINGLE
# STRING_DOUBLE: /"([^"\\]|\\["'\\ntr0])*"/
# STRING_SINGLE: /'([^'\\]|\\["'\\ntr0])*'/
STRING_DOUBLE: /"([^"\\]|\\.)*"/
STRING_SINGLE: /'([^'\\]|\\.)*'/
identifier_or_string: identifier
| stringliteral
IDENTIFIER: /[A-Za-z_][A-Za-z0-9_]*/
%ignore /[ \t]+/


@ -0,0 +1,3 @@
nivel = 5
es_admin = nivel >= 10
addResult(es_admin)


@ -0,0 +1,4 @@
subtotal = 150.50
iva = subtotal * 0.21
total = subtotal + iva
addResult(total)

docs/LRM/bucle_1_10.avap Normal file

@ -0,0 +1,6 @@
startLoop(i,1,10)
item = "item_%s" % i
AddVariableToJSON(item,'valor_generado',mi_json)
endLoop()
addResult(mi_json)


@ -0,0 +1,7 @@
registros = ['1','2','3']
getListLen(registros, total)
contador = 0
startLoop(idx, 0, 2)
actual = registros[int(idx)]
endLoop()
addResult(actual)


@ -0,0 +1,2 @@
getDateTime("", 86400, "UTC", expira)
addResult(expira)


@ -0,0 +1,2 @@
addParam("client_id", id_interno)
addResult(id_interno)


@ -0,0 +1,3 @@
addParam("emails", emails)
getQueryParamList("lista_correos", lista_correos)
addResult(lista_correos)


@ -0,0 +1,5 @@
addParam("lang", l)
if(l, "es", "=")
addVar(msg, "Hola")
end()
addResult(msg)


@ -0,0 +1,3 @@
nombre = "Sistema"
log = "Evento registrado por: %s" % nombre
addResult(log)


@ -0,0 +1,4 @@
datos_cliente = "datos"
addVar(clave, "cliente_vip")
AddVariableToJSON(clave, datos_cliente, mi_json_final)
addResult(mi_json_final)


@ -0,0 +1,3 @@
addParam("data_list", mi_lista)
getListLen(mi_lista, cantidad)
addResult(cantidad)


@ -0,0 +1,2 @@
stampToDatetime(1708726162, "%d/%m/%Y", 0, fecha_human)
addResult(fecha_human)


@ -0,0 +1,7 @@
addParam("sal_par",saldo)
if(saldo, 0, ">")
permitir = True
else()
permitir = False
end()
addResult(permitir)


@ -0,0 +1,6 @@
addParam("userrype", user_type)
addParam("sells", compras)
if(None, None, " user_type == 'VIP' or compras > 100")
addVar(descuento, 0.20)
end()
addResult(descuento)


@ -0,0 +1,2 @@
getDateTime("%Y-%m-%d %H:%M:%S", 0, "Europe/Madrid", sql_date)
addResult(sql_date)


@ -0,0 +1,6 @@
function suma(a, b){
total = a + b
return(total)
}
resultado = suma(10, 20)
addResult(resultado)


@ -0,0 +1,9 @@
function es_valido(token){
response = False
if(token, "SECRET", "=")
response = True
end()
return(response)
}
autorizado = es_valido("SECRET")
addResult(autorizado)


@ -0,0 +1,2 @@
randomString("[A-Z]\d", 32, token_seguridad)
addResult(token_seguridad)


@ -0,0 +1,2 @@
encodeSHA256("payload_data", checksum)
addResult(checksum)


@ -0,0 +1,4 @@
registerEndpoint("/hello_world","GET",[],"HELLO_WORLD",main,result)
addVar(name,"Alberto")
result = "Hello," + name
addResult(result)

docs/LRM/hola_mundo.avap Normal file

@ -0,0 +1,2 @@
addVar(mensaje, "Hola mundo desde AVAP")
addResult(mensaje)


@ -0,0 +1,6 @@
addParam("password",pass_nueva)
pass_antigua = "password"
if(pass_nueva, pass_antigua, "!=")
addVar(cambio, "Contraseña actualizada")
end()
addResult(cambio)


@ -0,0 +1,2 @@
replace("REF_1234_OLD","OLD", "NEW", ref_actualizada)
addResult(ref_actualizada)


@ -0,0 +1,7 @@
try()
ormDirect("UPDATE table_inexistente SET a=1", res)
exception(e)
addVar(_status, 500)
addVar(error_msg, "Error de base de datos")
addResult(error_msg)
end()


@ -0,0 +1,2 @@
getDateTime("", 0, "UTC", ahora)
addResult(ahora)


@ -0,0 +1,6 @@
ormCheckTable(tabla_pruebas,resultado_comprobacion)
if(resultado_comprobacion,False,'==')
ormCreateTable("username,age",'VARCHAR,INTEGER',tabla_pruebas,resultado_creacion)
end()
addResult(resultado_comprobacion)
addResult(resultado_creacion)


@ -0,0 +1,14 @@
addParam("page", p)
addParam("size", s)
registros = ["u1", "u2", "u3", "u4", "u5", "u6"]
offset = int(p) * int(s)
limite = offset + int(s)
contador = 0
addResult(offset)
addResult(limite)
startLoop(i, 2, limite)
actual = registros[int(i)]
titulo = "reg_%s" % i
AddVariableToJSON(titulo, actual, pagina_json)
endLoop()
addResult(pagina_json)


@ -0,0 +1,3 @@
addVar(base, 1000)
addVar(copia, $base)
addResult(copia)

docs/LRM/replace.avap Normal file

@ -0,0 +1,9 @@
addParam("password_base", password_base)
replace(password_base, "a", "@", temp1)
replace(temp1, "e", "3", temp2)
replace(temp2, "o", "0", temp3)
replace(temp3, "i", "!", modified_password)
randomString("[a-zA-Z0-9]", 4, suffix)
addVar(final_password, modified_password)
final_password = final_password + suffix
addResult(final_password)


@ -0,0 +1,4 @@
addVar(code, 200)
addVar(status, "Success")
addResult(code)
addResult(status)


@ -0,0 +1,8 @@
encontrado = False
startLoop(i, 1, 10)
if(i, 5, "==")
encontrado = True
i = 11
end()
endLoop()
addResult(encontrado)

docs/LRM/token.avap Normal file

@ -0,0 +1,5 @@
addParam("password", password)
encodeSHA256(password, hashed_password)
randomString("[a-zA-Z0-9]", 32, secure_token)
addResult(hashed_password)
addResult(secure_token)


@ -0,0 +1,6 @@
try()
RequestGet("https://api.test.com/data", 0, 0, respuesta, None)
exception(e)
addVar(error_trace, e)
addResult(error_trace)
end()


@ -0,0 +1,6 @@
addParam("api_key", key)
if(key, None, "==")
addVar(_status, 403)
addVar(error, "Acceso denegado: falta API KEY")
addResult(error)
end()


@ -0,0 +1,2 @@
stub(addResult(error), 5) => {}
assert(addResult(error), 5): {}


@ -0,0 +1,8 @@
addParam("rol", r)
acceso = False
if(None, None, "r == 'admin' or r == 'editor' or r == 'root'")
acceso = True
end()
addResult(acceso)


@ -0,0 +1,89 @@
# PRD-0001: OpenAI-Compatible HTTP Proxy
**Date:** 2026-03-18
**Status:** Implemented
**Requested by:** Rafael Ruiz (CTO)
**Implemented in:** PR #58
**Related ADR:** ADR-0001 (gRPC as primary interface)
---
## Problem
The Brunix Assistance Engine exposes a gRPC interface as its primary API. gRPC is the right choice for performance and type safety in server-to-server communication, but it creates a significant adoption barrier for two categories of consumers:
**Existing OpenAI integrations.** Any tool or client already configured to call the OpenAI API — VS Code extensions using `continue.dev`, LiteLLM routers, Open WebUI instances, internal tooling at 101OBEX, Corp — requires code changes to switch to gRPC. The switching cost is non-trivial and creates friction that slows adoption.
**Model replacement use case.** The core strategic value of the Brunix RAG is that it can replace direct OpenAI API consumption with a locally-hosted, domain-specific assistant that has no per-token cost and no data privacy concerns. This value proposition is only actionable if the replacement is transparent — i.e., the client does not need to change to consume the Brunix RAG instead of OpenAI.
Without a compatibility layer, the Brunix engine cannot serve as a drop-in replacement for OpenAI models. Every potential adopter faces an integration project instead of a configuration change.
---
## Solution
Implement an HTTP server running alongside the gRPC server that exposes:
- The OpenAI Chat Completions API (`/v1/chat/completions`) — both streaming and non-streaming
- The OpenAI Completions API (`/v1/completions`) — legacy support
- The OpenAI Models API (`/v1/models`) — for compatibility with clients that enumerate available models
- The Ollama Chat API (`/api/chat`) — NDJSON streaming format
- The Ollama Generate API (`/api/generate`) — for Ollama-native clients
- The Ollama Tags API (`/api/tags`) — for clients that list available models
- A health endpoint (`/health`)
The proxy bridges HTTP → gRPC internally: `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`. The gRPC interface remains the primary interface and is not modified.
Any client that currently points to `https://api.openai.com` can be reconfigured to point to `http://localhost:8000` (or the server's address) with `model: brunix` and will work without any other change.
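For illustration, the switch amounts to changing the target URL and model name; a hedged sketch of the request body an OpenAI-style client would POST to the proxy (the query text is made up, the field names follow the Chat Completions format documented above):

```python
import json

# Body an OpenAI-compatible client POSTs to
# http://localhost:8000/v1/chat/completions. Only `model` and the
# base URL change compared with calling api.openai.com directly.
payload = {
    "model": "brunix",  # PROXY_MODEL_ID default
    "messages": [{"role": "user", "content": "How does addVar work?"}],
    "stream": False,    # routes to AskAgent; True routes to AskAgentStream
}
body = json.dumps(payload)
```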
---
## Scope
**In scope:**
- OpenAI-compatible endpoints as listed above
- Ollama-compatible endpoints as listed above
- Routing `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`
- Session ID propagation via the `session_id` extension field in `ChatCompletionRequest`
- Health endpoint
**Out of scope:**
- OpenAI function calling / tool use
- OpenAI embeddings API (`/v1/embeddings`)
- OpenAI fine-tuning or moderation APIs
- Authentication / API key validation (handled at infrastructure level)
- Multi-turn conversation reconstruction from the message array (the proxy extracts only the last user message as the query)
---
## Technical implementation
**Stack:** FastAPI + uvicorn, running on port 8000 inside the same container as the gRPC server.
**Concurrency:** An asyncio event loop bridges FastAPI's async context with the synchronous gRPC calls via a dedicated `ThreadPoolExecutor` (configurable via `PROXY_THREAD_WORKERS`, default 20). This prevents gRPC blocking calls from stalling the async HTTP server.
**Streaming:** An `asyncio.Queue` connects the gRPC token stream (produced in a thread) with the FastAPI `StreamingResponse` (consumed in the async event loop). Tokens are forwarded as SSE events (OpenAI format) or NDJSON (Ollama format) as they arrive from `AskAgentStream`.
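The queue bridge can be sketched in isolation; a toy version with a plain thread standing in for the blocking gRPC stream (all names here are hypothetical, not the proxy's actual code):

```python
import asyncio
import threading

# Toy version of the streaming bridge: a worker thread (standing in
# for the blocking AskAgentStream iterator) pushes tokens into an
# asyncio.Queue via call_soon_threadsafe; the async side drains the
# queue until a None sentinel marks end-of-stream.

async def stream_tokens(tokens):
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def producer():
        for tok in tokens:  # blocking gRPC read would happen here
            loop.call_soon_threadsafe(queue.put_nowait, tok)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # end of stream

    threading.Thread(target=producer, daemon=True).start()
    out = []
    while (tok := await queue.get()) is not None:
        out.append(tok)  # in the proxy, each token becomes an SSE/NDJSON event
    return out

result = asyncio.run(stream_tokens(["Hello", " ", "world"]))
```

`call_soon_threadsafe` is what makes this safe: `asyncio.Queue` itself is not thread-safe, so the producer hands each token to the event loop rather than touching the queue directly.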
**Entry point:** `entrypoint.sh` starts both the gRPC server and the HTTP proxy as parallel processes. If either crashes, the other is terminated — the container fails cleanly rather than entering a partially active state.
**Environment variables:**
| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC server address |
| `PROXY_MODEL_ID` | `brunix` | Model name returned by `/v1/models` and `/api/tags` |
| `PROXY_THREAD_WORKERS` | `20` | ThreadPoolExecutor size for gRPC calls |
---
## Validation
**Functional:** Any OpenAI-compatible client (continue.dev, LiteLLM, Open WebUI) can be pointed at `http://localhost:8000` with `model: brunix` and successfully send queries to the Brunix RAG without code changes.
**Strategic:** The VS Code extension and any 101OBEX, Corp internal tooling currently consuming OpenAI can switch to the Brunix RAG by changing one endpoint URL and one model name. No other changes required.
---
## Impact on existing interfaces
The gRPC interface (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) is unchanged. Existing gRPC clients are not affected. The proxy is additive — it does not replace the gRPC interface, it complements it.


@ -0,0 +1,199 @@
# PRD-0002: Editor Context Injection for VS Code Extension
**Date:** 2026-03-19
**Status:** Implemented
**Requested by:** Rafael Ruiz (CTO)
**Purpose:** Validate the VS Code extension with real users
**Related ADR:** ADR-0001 (gRPC interface), ADR-0002 (two-phase streaming)
---
## Problem
The Brunix Assistance Engine previously received only two inputs from the client: a `query` (the user's question) and a `session_id` (for conversation continuity). It had no awareness of what the user was looking at in their editor when they asked the question.
This created a fundamental limitation for a coding assistant: the user asking "how do I handle the error here?" or "what does this function return?" could not be answered correctly without knowing what "here" and "this function" referred to. The assistant was forced to treat every question as a general AVAP documentation query, even when the user's intent was clearly anchored to specific code in their editor.
For the VS Code extension validation, the CEO needed to demonstrate that the assistant behaves as a genuine coding assistant — one that understands the user's current context — not just a documentation search tool.
---
## Solution
The gRPC contract has been extended to allow the VS Code extension to send four optional context fields alongside every query. These fields are transported in the standard OpenAI `user` field as a JSON string when using the HTTP proxy, and as dedicated proto fields when calling gRPC directly.
**Transport format via HTTP proxy (`/v1/chat/completions`):**
```json
{
"model": "brunix",
"messages": [{"role": "user", "content": "que hace este código?"}],
"stream": true,
"session_id": "uuid",
"user": "{\"editor_content\":\"<base64>\",\"selected_text\":\"<base64>\",\"extra_context\":\"<base64>\",\"user_info\":{\"dev_id\":1,\"project_id\":2,\"org_id\":3}}"
}
```
**Fields:**
- **`editor_content`** (base64) — full content of the active file open in the editor. Gives the assistant awareness of the complete code the user is working on.
- **`selected_text`** (base64) — text currently selected in the editor, if any. The most precise signal of user intent — if the user has selected a block of code before asking a question, that block is almost certainly what the question is about.
- **`extra_context`** (base64) — free-form additional context (e.g., file path, language identifier, cursor position, open diagnostic errors). Extensible without requiring proto changes.
- **`user_info`** (JSON object) — client identity metadata: `dev_id`, `project_id`, `org_id`. Not base64 — sent as a JSON object nested within the `user` JSON string.
All four fields are optional. If none are provided, the assistant behaves exactly as it does today — full backward compatibility.
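The encoding the extension performs can be sketched with the standard library; the field names follow the transport format above, while the content values are made up:

```python
import base64
import json

def _b64(text: str) -> str:
    # base64-encode UTF-8 text as the transport format requires
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

# Assemble the `user` field as described above: three base64-encoded
# context fields plus a plain-JSON user_info object, all serialised
# into one JSON string.
user_field = json.dumps({
    "editor_content": _b64("addVar(x, 1)\naddResult(x)"),
    "selected_text": _b64("addVar(x, 1)"),
    "extra_context": _b64("file: demo.avap"),
    "user_info": {"dev_id": 1, "project_id": 2, "org_id": 3},
})
```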
---
## User experience
**Scenario 1 — Question about selected code:**
The user selects a `try() / exception() / end()` block in their editor and asks "why is this not catching my error?". The assistant detects via the classifier that the question refers explicitly to the selected code, injects `selected_text` into the generation prompt, and answers specifically about that block — not about error handling in general.
**Scenario 2 — Question about the open file:**
The user has a full AVAP function open and asks "what HTTP status codes can this return?". The classifier detects the question refers to editor content, injects `editor_content` into the generation prompt, and reasons about the `_status` assignments in the function.
**Scenario 3 — General question (unchanged behaviour):**
The user asks "how does addVar work?" without selecting anything or referring to the editor. The classifier sets `use_editor_context: False`. The assistant behaves exactly as before — retrieval-augmented response from the AVAP knowledge base, no editor content injected.
---
## Scope
**In scope:**
- Add `editor_content`, `selected_text`, `extra_context`, `user_info` fields to `AgentRequest` in `brunix.proto`
- Decode base64 fields (`editor_content`, `selected_text`, `extra_context`) in `server.py` before propagating to graph state
- Parse `user_info` as opaque JSON string — available in state for future use, not yet consumed by the graph
- Parse the `user` field in `openai_proxy.py` as a JSON object containing all four context fields
- Propagate all fields through the server into the graph state (`AgentState`)
- Extend the classifier (`CLASSIFY_PROMPT_TEMPLATE`) to output two tokens: query type and editor context signal (`EDITOR` / `NO_EDITOR`)
- Set `use_editor_context: bool` in `AgentState` based on classifier output
- Use `selected_text` as the primary anchor for query reformulation only when `use_editor_context` is `True`
- Inject `selected_text` and `editor_content` into the generation prompt only when `use_editor_context` is `True`
- Fix reformulator language — queries must be rewritten in the original language, never translated
**Out of scope:**
- Changes to `EvaluateRAG` — the golden dataset does not include editor-context queries; this feature does not affect embedding or retrieval evaluation
- Consuming `user_info` fields (`dev_id`, `project_id`, `org_id`) in the graph — available in state for future routing or personalisation
- Evaluation of the feature impact via EvaluateRAG — a dedicated golden dataset with editor-context queries is required for that measurement; it is future work
---
## Technical design
### Proto changes (`brunix.proto`)
```protobuf
message AgentRequest {
string query = 1; // unchanged
string session_id = 2; // unchanged
string editor_content = 3; // base64-encoded full editor file content
string selected_text = 4; // base64-encoded currently selected text
string extra_context = 5; // base64-encoded free-form additional context
string user_info = 6; // JSON string: {"dev_id":…,"project_id":…,"org_id":…}
}
```
Fields 1 and 2 are unchanged. Fields 3–6 are optional — absent fields default to empty string in proto3. All existing clients remain compatible without modification.
### AgentState changes (`state.py`)
```python
class AgentState(TypedDict):
# Core fields
messages: Annotated[list, add_messages]
session_id: str
query_type: str
reformulated_query: str
context: str
# Editor context fields (PRD-0002)
editor_content: str # decoded from base64
selected_text: str # decoded from base64
extra_context: str # decoded from base64
user_info: str # JSON string — {"dev_id":…,"project_id":…,"org_id":…}
# Set by classifier — True only when user explicitly refers to editor code
use_editor_context: bool
```
### Server changes (`server.py`)
Base64 decoding applied to `editor_content`, `selected_text` and `extra_context` before propagation. `user_info` passed as-is (plain JSON string). Helper function:
```python
def _decode_b64(value: str) -> str:
try:
return base64.b64decode(value).decode("utf-8") if value else ""
except Exception:
logger.warning("[base64] decode failed")
return ""
```
### Proxy changes (`openai_proxy.py`)
The `user` field is parsed as a JSON object. `_parse_editor_context` extracts all four fields:
```python
def _parse_editor_context(user: Optional[str]) -> tuple[str, str, str, str]:
if not user:
return "", "", "", ""
try:
ctx = json.loads(user)
if isinstance(ctx, dict):
return (
ctx.get("editor_content", "") or "",
ctx.get("selected_text", "") or "",
ctx.get("extra_context", "") or "",
json.dumps(ctx.get("user_info", {})),
)
except (json.JSONDecodeError, TypeError):
pass
return "", "", "", ""
```
`session_id` is now read exclusively from the dedicated `session_id` field — no longer falls back to `user`.
### Classifier changes (`prompts.py` + `graph.py`)
`CLASSIFY_PROMPT_TEMPLATE` now outputs two tokens separated by a space:
- First token: `RETRIEVAL`, `CODE_GENERATION`, or `CONVERSATIONAL`
- Second token: `EDITOR` or `NO_EDITOR`
`EDITOR` is set only when the user message explicitly refers to the editor code or selected text using expressions like "this code", "este codigo", "fix this", "que hace esto", "explain this", etc.
`_parse_query_type` returns `tuple[str, bool]`. Both `classify` nodes (in `build_graph` and `build_prepare_graph`) set `use_editor_context` in the state.
### Reformulator changes (`prompts.py` + `graph.py`)
Two fixes applied:
**Mode-aware reformulation:** The reformulator receives `[MODE: X]` prepended to the query. In `RETRIEVAL` mode it compresses the query without expanding AVAP commands. In `CODE_GENERATION` mode it applies the command mapping. In `CONVERSATIONAL` mode it returns the query as-is.
**Language preservation:** The reformulator never translates. Queries in Spanish are rewritten in Spanish. Queries in English are rewritten in English. This fix was required because the BM25 retrieval is lexical — a Spanish chunk ("AVAP es un DSL...") cannot be found by an English query ("AVAP stand for").
### Generator changes (`graph.py`)
`_build_generation_prompt` injects `editor_content` and `selected_text` into the prompt only when `use_editor_context` is `True`. Priority hierarchy when injected:
1. `selected_text` — highest priority, most specific signal
2. `editor_content` — file-level context
3. RAG-retrieved chunks — knowledge base context
4. `extra_context` — free-form additional context
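The hierarchy above can be sketched as conditional prompt assembly; the section labels, state keys, and plain-string messages are illustrative (the real `AgentState` carries message objects):

```python
# Hedged sketch of the injection order described above: selected text
# first, then the full file, then retrieved chunks, then free-form
# extra context. Editor sections are skipped entirely when the
# classifier did not set use_editor_context.

def build_generation_prompt(state: dict) -> str:
    sections = []
    if state.get("use_editor_context"):
        if state.get("selected_text"):
            sections.append("## Selected code\n" + state["selected_text"])
        if state.get("editor_content"):
            sections.append("## Open file\n" + state["editor_content"])
    if state.get("context"):
        sections.append("## Retrieved documentation\n" + state["context"])
    if state.get("use_editor_context") and state.get("extra_context"):
        sections.append("## Extra context\n" + state["extra_context"])
    sections.append("## Question\n" + state["messages"][-1])
    return "\n\n".join(sections)
```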
---
## Validation
**Acceptance criteria:**
- A query explicitly referring to selected code (`selected_text` non-empty, classifier returns `EDITOR`) produces a response grounded in that specific code.
- A general query (`use_editor_context: False`) produces a response identical in quality to the pre-PRD-0002 system — no editor content injected, no regression.
- A query in Spanish retrieves Spanish chunks correctly — the reformulator preserves the language.
- Existing gRPC clients that do not send the new fields work without modification.
- The `user` field in the HTTP proxy can be a plain string or absent — no error raised.
**Future measurement:**
Once the extension is validated and the embedding model is selected (ADR-0005), a dedicated golden dataset of editor-context queries should be built and added to `EvaluateRAG` to measure the quantitative impact of this feature.
---
## Impact on parallel workstreams
**Embedding evaluation (ADR-0005 / MrHouston):** No impact. The BEIR benchmarks and EvaluateRAG runs for embedding model selection use the existing golden dataset, which contains no editor-context queries. The two workstreams are independent.
**RAG architecture evolution:** This feature is additive. It does not change the retrieval infrastructure, the Elasticsearch index, or the embedding pipeline. It extends the graph with additional input signals that improve response quality for editor-anchored queries.


@ -1,3 +1,3 @@
addParam(emails,emails)
getQueryParamList(lista_correos)
addParam("emails", emails)
getQueryParamList("lista_correos", lista_correos)
addResult(lista_correos)


@ -1,4 +1,4 @@
addParam(sal_par,saldo)
addParam("sal_par",saldo)
if(saldo, 0, ">")
permitir = True
else()


@ -1,5 +1,5 @@
addParam(userrype, user_type)
addParam(sells, compras)
addParam("userrype", user_type)
addParam("sells", compras)
if(None, None, " user_type == 'VIP' or compras > 100")
addVar(descuento, 0.20)
end()


@ -1,3 +1,4 @@
addParam("Alberto",name)
result = "Hello," + name
registerEndpoint("/hello_world","GET",[],"HELLO_WORLD",main,result)
addVar(name,"Alberto")
result = "Hello," + name
addResult(result)


@ -1,4 +1,4 @@
addParam(password,pass_nueva)
addParam("password",pass_nueva)
pass_antigua = "password"
if(pass_nueva, pass_antigua, "!=")
addVar(cambio, "Contraseña actualizada")


@ -1,5 +1,7 @@
try()
ormDirect("UPDATE table_inexistente SET a=1", res)
exception(e)
addVar(_status,500)
addResult("Error de base de datos")
addVar(_status, 500)
addVar(error_msg, "Error de base de datos")
addResult(error_msg)
end()


@ -1,5 +1,6 @@
try()
RequestGet("https://api.test.com/data", 0, 0, respuesta)
RequestGet("https://api.test.com/data", 0, 0, respuesta, None)
exception(e)
addVar(error_trace, "Fallo de conexión: %s" % e)
addVar(error_trace, e)
addResult(error_trace)
end()


@ -1,5 +1,8 @@
addParam("rol", r)
if(r, ["admin", "editor", "root"], "in")
acceso = False
if(None, None, "r == 'admin' or r == 'editor' or r == 'root'")
acceso = True
end()
addResult(acceso)

ingestion/chunks-bge.json Normal file

File diff suppressed because one or more lines are too long

ingestion/chunks-harrier.json Normal file

File diff suppressed because one or more lines are too long

ingestion/chunks-qwen.json Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -0,0 +1,40 @@
<program> ::= ( <line> | <block_comment> )*
<line> ::= [ <statement> ] [ <line_comment> | <doc_comment> ] <EOL>
| ( <line_comment> | <doc_comment> ) <EOL>
<EOL> ::= /(\r\n|\n)/
<doc_comment> ::= "///" <any_text>
<line_comment> ::= "//" <any_text>
<block_comment> ::= "/*" <any_content> "*/"
<any_text> ::= [^\r\n]*
<any_content> ::= /* Any sequence of characters that does not contain the substring "*/" */
<statement> ::= <assignment>
| <method_call_stmt>
| <function_call_stmt>
| <function_decl>
| <return_stmt>
| <system_command>
| <io_command>
| <control_flow>
| <async_command>
| <connector_cmd>
| <db_command>
| <http_command>
| <util_command>
| <modularity_cmd>
<assignment> ::= <identifier> "=" <expression>
<function_call_stmt> ::= <identifier> "(" [<argument_list>] ")"
<method_call_stmt> ::= <identifier> "=" <identifier> "." <identifier> "(" [<argument_list>] ")"
<system_command> ::= <register_cmd> | <addvar_cmd>
<register_cmd> ::= "registerEndpoint(" <stringliteral> "," <stringliteral> "," <list_display> "," <stringliteral> "," <identifier> "," <identifier> ")"
<addvar_cmd> ::= "addVar(" <addvar_arg> "," <addvar_arg> ")"
<addvar_arg> ::= <identifier> | <literal> | "$" <identifier>
<identifier> ::= /[A-Za-z_][A-Za-z0-9_]*/
<system_variable> ::= "_status"


@ -0,0 +1,5 @@
<io_command> ::= <addparam_cmd> | <getlistlen_cmd> | <addresult_cmd> | <getparamlist_cmd>
<addparam_cmd> ::= "addParam(" <stringliteral> "," <identifier> ")"
<getlistlen_cmd> ::= "getListLen(" <identifier> "," <identifier> ")"
<getparamlist_cmd> ::= "getQueryParamList(" <stringliteral> "," <identifier> ")"
<addresult_cmd> ::= "addResult(" <identifier> ")"


@ -0,0 +1,28 @@
<control_flow> ::= <if_stmt> | <loop_stmt> | <try_stmt>
<if_stmt> ::= "if(" <if_condition> ")" <EOL>
<block>
[ "else()" <EOL> <block> ]
"end()" <EOL>
/* if() supports two modes:
Mode 1, structured comparison: the first two arguments must be
simple identifiers or literals, never access expressions.
If a value extracted from a structure (e.g. dict['key']) needs to
be compared, it must first be assigned to a variable.
Mode 2, free expression: None, None, a complex expression as a string */
<if_condition> ::= <if_atom> "," <if_atom> "," <stringliteral>
| "None" "," "None" "," <stringliteral>
<if_atom> ::= <identifier> | <literal>
<loop_stmt> ::= "startLoop(" <identifier> "," <expression> "," <expression> ")" <EOL>
<block>
"endLoop()" <EOL>
<try_stmt> ::= "try()" <EOL>
<block>
"exception(" <identifier> ")" <EOL>
<block>
"end()" <EOL>
<block> ::= <line>*


@ -0,0 +1,3 @@
<async_command> ::= <go_stmt> | <gather_stmt>
<go_stmt> ::= <identifier> "=" "go" <identifier> "(" [<argument_list>] ")"
<gather_stmt> ::= <identifier> "=" "gather(" <identifier> ["," <expression>] ")"


@ -0,0 +1,25 @@
/* Third-party connector instantiation and calls to its dynamic methods */
<connector_cmd> ::= <connector_instantiation> | <connector_method_call>
<connector_instantiation> ::= <identifier> "=" "avapConnector(" <stringliteral> ")"
<connector_method_call> ::= [ <identifier> "=" ] <identifier> "." <identifier> "(" [<argument_list>] ")"
/* HTTP client with mandatory timeout */
<http_command> ::= <req_post_cmd> | <req_get_cmd>
<req_post_cmd> ::= "RequestPost(" <expression> "," <expression> "," <expression> "," <expression> "," <identifier> "," <expression> ")"
<req_get_cmd> ::= "RequestGet(" <expression> "," <expression> "," <expression> "," <identifier> "," <expression> ")"
/* ORM and persistence (standardized on tableName) */
<db_command> ::= <orm_direct> | <orm_check> | <orm_create> | <orm_select> | <orm_insert> | <orm_update>
<orm_direct> ::= "ormDirect(" <expression> "," <identifier> ")"
<orm_check> ::= "ormCheckTable(" <expression> "," <identifier> ")"
<orm_create> ::= "ormCreateTable(" <expression> "," <expression> "," <expression> "," <identifier> ")"
/* ormAccessSelect(fields, tableName, selector, varTarget) */
<orm_select> ::= "ormAccessSelect(" <orm_fields> "," <expression> "," [<expression>] "," <identifier> ")"
<orm_fields> ::= "*" | <expression>
/* ormAccessInsert(fieldsValues, tableName, varTarget) */
<orm_insert> ::= "ormAccessInsert(" <expression> "," <expression> "," <identifier> ")"
/* ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget) */
<orm_update> ::= "ormAccessUpdate(" <expression> "," <expression> "," <expression> "," <expression> "," <identifier> ")"

@@ -0,0 +1,29 @@
/* [FIX] All <util_command> subrules are now fully expanded. */
<util_command> ::= <json_list_cmd> | <crypto_cmd> | <regex_cmd> | <datetime_cmd> | <stamp_cmd> | <string_cmd> | <replace_cmd>
/* List and JSON manipulation */
<json_list_cmd> ::= "variableToList(" <expression> "," <identifier> ")"
| "itemFromList(" <identifier> "," <expression> "," <identifier> ")"
| "variableFromJSON(" <identifier> "," <expression> "," <identifier> ")"
| "AddVariableToJSON(" <expression> "," <expression> "," <identifier> ")"
/* Cryptography */
<crypto_cmd> ::= "encodeSHA256(" <identifier_or_string> "," <identifier> ")"
| "encodeMD5(" <identifier_or_string> "," <identifier> ")"
/* Regular expressions */
<regex_cmd> ::= "getRegex(" <identifier> "," <stringliteral> "," <identifier> ")"
<datetime_cmd> ::= "getDateTime(" <stringliteral> "," <expression> "," <stringliteral> "," <identifier> ")"
/* Arguments: output_format, source_epoch, time_zone, target */
<stamp_cmd> ::= "stampToDatetime(" <expression> "," <stringliteral> "," <expression> "," <identifier> ")"
/* Arguments: source_epoch, format, timedelta, target */
| "getTimeStamp(" <stringliteral> "," <stringliteral> "," <expression> "," <identifier> ")"
/* Arguments: date_string, input_format, timedelta, target */
<string_cmd> ::= "randomString(" <expression> "," <identifier> ")"
/* Arguments: length, target */
<replace_cmd> ::= "replace(" <identifier_or_string> "," <stringliteral> "," <stringliteral> "," <identifier> ")"
/* Arguments: source, search_pattern, replacement, target */

@@ -0,0 +1,9 @@
/* Note: functions use braces {} as block delimiters by explicit architectural
decision, unlike the control structures (if, loop, try), which use closing
keywords (end(), endLoop()). Both patterns coexist in the grammar, and the
parser distinguishes them by the opening token. */
<function_decl> ::= "function" <identifier> "(" [<param_list>] ")" "{" <EOL>
<block>
"}" <EOL>
<param_list> ::= <identifier> ("," <identifier>)*
<return_stmt> ::= "return(" [<expression>] ")"
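The opening-token rule in the note above can be sketched as a lookup the parser might consult; both the table name and the token spellings below are illustrative assumptions, not the actual parser implementation.

```python
# Hypothetical helper: maps a block-opening token to the closer the parser
# should expect, per the note above (braces close function bodies, keyword
# closers end control structures). Token spellings are assumptions.
BLOCK_CLOSERS = {
    "{": "}",                 # function bodies
    "if(": "end()",           # if/else blocks
    "try()": "end()",         # try/exception blocks
    "startLoop(": "endLoop()",
}

def expected_closer(opening_token):
    """Return the closing token the parser should expect for this opener."""
    return BLOCK_CLOSERS[opening_token]
```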

@@ -0,0 +1,3 @@
<modularity_cmd> ::= <include_stmt> | <import_stmt>
<include_stmt> ::= "include" " " <stringliteral>
<import_stmt> ::= "import" " " ( "<" <identifier> ">" | <stringliteral> )

@@ -0,0 +1,62 @@
/* Expression hierarchy (precedence from lowest to highest) */
<expression> ::= <logical_or>
<logical_or> ::= <logical_and> ( "or" <logical_and> )*
<logical_and> ::= <logical_not> ( "and" <logical_not> )*
<logical_not> ::= "not" <logical_not> | <comparison>
<comparison> ::= <arithmetic> ( <comp_op> <arithmetic> )*
<comp_op> ::= "==" | "!=" | "<" | ">" | "<=" | ">=" | "in" | "is"
<arithmetic> ::= <term> ( ( "+" | "-" ) <term> )*
<term> ::= <factor> ( ( "*" | "/" | "%" ) <factor> )*
<factor> ::= ( "+" | "-" ) <factor> | <power>
<power> ::= <primary> [ "**" <factor> ]
/* Primaries and atoms (access, casting, slicing, methods, and functions)
The <primary> rule also covers method access on connector objects
(conector.metodo(...)) and keyed access to their results (resultado["key"]) */
<primary> ::= <atom>
| <primary> "." <identifier>
| <primary> "[" <expression> "]"
| <primary> "[" [<expression>] ":" [<expression>] [":" [<expression>]] "]"
| <primary> "(" [<argument_list>] ")"
<atom> ::= <identifier>
| "$" <identifier>
| <literal>
| "(" <expression> ")"
| <list_display>
| <dict_display>
/* Data structures, comprehensions, and arguments */
<list_display> ::= "[" [<argument_list>] "]"
| "[" <expression> "for" <identifier> "in" <expression> [<if_clause>] "]"
<if_clause> ::= "if" <expression>
<dict_display> ::= "{" [<key_datum_list>] "}"
<key_datum_list> ::= <key_datum> ( "," <key_datum> )*
<key_datum> ::= <expression> ":" <expression>
<argument_list> ::= <expression> ( "," <expression> )*
/* Unified numeric type */
<number> ::= <floatnumber> | <integer>
/* Literals (supported primitive data types) */
<literal> ::= <stringliteral> | <number> | <boolean> | "None"
<boolean> ::= "True" | "False"
<integer> ::= [0-9]+
<floatnumber> ::= [0-9]+ "." [0-9]* | "." [0-9]+
/* Strings with escape-sequence support */
<stringliteral> ::= "\"" <text_double> "\"" | "'" <text_single> "'"
<escape_sequence> ::= "\\" ( "\"" | "'" | "\\" | "n" | "t" | "r" | "0" )
<text_double> ::= ( [^"\\] | <escape_sequence> )*
<text_single> ::= ( [^'\\] | <escape_sequence> )*
<identifier_or_string> ::= <identifier> | <stringliteral>
/* Comment rules for the lexer
The lexer applies longest-match: /// must be evaluated BEFORE // */
<doc_comment> ::= "///" <any_text>
<line_comment> ::= "//" <any_text>
<block_comment> ::= "/*" <any_content> "*/"
<any_text> ::= [^\r\n]*
<any_content> ::= /* Any character sequence that does not contain the substring "*/" */
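The longest-match ordering described in the comment rules can be sketched with Python's `re` module, which tries alternation branches left to right; the token names and helper function are assumptions, and only the comment patterns come from the grammar above.

```python
import re

# Comment tokens ordered so that longest-match wins: "///" is tried
# before "//", exactly as the grammar note requires.
COMMENT_SPEC = [
    ("BLOCK_COMMENT", r"/\*.*?\*/"),   # non-greedy: stops at the first */
    ("DOC_COMMENT",   r"///[^\r\n]*"),
    ("LINE_COMMENT",  r"//[^\r\n]*"),
]
COMMENT_RE = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in COMMENT_SPEC), re.S
)

def scan_comments(source):
    """Return (token_name, text) pairs for every comment in source."""
    return [(m.lastgroup, m.group()) for m in COMMENT_RE.finditer(source)]
```

Reversing the last two entries of COMMENT_SPEC would make every `///` doc comment lex as a `//` line comment, which is the failure the grammar note warns about.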

File diff suppressed because one or more lines are too long

@@ -0,0 +1,145 @@
[
{
"task_id": 1,
"text": "Crear un endpoint que reciba un parámetro 'mensaje' y lo devuelva con todas las vocales reemplazadas por asteriscos",
"code": "addParam(\"mensaje\", texto)\nreplace(texto, \"[aeiouAEIOU]\", \"*\", resultado)\naddResult(resultado)",
"test_inputs": {
"mensaje": "Hola mundo"
},
"test_list": [
"re.match(r'H\\*l\\* m\\*nd\\*', resultado)"
],
"_detected": [
"addParam",
"addResult",
"replace"
],
"_reward": {
"ecs": 0.079,
"novelty": 1.0,
"test_quality": 1.0,
"reward": 0.539,
"detected": [
"addParam",
"addResult",
"replace"
]
}
},
{
"task_id": 2,
"text": "Crear un endpoint que reciba un parámetro 'password' y devuelva su hash MD5",
"code": "addParam(\"password\", entrada)\nencodeMD5(entrada, hash_resultado)\naddResult(hash_resultado)",
"test_inputs": {
"password": "test123"
},
"test_list": [
"re.match(r'^[a-f0-9]{32}$', hash_resultado)"
],
"_detected": [
"addParam",
"addResult",
"encodeMD5"
],
"_reward": {
"ecs": 0.079,
"novelty": 0.5,
"test_quality": 1.0,
"reward": 0.364,
"detected": [
"addParam",
"addResult",
"encodeMD5"
]
}
},
{
"task_id": 3,
"text": "Crear un endpoint que reciba un parámetro 'password' y devuelva su hash SHA-256",
"code": "addParam(\"password\", entrada)\nencodeSHA256(entrada, hash_resultado)\naddResult(hash_resultado)",
"test_inputs": {
"password": "miPassword123"
},
"test_list": [
"re.match(r'^[a-f0-9]{64}$', hash_resultado)"
],
"_detected": [
"addParam",
"addResult",
"encodeSHA256"
],
"_reward": {
"ecs": 0.079,
"novelty": 0.5,
"test_quality": 1.0,
"reward": 0.364,
"detected": [
"addParam",
"addResult",
"encodeSHA256"
]
}
},
{
"task_id": 4,
"text": "Crear un endpoint que reciba un parámetro 'nombre' y lo almacene en una variable usando addVar, luego devolver el nombre almacenado",
"code": "addParam(\"nombre\", entrada)\naddVar(resultado, entrada)\naddResult(resultado)",
"test_inputs": {
"nombre": "Juan"
},
"test_list": [
"re.match(r'^Juan$', resultado)",
"re.match(r'^\\w+$', resultado)"
],
"_detected": [
"addParam",
"addResult",
"addVar"
],
"_reward": {
"ecs": 0.079,
"novelty": 0.5,
"test_quality": 1.0,
"reward": 0.364,
"detected": [
"addParam",
"addResult",
"addVar"
]
}
},
{
"task_id": 5,
"text": "Crear un endpoint que reciba un parámetro 'edad' y devuelva un mensaje personalizado. Si la edad es mayor o igual a 18, devuelve 'Adulto', sino devuelve 'Menor'",
"code": "addParam(\"edad\", edad_usuario)\nif(edad_usuario, 18, \">=\")\naddVar(mensaje, \"Adulto\")\nelse()\naddVar(mensaje, \"Menor\")\nend()\naddResult(mensaje)",
"test_inputs": {
"edad": "20"
},
"test_list": [
"re.match(r'Adulto', mensaje)",
"re.match(r'^(Adulto|Menor)$', mensaje)"
],
"_detected": [
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
],
"_reward": {
"ecs": 0.158,
"novelty": 0.5,
"test_quality": 1.0,
"reward": 0.404,
"detected": [
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
]
}
}
]

@@ -0,0 +1,26 @@
{
"mode": "reward",
"weights": {
"w_ecs": 0.5,
"w_novelty": 0.35,
"w_tests": 0.15
},
"dataset_size": 5,
"pool_size": 5,
"pool_summary": "GoldPool: 5/5 | reward: min=0.364 max=0.539 mean=0.407",
"distribution_entropy": 2.769,
"node_type_frequency": {
"addParam": 5,
"addResult": 5,
"replace": 1,
"encodeMD5": 1,
"encodeSHA256": 1,
"addVar": 2,
"else": 1,
"end": 1,
"if_mode1": 1
},
"covered_constructs": 9,
"total_constructs": 38,
"mean_reward": 0.407
}
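The weights recorded in this config reproduce the per-task rewards in the golden dataset as a plain weighted sum; a minimal sketch follows (the rounding behavior of the actual pipeline is not shown, so treat the last decimal as approximate).

```python
def reward(ecs, novelty, test_quality,
           w_ecs=0.5, w_novelty=0.35, w_tests=0.15):
    """Weighted sum using the weights recorded in the config above."""
    return w_ecs * ecs + w_novelty * novelty + w_tests * test_quality

# Reproduces the recorded values from the golden dataset:
# task 1: reward(0.079, 1.0, 1.0) -> 0.5395, recorded as 0.539
# task 2: reward(0.079, 0.5, 1.0) -> 0.3645, recorded as 0.364
# task 5: reward(0.158, 0.5, 1.0) -> 0.404
```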

@@ -0,0 +1,287 @@
[
{
"task_id": 1,
"text": "Crear un endpoint que valide credenciales de usuario: recibe username y password, genera hash SHA256 de la contraseña, consulta la base de datos para verificar las credenciales y devuelve la cantidad de usuarios encontrados junto con el estado de autenticación.",
"code": "addParam(\"username\", user_input)\naddParam(\"password\", pass_input)\nencodeSHA256(pass_input, hashed_pass)\normAccessSelect(\"*\", \"users\", \"username='\" + user_input + \"' AND password='\" + hashed_pass + \"'\", user_results)\ngetListLen(user_results, total_users)\nif(total_users, 0, \">\")\n_status = 200\naddVar(auth_status, \"success\")\nelse()\n_status = 401\naddVar(auth_status, \"failed\")\nend()\naddVar(user_count, total_users)\naddResult(auth_status)\naddResult(user_count)",
"test_inputs": {
"username": "admin",
"password": "secret123"
},
"test_list": [
"re.match(r'success|failed', auth_status)",
"re.match(r'\\d+', str(user_count))"
],
"_cell": [
"encodeSHA256",
"getListLen",
"ormAccessSelect"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.2,
"test_quality": 1.0,
"richness": 0.5,
"quality": 1.31,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"encodeSHA256",
"end",
"getListLen",
"if_mode1",
"ormAccessSelect"
],
"cell": [
"encodeSHA256",
"getListLen",
"ormAccessSelect"
],
"extra": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
]
}
},
{
"task_id": 2,
"text": "Crear un sistema de autenticación que genere un hash SHA256 de la contraseña, ejecute una validación asíncrona del usuario y registre la fecha/hora del intento de login",
"code": "addParam(\"username\", username)\naddParam(\"password\", password)\nencodeSHA256(password, password_hash)\ntask_id = go validateUser(username, password_hash)\ngetDateTime(\"%Y-%m-%d %H:%M:%S\", 0, \"UTC\", login_time)\nresult = gather(task_id, 3000)\nif(result, None, \"!=\")\naddVar(_status, 200)\naddResult(username)\naddResult(login_time)\nelse()\naddVar(_status, 401)\nend()\n\nfunction validateUser(user, hash_pass)\n{\nif(user, \"admin\", \"==\")\nif(hash_pass, \"ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f\", \"==\")\nreturn(\"valid\")\nelse()\nreturn(None)\nend()\nelse()\nreturn(None)\nend()\n}",
"test_inputs": {
"username": "admin",
"password": "secret123"
},
"test_list": [
"re.match(r'\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}', login_time)",
"re.match(r'[a-f0-9]{64}', password_hash)"
],
"_cell": [
"encodeSHA256",
"gather",
"getDateTime"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.257,
"test_quality": 1.0,
"richness": 0.833,
"quality": 1.36,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"encodeSHA256",
"end",
"function",
"gather",
"getDateTime",
"if_mode1",
"return"
],
"cell": [
"encodeSHA256",
"gather",
"getDateTime"
],
"extra": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"function",
"if_mode1",
"return"
]
}
},
{
"task_id": 3,
"text": "Crear un servicio que valide credenciales de usuario mediante hash SHA256, registre el timestamp de autenticación y extraiga datos del perfil JSON del usuario",
"code": "addParam(\"user_credentials\", userData)\naddParam(\"profile\", profileData)\nvariableFromJSON(userData, \"password\", rawPassword)\nvariableFromJSON(userData, \"username\", username)\nvariableFromJSON(profileData, \"email\", userEmail)\nencodeSHA256(rawPassword, hashedPassword)\ngetTimeStamp(\"2024-01-15 10:30:00\", \"%Y-%m-%d %H:%M:%S\", 0, loginTimestamp)\nif(hashedPassword, \"5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8\", \"==\")\n addVar(_status, 200)\n addResult(username)\n addResult(userEmail)\n addResult(loginTimestamp)\nelse()\n addVar(_status, 401)\nend()",
"test_inputs": {
"user_credentials": "{\"username\":\"john\", \"password\":\"password\"}",
"profile": "{\"email\":\"john@example.com\", \"role\":\"user\"}"
},
"test_list": [
"re.match(r'john', username)",
"re.match(r'\\d{10}', str(loginTimestamp))"
],
"_cell": [
"encodeSHA256",
"getTimeStamp",
"variableFromJSON"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.2,
"test_quality": 1.0,
"richness": 0.5,
"quality": 1.31,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"encodeSHA256",
"end",
"getTimeStamp",
"if_mode1",
"variableFromJSON"
],
"cell": [
"encodeSHA256",
"getTimeStamp",
"variableFromJSON"
],
"extra": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
]
}
},
{
"task_id": 4,
"text": "Crear un endpoint que procese datos de usuario desde JSON, valide la longitud de una lista de elementos y establezca el código de respuesta HTTP apropiado según los resultados",
"code": "addParam(\"user_data\", json_data)\nvariableFromJSON(json_data, \"items\", user_items)\ngetListLen(user_items, items_count)\nif(items_count, 0, \">\")\n addVar(_status, 200)\n addResult(items_count)\nelse()\n addVar(_status, 400)\nend()",
"test_inputs": {
"user_data": "{\"items\": [\"producto1\", \"producto2\", \"producto3\"]}"
},
"test_list": [
"re.match(r'200', str(_status))",
"re.match(r'3', str(items_count))"
],
"_cell": [
"_status",
"getListLen",
"variableFromJSON"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.171,
"test_quality": 1.0,
"richness": 0.3,
"quality": 1.281,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"getListLen",
"if_mode1",
"variableFromJSON"
],
"cell": [
"_status",
"getListLen",
"variableFromJSON"
],
"extra": [
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
]
}
},
{
"task_id": 5,
"text": "Escribe un microservicio que actualice el estado de una cuenta bancaria. Si el parámetro 'action' es 'freeze', cambia el estado a 'frozen'. Si es cualquier otra acción, cambia a 'active'. Usa una tabla llamada 'accounts' con campos id y status.",
"code": "addParam(\"account_id\", account_id)\naddParam(\"action\", action)\nif(action, \"freeze\", \"==\")\naddVar(new_status, \"frozen\")\nelse()\naddVar(new_status, \"active\")\nend()\normAccessUpdate([\"status\"], [new_status], \"accounts\", \"id = \" + account_id, update_result)\naddResult(update_result)\naddVar(_status, 200)",
"test_inputs": {
"account_id": "123",
"action": "freeze"
},
"test_list": [
"re.match(r'frozen', new_status)",
"re.match(r'200', str(_status))"
],
"_cell": [
"else",
"end",
"ormAccessUpdate"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.143,
"test_quality": 1.0,
"richness": 0.333,
"quality": 1.276,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1",
"ormAccessUpdate"
],
"cell": [
"else",
"end",
"ormAccessUpdate"
],
"extra": [
"_status",
"addParam",
"addResult",
"addVar",
"if_mode1"
]
}
},
{
"task_id": 6,
"text": "Crea un endpoint que genere un token de sesión único. El sistema debe generar una cadena aleatoria de 16 caracteres alfanuméricos, calcular su hash MD5 para crear un identificador seguro, y devolver ambos valores en la respuesta JSON.",
"code": "randomString(\"[a-zA-Z0-9]\", 16, token)\nencodeMD5(token, token_hash)\naddResult(token)\naddResult(token_hash)",
"test_inputs": {},
"test_list": [
"re.match(r'^[a-zA-Z0-9]{16}$', token)",
"re.match(r'^[a-f0-9]{32}$', token_hash)"
],
"_cell": [
"addResult",
"encodeMD5",
"randomString"
],
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.0,
"test_quality": 1.0,
"richness": 0.133,
"quality": 1.213,
"detected": [
"addResult",
"encodeMD5",
"randomString"
],
"cell": [
"addResult",
"encodeMD5",
"randomString"
],
"extra": []
}
}
]
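The _quality entries above are consistent with a simple linear score. The weights below were inferred by fitting the recorded values and reproduce every entry in this file to three decimals, but they are a hypothesis, not a documented formula.

```python
def quality(fidelity, bonus_ratio, test_quality, richness):
    """Hypothesized quality score: the weights (0.3, 0.2, 0.1) are inferred
    from the recorded _quality entries above, not taken from any documented
    source; treat them as an assumption."""
    return fidelity + 0.3 * bonus_ratio + 0.2 * test_quality + 0.1 * richness

# e.g. quality(1.0, 0.2, 1.0, 0.5) matches task 1's recorded 1.31
```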

@@ -0,0 +1,24 @@
{
"total_cells": 9139,
"filled_cells": 6,
"fill_rate": 0.0007,
"distribution_entropy": 3.684,
"node_type_frequency": {
"ormAccessSelect": 1,
"encodeSHA256": 3,
"getListLen": 2,
"gather": 1,
"getDateTime": 1,
"variableFromJSON": 2,
"getTimeStamp": 1,
"_status": 1,
"ormAccessUpdate": 1,
"else": 1,
"end": 1,
"randomString": 1,
"addResult": 1,
"encodeMD5": 1
},
"low_quality_cells": 0,
"empty_cells": 9133
}
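The distribution_entropy values in these stats files are consistent with Shannon entropy, in bits, over node_type_frequency; a minimal sketch, assuming no smoothing:

```python
import math

def distribution_entropy(freq):
    """Shannon entropy (bits) of a node-type frequency table; matches the
    distribution_entropy values recorded in the stats files above."""
    total = sum(freq.values())
    return -sum((c / total) * math.log2(c / total) for c in freq.values())

# For the 14-entry table above (18 total occurrences) this yields ~3.684 bits.
```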

@@ -0,0 +1,93 @@
[
{
"task_id": 1,
"text": "Crear un microservicio que valide la edad de un usuario y determine si es mayor de edad. El servicio debe recibir un parámetro 'edad' y devolver un mensaje de estado apropiado.",
"code": "addParam(\"edad\", user_age)\naddVar(min_age, 18)\nif(user_age, min_age, \">=\")\n addVar(status_msg, \"Usuario mayor de edad\")\n addVar(_status, 200)\nelse()\n addVar(status_msg, \"Usuario menor de edad\")\n addVar(_status, 403)\nend()\naddResult(status_msg)",
"test_inputs": {
"edad": "25"
},
"test_list": [
"re.match(r'Usuario mayor de edad', status_msg)",
"re.match(r'200', str(_status))"
],
"_cell": [
"addParam",
"addVar",
"if_mode1"
],
"_prior_weight": 0.9278,
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.114,
"test_quality": 1.0,
"richness": 0.333,
"quality": 1.268,
"detected": [
"_status",
"addParam",
"addResult",
"addVar",
"else",
"end",
"if_mode1"
],
"cell": [
"addParam",
"addVar",
"if_mode1"
],
"extra": [
"_status",
"addResult",
"else",
"end"
]
}
},
{
"task_id": 2,
"text": "Crear un microservicio que procese un texto en segundo plano, reemplace caracteres especiales y retorne el resultado procesado",
"code": "addParam(\"texto\", input_text)\naddParam(\"timeout\", max_timeout)\nfunction procesarTexto(data) {\nreplace(data, \"@\", \"[AT]\", cleaned_data)\nreplace(cleaned_data, \"#\", \"[HASH]\", final_data)\nreturn(final_data)\n}\ntask_id = go procesarTexto(input_text)\nresult_data = gather(task_id, max_timeout)\naddResult(result_data)\n_status = 200",
"test_inputs": {
"texto": "usuario@dominio.com #hashtag",
"timeout": 5000
},
"test_list": [
"re.match(r'usuario\\[AT\\]dominio\\.com \\[HASH\\]hashtag', result_data)",
"re.match(r'200', str(_status))"
],
"_cell": [
"gather",
"replace",
"return"
],
"_prior_weight": 0.0848,
"_quality": {
"fidelity": 1.0,
"bonus_ratio": 0.114,
"test_quality": 1.0,
"richness": 0.367,
"quality": 1.271,
"detected": [
"_status",
"addParam",
"addResult",
"function",
"gather",
"replace",
"return"
],
"cell": [
"gather",
"replace",
"return"
],
"extra": [
"_status",
"addParam",
"addResult",
"function"
]
}
}
]

@@ -0,0 +1,18 @@
{
"total_cells": 9139,
"filled_cells": 2,
"fill_rate": 0.0002,
"distribution_entropy": 2.585,
"node_type_frequency": {
"addParam": 1,
"if_mode1": 1,
"addVar": 1,
"return": 1,
"replace": 1,
"gather": 1
},
"low_quality_cells": 0,
"empty_cells": 9137,
"kl_divergence_dataset_vs_prior": 0.562,
"prior_summary": "ConstructPrior: 4262 cells | mean=0.252 | epsilon=0.05 | github_files_analyzed=100 github_files_fetched=100 total_pair_cooccurrences=441 total_trio_cooccurrences=3821"
}
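The kl_divergence_dataset_vs_prior figure above compares the dataset's construct distribution against ConstructPrior. A minimal sketch of such a KL computation follows, assuming base-2 logs and that the recorded epsilon=0.05 acts as a floor probability for constructs absent from the prior; the actual smoothing scheme is not shown, so this is illustrative only.

```python
import math

def kl_divergence(dataset_counts, prior_probs, epsilon=0.05):
    """KL(dataset || prior). Log base and the use of epsilon as a floor for
    unseen constructs are assumptions; the stats file only records the
    resulting value (0.562) and epsilon=0.05."""
    total = sum(dataset_counts.values())
    kl = 0.0
    for construct, count in dataset_counts.items():
        p = count / total
        q = prior_probs.get(construct, epsilon)
        kl += p * math.log2(p / q)
    return kl
```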

Some files were not shown because too many files have changed in this diff.