Commit Graph

58 Commits

Author SHA1 Message Date
pseco 26ffcc54d9 Refactor code structure for improved readability and maintainability 2026-03-31 13:57:15 +02:00
pseco 8f501d3e52 Refactor code structure for improved readability and maintainability 2026-03-31 11:16:03 +02:00
pseco cd656b08a8 Update default dataset path in validate_synthetic_dataset.py to point to new output location 2026-03-30 10:04:28 +02:00
acano e4f76f3fab Add newline at the end of generate_mbap_v2.py for better file formatting 2026-03-27 14:10:58 +01:00
acano f747c140c8 Enhance generate_mbap_v2.py with new reward mechanism and GoldPool integration
- Added GoldPool class to manage a top-K pool of high-reward examples.
- Implemented compute_reward function to calculate composite rewards based on execution coverage, novelty, and test quality.
- Introduced call_api_reward function for API calls in the new reward mode.
- Updated main function to support new reward mode with adjustable weights for ECS, novelty, and test quality.
- Enhanced dataset saving functionality to include reward statistics.
- Refactored existing code for improved readability and consistency.
2026-03-27 14:04:21 +01:00
acano c6b57849cd Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 17:02:27 +01:00
rafa-ruiz fe43cd6fa9 scripts documentation 2026-03-26 07:51:01 -07:00
acano d50f33c707 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-26 09:37:57 +01:00
rafa-ruiz ccd9073a52 feat(dataset): add ADR-0006 and scaffold reward algorithm pipeline 2026-03-25 22:19:19 -07:00
acano fe90548b8b added ast tree metadata 2026-03-25 10:36:18 +01:00
acano ec57e52dea Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-23 09:12:20 +01:00
rafa-ruiz 2fbfad41df feat: editor context injection (PRD-0002) + repository governance 2026-03-20 19:25:29 -07:00
acano 752bf9c7d9 Update Elasticsearch index version and modify imports in ingestion and translation scripts
- Changed Elasticsearch index from "avap-docs-test-v3" to "avap-docs-test-v4" in elasticsearch_ingestion.py.
- Removed unused import SystemMessage from langchain_core.messages in translate_mbpp.py.
- Added import for Lark in chunk.py to support new functionality.
2026-03-19 11:30:00 +01:00
acano 868a17523a Merge online into mrh-online-dev 2026-03-19 11:25:36 +01:00
rafa-ruiz fda47edae0 UPGRADE: New RAG functional 2026-03-18 18:56:01 -07:00
pseco 8878ca51e4 working on examples verification and testing on avap language server 2026-03-17 11:46:25 +01:00
pseco 80cdbcc38e Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-17 11:02:18 +01:00
pseco c7adab24a6 working on synthetic dataset 2026-03-17 11:02:06 +01:00
acano fadf813494 Update Elasticsearch index version and enhance document processing
- Changed Elasticsearch index from "avap-docs-test-v3" to "avap-docs-test-v4" in elasticsearch_ingestion.py.
- Added Lark parser for AVAP code processing in chunk.py.
- Enhanced metadata extraction for processed documents, including AST for AVAP files.
- Improved error handling for AVAP code parsing.
2026-03-16 13:21:25 +01:00
acano ab1022d8b6 feat: Implement ElasticHandshakeWithMetadata to preserve chunk metadata in Elasticsearch 2026-03-13 11:02:32 +01:00
acano 9a435120d5 Merge branch 'online' into mrh-online-dev-partial 2026-03-12 17:09:00 +01:00
pseco acc00adfaa Add AVAP execution and testing scripts
- Implemented parser for executing AVAP files within a Docker container (parser v1.py).
- Created a script to send AVAP code to a local server and handle responses (parser v2.py).
- Introduced a mock MBAP test harness to validate AVAP code against expected outputs (mbap_tester.py).
- Added transformation logic to convert AVAP code into Python-like syntax for testing purposes.
- Enhanced error handling and output formatting in the testing harness.
2026-03-12 15:56:36 +01:00
acano 0abbae93a4 docs: update usage instructions and improve validation error messages in generate_mbap.py 2026-03-12 13:19:10 +01:00
acano 3aca659b3c Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-12 13:14:40 +01:00
acano 01ce959aab refactor: remove unused BNF file generator script 2026-03-12 12:31:05 +01:00
acano 654ac88da7 feat: Enhance Elasticsearch ingestion with metadata export
- Added `export_documents` function to save processed documents to JSON.
- Extended `ElasticHandshake` to include chunk metadata during ingestion.
- Updated `process_documents` to include extra metadata for each chunk.
- Modified `ingest_documents` to return Elasticsearch responses for further processing.
- Adjusted `elasticsearch_ingestion` command to accept output path for exported JSON.
2026-03-12 12:28:17 +01:00
acano ed25f15542 feat: Enhance Elasticsearch ingestion process with metadata export
- Added output path parameter to elasticsearch_ingestion command for exporting processed documents.
- Implemented ElasticHandshakeWithMetadata class to preserve chunk metadata during ingestion.
- Updated process_documents function to include extra metadata for each chunk.
- Modified ingest_documents function to return Elasticsearch response for each chunk.
- Introduced export_documents function to save processed documents as JSON files.
2026-03-12 12:26:47 +01:00
acano 648f0f7318 refactor: reorganize file structure and update import paths for clarity 2026-03-12 10:21:44 +01:00
acano 9425db9b6c Refactor project structure: move prompts module to tasks directory and update references 2026-03-12 10:20:34 +01:00
acano a4478cb7ff refactor: remove unused BNF file generator script 2026-03-12 10:09:25 +01:00
acano dc7568b622 docs: enhance function docstrings for Elasticsearch ingestion and document processing 2026-03-12 09:54:24 +01:00
acano 46a6344c45 Add docstrings to elasticsearch_ingestion and ingest_documents functions for improved documentation 2026-03-12 09:53:56 +01:00
acano aa80f60fdc refactor: update Elasticsearch ingestion pipeline and document processing logic 2026-03-12 09:51:00 +01:00
acano 189e404d21 Refactor Elasticsearch ingestion and document processing functions for improved clarity and functionality 2026-03-12 09:50:30 +01:00
rafa-ruiz 90857e1b0a UPDATE: Modified LRM and generate_mbap.py to ensure better samples 2026-03-11 20:09:05 -07:00
rafa-ruiz b5167b71e3 UPDATE: Sample generator now includes a new key in each item. 2026-03-11 12:22:08 -07:00
acano de21bcb5fb Refactor code structure for improved readability and maintainability 2026-03-11 17:48:54 +01:00
acano 0421a315eb Set default value of delete_es_index to False in elasticsearch_ingestion function 2026-03-11 17:39:25 +01:00
acano 5f21544e0b Refactor Elasticsearch ingestion pipeline and add MBPP generation script
- Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch.
- Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM.
- Created `prompts.py` to define prompts for converting Python problems to AVAP.
- Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents.
- Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model.
- Updated dependencies in `uv.lock` to include new packages and versions.
2026-03-11 17:17:44 +01:00
pseco 3ac432567b BNF extraction pipeline from avap.md 2026-03-11 11:29:19 +01:00
acano 0ed7dfc653 Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-11 09:57:14 +01:00
acano 2ad09cc77f feat: Update dependencies and enhance Elasticsearch ingestion pipeline
- Added new dependencies including chonkie and markdown-it-py to requirements.txt.
- Refactored the Elasticsearch ingestion script to read and concatenate documents from specified folders.
- Implemented semantic chunking for documents using the chonkie library.
- Removed the old elasticsearch_ingestion_from_docs.py script as its functionality has been integrated into the main ingestion pipeline.
- Updated README.md to reflect new project structure and environment variables.
- Added a new changelog entry for version 1.4.0 detailing recent changes and enhancements.
2026-03-11 09:50:51 +01:00
rafa-ruiz 35ca56118d feat: add MBPP-style dataset generator and evaluation docs 2026-03-10 13:37:19 -07:00
acano 745ce07805 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-10 14:36:17 +01:00
acano bf3c7f36d8 feat(chunk): enhance file reading and processing logic
- Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys.
- Added logic to handle concatenation of file contents and improved handling of file prefixes.
- Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`.
- Added `convert_chunks_to_document` function to convert chunked content into `Document` objects.
- Integrated logging for chunking process.
- Updated dependencies in `uv.lock` to include `chonkie` and other related packages.
2026-03-10 14:36:09 +01:00
pseco a9bf84fa79 feat: Add synthetic dataset generation for AVAP using MBPP dataset
- Implemented a new script `translate_mbpp.py` to generate synthetic datasets using various LLM providers.
- Integrated the `get_prompt_mbpp` function in `prompts.py` to create prompts tailored for AVAP language conversion.
2026-03-09 17:43:07 +01:00
pseco f6bfba5561 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-09 15:04:23 +01:00
pseco 4afba7d89d working on scrappy 2026-03-09 15:00:07 +01:00
acano 6d856ba691 Add chunk.py for processing and replacing JavaScript references with AVAP
- Implemented `replace_javascript_with_avap` function to handle text replacement.
- Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers.
- Added functionality to read files from a specified directory and process their contents.
2026-03-09 13:21:18 +01:00
acano a4267e1b60 feat: implement Elasticsearch ingestion pipeline and embedding factories 2026-03-05 16:26:22 +01:00