Commit Graph

7 Commits

Author SHA1 Message Date
pseco a9bf84fa79 feat: Add synthetic dataset generation for AVAP using MBPP dataset
- Implemented a new script `translate_mbpp.py` to generate synthetic datasets using various LLM providers.
- Integrated the `get_prompt_mbpp` function in `prompts.py` to create prompts tailored for AVAP language conversion.
2026-03-09 17:43:07 +01:00
pseco f6bfba5561 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-09 15:04:23 +01:00
pseco 4afba7d89d working on scrappy 2026-03-09 15:00:07 +01:00
acano 6d856ba691 Add chunk.py for processing and replacing JavaScript references with Avap
- Implemented `replace_javascript_with_avap` function to handle text replacement.
- Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers.
- Added functionality to read files from a specified directory and process their contents.
2026-03-09 13:21:18 +01:00
acano d951868200 refactor: Simplify Elasticsearch ingestion by removing chunk management module and integrating document building directly 2026-03-05 16:23:27 +01:00
acano 51f42c52b3 refactor: Remove unused uuid import from chunks.py and update changelog for refactoring changes 2026-03-05 11:27:27 +01:00
acano 1549069f5a feat: Add Elasticsearch ingestion pipeline and document chunking functionality
- Implemented `elasticsearch_ingestion` function to handle document ingestion into Elasticsearch.
- Created `build_chunks_from_folder` function to read and clean text files, generating document chunks.
- Added logging for better traceability during the ingestion process.
- Updated `uv.lock` to include `boto3` as a new dependency.
2026-03-04 18:21:01 +01:00