assistance-engine

Commit Graph

Author	SHA1	Message	Date
acano	3aca659b3c	Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev	2026-03-12 13:14:40 +01:00
acano	ed25f15542	feat: Enhance Elasticsearch ingestion process with metadata export - Added output path parameter to elasticsearch_ingestion command for exporting processed documents. - Implemented ElasticHandshakeWithMetadata class to preserve chunk metadata during ingestion. - Updated process_documents function to include extra metadata for each chunk. - Modified ingest_documents function to return Elasticsearch response for each chunk. - Introduced export_documents function to save processed documents as JSON files.	2026-03-12 12:26:47 +01:00
acano	9425db9b6c	Refactor project structure: move prompts module to tasks directory and update references	2026-03-12 10:20:34 +01:00
acano	46a6344c45	Add docstrings to elasticsearch_ingestion and ingest_documents functions for improved documentation	2026-03-12 09:53:56 +01:00
acano	189e404d21	Refactor Elasticsearch ingestion and document processing functions for improved clarity and functionality	2026-03-12 09:50:30 +01:00
rafa-ruiz	90857e1b0a	UPDATE: Modified LRM and generate_mbap.py to ensure better samples	2026-03-11 20:09:05 -07:00
rafa-ruiz	b5167b71e3	UPDATE: Sample generator now includes a new key in each item.	2026-03-11 12:22:08 -07:00
acano	0421a315eb	Set default value of delete_es_index to False in elasticsearch_ingestion function	2026-03-11 17:39:25 +01:00
acano	5f21544e0b	Refactor Elasticsearch ingestion pipeline and add MBPP generation script - Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch. - Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM. - Created `prompts.py` to define prompts for converting Python problems to AVAP. - Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents. - Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model. - Updated dependencies in `uv.lock` to include new packages and versions.	2026-03-11 17:17:44 +01:00
pseco	3ac432567b	BNF extraction pipeline from avap.md	2026-03-11 11:29:19 +01:00
acano	0ed7dfc653	Merge branch 'online' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev	2026-03-11 09:57:14 +01:00
acano	2ad09cc77f	feat: Update dependencies and enhance Elasticsearch ingestion pipeline - Added new dependencies including chonkie and markdown-it-py to requirements.txt. - Refactored the Elasticsearch ingestion script to read and concatenate documents from specified folders. - Implemented semantic chunking for documents using the chonkie library. - Removed the old elasticsearch_ingestion_from_docs.py script as its functionality has been integrated into the main ingestion pipeline. - Updated README.md to reflect new project structure and environment variables. - Added a new changelog entry for version 1.4.0 detailing recent changes and enhancements.	2026-03-11 09:50:51 +01:00
rafa-ruiz	35ca56118d	feat: add MBPP-style dataset generator and evaluation docs	2026-03-10 13:37:19 -07:00
acano	745ce07805	Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev	2026-03-10 14:36:17 +01:00
acano	bf3c7f36d8	feat(chunk): enhance file reading and processing logic - Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys. - Added logic to handle concatenation of file contents and improved handling of file prefixes. - Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`. - Added `convert_chunks_to_document` function to convert chunked content into `Document` objects. - Integrated logging for chunking process. - Updated dependencies in `uv.lock` to include `chonkie` and other related packages.	2026-03-10 14:36:09 +01:00
pseco	a9bf84fa79	feat: Add synthetic dataset generation for AVAP using MBPP dataset - Implemented a new script `translate_mbpp.py` to generate synthetic datasets using various LLM providers. - Integrated the `get_prompt_mbpp` function in `prompts.py` to create prompts tailored for AVAP language conversion.	2026-03-09 17:43:07 +01:00
pseco	f6bfba5561	Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev	2026-03-09 15:04:23 +01:00
pseco	4afba7d89d	working on scrappy	2026-03-09 15:00:07 +01:00
acano	6d856ba691	Add chunk.py for processing and replacing JavaScript references with Avap - Implemented `replace_javascript_with_avap` function to handle text replacement. - Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers. - Added functionality to read files from a specified directory and process their contents.	2026-03-09 13:21:18 +01:00
acano	a4267e1b60	feat: implement Elasticsearch ingestion pipeline and embedding factories	2026-03-05 16:26:22 +01:00
acano	d951868200	refactor: Simplify Elasticsearch ingestion by removing chunk management module and integrating document building directly	2026-03-05 16:23:27 +01:00
acano	51f42c52b3	refactor: Remove unused uuid import from chunks.py and update changelog for refactoring changes	2026-03-05 11:27:27 +01:00
acano	1549069f5a	feat: Add Elasticsearch ingestion pipeline and document chunking functionality - Implemented `elasticsearch_ingestion` function to handle document ingestion into Elasticsearch. - Created `build_chunks_from_folder` function to read and clean text files, generating document chunks. - Added logging for better traceability during the ingestion process. - Updated `uv.lock` to include `boto3` as a new dependency.	2026-03-04 18:21:01 +01:00

23 Commits