Commit Graph

2 Commits

Author SHA1 Message Date
acano 5f21544e0b Refactor Elasticsearch ingestion pipeline and add MBPP generation script
- Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch.
- Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM.
- Created `prompts.py` to define prompts for converting Python problems to AVAP.
- Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents.
- Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model.
- Updated dependencies in `uv.lock` to include new packages and versions.
2026-03-11 17:17:44 +01:00
acano 2ad09cc77f feat: Update dependencies and enhance Elasticsearch ingestion pipeline
- Added new dependencies including chonkie and markdown-it-py to requirements.txt.
- Refactored the Elasticsearch ingestion script to read and concatenate documents from specified folders.
- Implemented semantic chunking for documents using the chonkie library.
- Removed the old elasticsearch_ingestion_from_docs.py script as its functionality has been integrated into the main ingestion pipeline.
- Updated README.md to reflect new project structure and environment variables.
- Added a new changelog entry for version 1.4.0 detailing recent changes and enhancements.
2026-03-11 09:50:51 +01:00