Commit Graph

13 Commits

Author SHA1 Message Date
acano 868a17523a Merge online into mrh-online-dev 2026-03-19 11:25:36 +01:00
acano fadf813494 Update Elasticsearch index version and enhance document processing 2026-03-16 13:21:25 +01:00
- Changed Elasticsearch index from "avap-docs-test-v3" to "avap-docs-test-v4" in elasticsearch_ingestion.py.
- Added Lark parser for AVAP code processing in chunk.py.
- Enhanced metadata extraction for processed documents, including AST for AVAP files.
- Improved error handling for AVAP code parsing.
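The metadata step in `fadf813494` might look roughly like the following. This is a minimal sketch, not the code in `chunk.py`. The helper name `build_chunk_metadata`, the `.avap` extension check and the metadata keys are all hypothetical. The parser is passed in as a plain callable, so with Lark it could be the `parse` method of a `lark.Lark` instance built from an AVAP grammar (not shown). A parse failure is recorded in the metadata instead of aborting ingestion, which is one plausible reading of "improved error handling".

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def build_chunk_metadata(
    path: str,
    content: str,
    parse: Optional[Callable[[str], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: per-document metadata, with a parse tree for AVAP files."""
    suffix = Path(path).suffix.lower()
    metadata: Dict[str, Any] = {
        "source": path,
        # Assumption: AVAP files use the .avap extension; everything else is markdown.
        "doc_type": "avap" if suffix == ".avap" else "markdown",
        "ast": None,
        "parse_error": None,
    }
    # Only AVAP documents are parsed; markdown keeps ast=None.
    if metadata["doc_type"] == "avap" and parse is not None:
        try:
            # str() keeps the metadata JSON-serializable (a Lark Tree object is not).
            metadata["ast"] = str(parse(content))
        except Exception as exc:
            # Assumption: record the failure and keep the document instead of aborting.
            metadata["parse_error"] = f"{type(exc).__name__}: {exc}"
    return metadata
```

With a real Lark parser, the call would be something like `build_chunk_metadata(path, text, parser.parse)`.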
acano ab1022d8b6 feat: Implement ElasticHandshakeWithMetadata to preserve chunk metadata in Elasticsearch 2026-03-13 11:02:32 +01:00
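One way to keep chunk metadata intact through ingestion is to carry it inside each indexed document. The sketch below does not reproduce `ElasticHandshakeWithMetadata` or the handshake API it builds on, because neither is shown in this log. The function `chunk_actions` and the `text`/`embedding`/`metadata` field names are hypothetical. Only the index name comes from the commits.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunk_actions(
    index: str,
    chunks: Sequence[Dict[str, Any]],
    embeddings: Sequence[List[float]],
) -> Iterator[Dict[str, Any]]:
    """Hypothetical sketch: yield bulk-style actions that carry each chunk's metadata."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    for chunk, vector in zip(chunks, embeddings):
        yield {
            "_index": index,
            "_source": {
                "text": chunk["text"],
                "embedding": list(vector),
                # Shallow copy so later edits to the chunk do not leak into the action.
                "metadata": dict(chunk.get("metadata", {})),
            },
        }
```

The dicts use the `_index`/`_source` shape accepted by `elasticsearch.helpers.bulk(client, actions)`, so they could be passed there directly.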
acano 654ac88da7 feat: Enhance Elasticsearch ingestion with metadata export 2026-03-12 12:28:17 +01:00
- Added `export_documents` function to save processed documents to JSON.
- Extended `ElasticHandshake` to include chunk metadata during ingestion.
- Updated `process_documents` to include extra metadata for each chunk.
- Modified `ingest_documents` to return Elasticsearch responses for further processing.
- Adjusted `elasticsearch_ingestion` command to accept output path for exported JSON.
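`export_documents` is named in `654ac88da7`, but its signature is not. The version below is a hypothetical sketch. It writes the processed chunks to a single UTF-8 JSON file at the output path passed to the `elasticsearch_ingestion` command, creating parent directories as needed.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def export_documents(documents: List[Dict[str, Any]], output_path: str) -> Path:
    """Hypothetical sketch: write processed chunks to a UTF-8 JSON file, return its path."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII text readable in the exported file.
        json.dump(documents, fh, ensure_ascii=False, indent=2)
    return path
```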
acano ed25f15542 feat: Enhance Elasticsearch ingestion process with metadata export 2026-03-12 12:26:47 +01:00
- Added output path parameter to elasticsearch_ingestion command for exporting processed documents.
- Implemented ElasticHandshakeWithMetadata class to preserve chunk metadata during ingestion.
- Updated process_documents function to include extra metadata for each chunk.
- Modified ingest_documents function to return Elasticsearch response for each chunk.
- Introduced export_documents function to save processed documents as JSON files.
acano dc7568b622 docs: enhance function docstrings for Elasticsearch ingestion and document processing 2026-03-12 09:54:24 +01:00
acano 46a6344c45 Add docstrings to elasticsearch_ingestion and ingest_documents functions for improved documentation 2026-03-12 09:53:56 +01:00
acano aa80f60fdc refactor: update Elasticsearch ingestion pipeline and document processing logic 2026-03-12 09:51:00 +01:00
acano 189e404d21 Refactor Elasticsearch ingestion and document processing functions for improved clarity and functionality 2026-03-12 09:50:30 +01:00
acano de21bcb5fb Refactor code structure for improved readability and maintainability 2026-03-11 17:48:54 +01:00
acano 5f21544e0b Refactor Elasticsearch ingestion pipeline and add MBPP generation script 2026-03-11 17:17:44 +01:00
- Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch.
- Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM.
- Created `prompts.py` to define prompts for converting Python problems to AVAP.
- Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents.
- Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model.
- Updated dependencies in `uv.lock` to include new packages and versions.
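`OllamaEmbeddings` is described only as handling embeddings with an Ollama model. The following is a minimal sketch with several assumptions. It uses a LangChain-style `embed_documents`/`embed_query` interface and Ollama's REST endpoint `POST /api/embed`, which takes `{"model", "input"}` and returns `{"embeddings": [...]}`. It also assumes the default port 11434. The injectable `post` parameter is hypothetical and exists only so the class can be tested without a running server.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Optional

Post = Callable[[str, Dict], Dict]


def _http_post(url: str, payload: Dict) -> Dict:
    """Send a JSON POST request and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class OllamaEmbeddings:
    """Sketch of an embeddings wrapper around Ollama's /api/embed endpoint."""

    def __init__(self, model: str, base_url: str = "http://localhost:11434",
                 post: Optional[Post] = None) -> None:
        self.model = model
        self.url = base_url.rstrip("/") + "/api/embed"
        self._post = post or _http_post  # hypothetical hook for testing

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        if not texts:
            return []
        # /api/embed accepts a list of inputs and returns one vector per input.
        data = self._post(self.url, {"model": self.model, "input": texts})
        return data["embeddings"]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]
```

Batching all chunk texts into one `input` list keeps the number of HTTP round trips low during ingestion.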
acano bf3c7f36d8 feat(chunk): enhance file reading and processing logic 2026-03-10 14:36:09 +01:00
- Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys.
- Added logic to handle concatenation of file contents and improved handling of file prefixes.
- Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`.
- Added `convert_chunks_to_document` function to convert chunked content into `Document` objects.
- Integrated logging for chunking process.
- Updated dependencies in `uv.lock` to include `chonkie` and other related packages.
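In `bf3c7f36d8`, `read_files` was changed to return dictionaries with `content` and `title` keys. The sketch below is a hypothetical reconstruction of that return shape. The `prefix` filter, the `concatenate` flag and the use of the file stem as the title are assumptions, not the project's actual behavior.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_files(folder: str, prefix: Optional[str] = None,
               concatenate: bool = False) -> List[Dict[str, str]]:
    """Hypothetical sketch: read files into {'content', 'title'} dicts."""
    # Assumption: only regular files whose names start with `prefix` are read, in name order.
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and (prefix is None or p.name.startswith(prefix))
    )
    docs = [{"content": p.read_text(encoding="utf-8"), "title": p.stem} for p in paths]
    if concatenate and docs:
        # Assumption: concatenated output is titled after the prefix (or the folder).
        title = prefix or Path(folder).name
        return [{"content": "\n\n".join(d["content"] for d in docs), "title": title}]
    return docs
```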
acano 6d856ba691 Add chunk.py for processing and replacing JavaScript references with AVAP 2026-03-09 13:21:18 +01:00
- Implemented `replace_javascript_with_avap` function to handle text replacement.
- Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers.
- Added functionality to read files from a specified directory and process their contents.
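The JavaScript-to-AVAP replacement from `6d856ba691` might look roughly like the code below. This sketch assumes "JavaScript" appears as a standalone word in any casing. It also assumes matching files are concatenated in name order and joined with newlines. Both functions only borrow their names from the commit and are not the original implementation.

```python
import re
from pathlib import Path

# Assumption: "JavaScript" appears as a standalone word in any casing.
_JS_WORD = re.compile(r"\bjavascript\b", re.IGNORECASE)


def replace_javascript_with_avap(text: str) -> str:
    """Replace standalone mentions of JavaScript with AVAP."""
    return _JS_WORD.sub("AVAP", text)


def read_concat_files(folder: str, prefix: str) -> str:
    """Hypothetical sketch: concatenate files starting with `prefix`, rewriting JS mentions."""
    parts = []
    for path in sorted(Path(folder).glob(f"{prefix}*")):
        if path.is_file():
            parts.append(replace_javascript_with_avap(path.read_text(encoding="utf-8")))
    return "\n".join(parts)
```

The word boundaries keep longer identifiers such as `javascripts` untouched, while still rewriting `JavaScript-like` to `AVAP-like`.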