Commit Graph

12 Commits

Author SHA1 Message Date
acano bf3c7f36d8 feat(chunk): enhance file reading and processing logic
- Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys.
- Added logic to handle concatenation of file contents and improved handling of file prefixes.
- Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`.
- Added `convert_chunks_to_document` function to convert chunked content into `Document` objects.
- Integrated logging for chunking process.
- Updated dependencies in `uv.lock` to include `chonkie` and other related packages.
2026-03-10 14:36:09 +01:00
pseco f6bfba5561 Merge branch 'mrh-online-dev' of github.com:BRUNIX-AI/assistance-engine into mrh-online-dev 2026-03-09 15:04:23 +01:00
pseco 4afba7d89d working on scrappy 2026-03-09 15:00:07 +01:00
acano 6d856ba691 Add chunk.py for processing and replacing JavaScript references with Avap
- Implemented `replace_javascript_with_avap` function to handle text replacement.
- Created `read_concat_files` function to read and concatenate files with a specified prefix, replacing JavaScript markers.
- Added functionality to read files from a specified directory and process their contents.
2026-03-09 13:21:18 +01:00
acano d951868200 refactor: Simplify Elasticsearch ingestion by removing chunk management module and integrating document building directly 2026-03-05 16:23:27 +01:00
acano 51f42c52b3 refactor: Remove unused uuid import from chunks.py and update changelog for refactoring changes 2026-03-05 11:27:27 +01:00
acano 1549069f5a feat: Add Elasticsearch ingestion pipeline and document chunking functionality
- Implemented `elasticsearch_ingestion` function to handle document ingestion into Elasticsearch.
- Created `build_chunks_from_folder` function to read and clean text files, generating document chunks.
- Added logging for better traceability during the ingestion process.
- Updated `uv.lock` to include `boto3` as a new dependency.
2026-03-04 18:21:01 +01:00
pseco 9575af3ff0 working on dual index 2026-03-03 12:01:03 +01:00
pseco a5952c1a4d working on agent in docker 2026-03-02 12:41:27 +01:00
pseco 36bd3b32a6 generate working schema 2026-02-16 17:58:18 +01:00
izapata 7cdaf5a0c5 feat: update README and add start-tunnels.sh script for infrastructure setup 2026-02-16 14:50:55 +01:00
acano 03116be719 chore: add .gitkeep files to notebooks and scripts directories 2026-02-11 18:06:16 +01:00