23 Commits

Author SHA1 Message Date
acano 5f21544e0b Refactor Elasticsearch ingestion pipeline and add MBPP generation script
- Updated `elasticsearch_ingestion.py` to streamline document processing and ingestion into Elasticsearch.
- Introduced `generate_mbap.py` for generating benchmark problems in AVAP language from a provided LRM.
- Created `prompts.py` to define prompts for converting Python problems to AVAP.
- Enhanced chunk processing in `chunk.py` to support markdown and AVAP documents.
- Added `OllamaEmbeddings` class in `embeddings.py` for handling embeddings with Ollama model.
- Updated dependencies in `uv.lock` to include new packages and versions.
2026-03-11 17:17:44 +01:00
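The `OllamaEmbeddings` class added to `embeddings.py` in the commit above is not shown in this log. Below is a minimal sketch of what such a wrapper can look like. It assumes Ollama's REST endpoint `POST /api/embed`, which takes `{"model", "input"}` and returns `{"embeddings": [...]}`, on the default `http://localhost:11434`. It also assumes LangChain-style `embed_documents`/`embed_query` method names. The class name `SimpleOllamaEmbeddings` and the `transport` parameter are hypothetical and are not the project's code.

```python
import json
import urllib.request
from typing import Callable, Optional

# A transport POSTs a JSON payload to a URL and returns the decoded JSON reply.
Transport = Callable[[str, dict], dict]


def _http_post_json(url: str, payload: dict) -> dict:
    """POST `payload` as JSON to `url` and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class SimpleOllamaEmbeddings:
    """Hypothetical wrapper around Ollama's /api/embed endpoint (not the project's class)."""

    def __init__(
        self,
        model: str,
        base_url: str = "http://localhost:11434",
        transport: Optional[Transport] = None,
    ) -> None:
        self.model = model
        self.base_url = base_url.rstrip("/")
        # Injectable transport so the class can be exercised without a running server.
        self._post = transport or _http_post_json

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        """Embed a batch of texts in one request; Ollama returns one vector per input."""
        if not texts:
            return []
        reply = self._post(f"{self.base_url}/api/embed", {"model": self.model, "input": texts})
        return reply["embeddings"]

    def embed_query(self, text: str) -> list[float]:
        """Embed a single query string."""
        return self.embed_documents([text])[0]
```

Passing a fake `transport` keeps unit tests offline. In normal use, the default transport talks to a local Ollama server.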
acano bf3c7f36d8 feat(chunk): enhance file reading and processing logic
- Updated `read_files` function to return a list of dictionaries containing 'content' and 'title' keys.
- Added logic to handle concatenation of file contents and improved handling of file prefixes.
- Introduced `get_chunk_docs` function to chunk document contents using `SemanticChunker`.
- Added `convert_chunks_to_document` function to convert chunked content into `Document` objects.
- Integrated logging for chunking process.
- Updated dependencies in `uv.lock` to include `chonkie` and other related packages.
2026-03-10 14:36:09 +01:00
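The commit above describes `read_files` returning dicts with `content` and `title` keys, and a later step that turns chunks into `Document` objects. The stdlib-only sketch below illustrates that data flow under several assumptions:

- The title is the file name without its extension.
- Only `.md` and `.avap` files are read.
- A small dataclass stands in for LangChain's `Document`, reusing its `page_content` and `metadata` field names.
- The `SemanticChunker` step is replaced by a naive blank-line paragraph split.

The signatures and the helper names `split_paragraphs` and `files_to_documents` are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (same two field names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def read_files(folder: str, suffixes: tuple[str, ...] = (".md", ".avap")) -> list[dict]:
    """Return one {'content', 'title'} dict per matching file, sorted by path."""
    docs = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix in suffixes:
            docs.append({"content": path.read_text(encoding="utf-8"), "title": path.stem})
    return docs


def split_paragraphs(text: str) -> list[str]:
    """Naive stand-in for semantic chunking: split on blank lines, drop empty pieces."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def files_to_documents(files: list[dict]) -> list[Document]:
    """Chunk each file and wrap every chunk in a Document that keeps its source title."""
    documents = []
    for item in files:
        for index, chunk in enumerate(split_paragraphs(item["content"])):
            documents.append(
                Document(page_content=chunk, metadata={"title": item["title"], "chunk": index})
            )
    return documents
```

Keeping the source title and chunk index in `metadata` lets each retrieved chunk be traced back to its file.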
izapata c8da317dd8 chore: remove unused dependencies from pyproject.toml and uv.lock 2026-03-09 11:08:42 +01:00
pseco 11e6ef71b1 working on agent pseco scratches 2026-03-09 09:35:49 +01:00
acano 1549069f5a feat: Add Elasticsearch ingestion pipeline and document chunking functionality
- Implemented `elasticsearch_ingestion` function to handle document ingestion into Elasticsearch.
- Created `build_chunks_from_folder` function to read and clean text files, generating document chunks.
- Added logging for better traceability during the ingestion process.
- Updated `uv.lock` to include `boto3` as a new dependency.
2026-03-04 18:21:01 +01:00
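For the ingestion step in the commit above, one common pattern is to turn each chunk into an action dict for `elasticsearch.helpers.bulk`, which accepts dicts carrying `_index`, an optional `_id`, and `_source`. The sketch below assumes that pattern. It also uses a whitespace-collapsing cleaner and fixed-size character chunking with overlap, a simple stand-in rather than whatever `build_chunks_from_folder` actually does. Deriving `_id` from a content hash makes re-running the ingestion overwrite documents instead of duplicating them. The function names and the `_source` field names are hypothetical.

```python
import hashlib
import re
from typing import Iterator


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not made up only of overlap.
    return [text[start:start + size] for start in range(0, max(len(text) - overlap, 1), step)]


def bulk_actions(chunks: list[str], index: str, title: str) -> Iterator[dict]:
    """Yield action dicts in the shape elasticsearch.helpers.bulk accepts."""
    for position, chunk in enumerate(chunks):
        # Deterministic id: the same title/position/content always maps to the same document.
        doc_id = hashlib.sha256(f"{title}:{position}:{chunk}".encode("utf-8")).hexdigest()
        yield {
            "_index": index,
            "_id": doc_id,
            "_source": {"title": title, "chunk_id": position, "content": chunk},
        }
```

With a configured client, the actions would be sent as `helpers.bulk(client, bulk_actions(chunks, "my-index", title))`. The index name here is illustrative.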
acano a7c40d4f2c Refactor Docker Compose and Update Dependencies
- Removed network_mode: "host" from docker-compose.yaml for better isolation.
- Updated execution counts in langgraph_agent_simple.ipynb to reflect new cell order.
- Added OLLAMA_LOCAL_URL to imports in langgraph_agent_simple.ipynb.
- Included base_url parameter for create_chat_model and create_embedding_model functions in langgraph_agent_simple.ipynb.
- Added litellm>=1.82.0 to the development dependencies in pyproject.toml.
- Updated uv.lock to include litellm and its dependencies, along with fastuuid package.
2026-03-02 16:13:07 +01:00
acano 5a666079a4 Refactor langgraph_agent_simple notebook execution counts and handle Langfuse client errors
- Set execution counts to null for initial cells in langgraph_agent_simple.ipynb
- Update execution counts for subsequent cells to maintain order
- Change output stream name from stdout to stderr for error handling
- Capture and log detailed error messages for failed Langfuse client authentication

Update uv.lock to manage accelerate dependency

- Remove accelerate from main dependencies
- Add accelerate to dev dependencies with version specification
- Adjust requires-dist section to reflect changes in dependency management
2026-03-02 14:07:29 +01:00
acano cdc90c0b43 Refactor code structure for improved readability and maintainability 2026-03-02 12:40:49 +01:00
acano 48d280440c Refactor code structure for improved readability and maintainability 2026-02-27 14:45:33 +01:00
pseco 6480c77edb added markdown, langfuse downgraded, langchain-huggingface, accelerate 2026-02-27 08:36:20 +01:00
pseco 4a2db004c0 adding ragas to dev 2026-02-26 09:45:42 +01:00
pseco f6a907911d unstage changes 2026-02-24 16:59:57 +01:00
acano 0d6c08e341 Add BEIR analysis notebook for CosQA and update dependencies
- Created a new Jupyter notebook for analyzing BEIR dataset with CosQA using Ollama embeddings.
- Implemented a custom embedding class to integrate LangChain's OllamaEmbeddings with BEIR.
- Added data loading and evaluation logic for the CosQA dataset.
- Updated `uv.lock` to remove unnecessary dependencies (`mteb` and `polars`) and incremented revision number.
2026-02-24 15:27:59 +01:00
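The custom class that bridges LangChain's `OllamaEmbeddings` and BEIR is not shown in this log. BEIR's exact-search dense retriever calls `encode_queries` and `encode_corpus` on the model it wraps. Below is a minimal adapter sketch. It assumes corpus entries arrive as dicts with `text` and an optional `title`, which can differ between BEIR versions. It also assumes the wrapped object exposes LangChain's `embed_query`/`embed_documents` methods. It returns plain lists; depending on the BEIR version, converting them to NumPy arrays or tensors may be needed. The class name `BeirEmbeddingAdapter` is hypothetical.

```python
from typing import Any, Protocol


class LangChainStyleEmbedder(Protocol):
    """The two methods LangChain's Embeddings interface defines."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    def embed_query(self, text: str) -> list[float]: ...


class BeirEmbeddingAdapter:
    """Hypothetical adapter exposing encode_queries/encode_corpus on a LangChain embedder."""

    def __init__(self, embedder: LangChainStyleEmbedder) -> None:
        self.embedder = embedder

    def encode_queries(self, queries: list[str], batch_size: int = 32, **kwargs: Any) -> list[list[float]]:
        # Queries go through embed_query so any query-specific handling in the embedder applies.
        return [self.embedder.embed_query(q) for q in queries]

    def encode_corpus(self, corpus: list[dict], batch_size: int = 32, **kwargs: Any) -> list[list[float]]:
        # Join title and body with a space, then embed the texts in batches of `batch_size`.
        texts = [(doc.get("title", "") + " " + doc["text"]).strip() for doc in corpus]
        vectors: list[list[float]] = []
        for start in range(0, len(texts), batch_size):
            vectors.extend(self.embedder.embed_documents(texts[start:start + batch_size]))
        return vectors
```

Extra keyword arguments such as progress-bar flags are accepted and ignored, so the adapter tolerates the options the retriever passes along.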
pseco 9b6726c232 working on evaluating embeddings 2026-02-24 14:35:48 +01:00
acano cb16306ffb Refactor code structure for improved readability and maintainability 2026-02-24 11:46:18 +01:00
acano 4a1236f951 Refactor code structure for improved readability and maintainability 2026-02-24 10:49:52 +01:00
acano 02af67fffb created notebook for langgraph testing 2026-02-19 14:48:46 +01:00
acano 0dad6b1ef5 feat: add initial implementation of Elasticsearch ingestion with chunking strategies 2026-02-18 13:52:54 +01:00
acano 918befe65f Refactor code structure for improved readability and maintainability 2026-02-17 11:33:15 +01:00
pseco 36bd3b32a6 generate working schema 2026-02-16 17:58:18 +01:00
izapata b0e3c0d482 Added langchain 2026-02-12 17:58:37 +01:00
pseco e1ebecfce9 add grpc 2026-02-12 12:22:24 +01:00
acano 04d423d87e chore: clean up code structure and remove unused code blocks 2026-02-11 18:03:43 +01:00