Merge online into mrh-online-dev

This commit is contained in:
acano 2026-03-19 11:25:36 +01:00
commit 868a17523a
71 changed files with 6246 additions and 3581 deletions

View File

@ -14,7 +14,8 @@
6. [Environment Variables Policy](#6-environment-variables-policy)
7. [Changelog Policy](#7-changelog-policy)
8. [Documentation Policy](#8-documentation-policy)
9. [Incident & Blockage Reporting](#9-incident--blockage-reporting)
9. [Architecture Decision Records (ADRs)](#9-architecture-decision-records-adrs)
10. [Incident & Blockage Reporting](#10-incident--blockage-reporting)
---
@ -95,9 +96,10 @@ A PR is not ready for review unless **all applicable items** in the following ch
- [ ] No changelog entry required (internal refactor, comment/typo fix, zero behavioral change)
- [ ] Changelog updated with correct version bump and date
**Documentation** *(see [Section 7](#7-documentation-policy))*
**Documentation** *(see [Section 8](#8-documentation-policy))*
- [ ] No documentation update required (internal change, no impact on setup or API)
- [ ] `README.md` or relevant docs updated to reflect this change
- [ ] If a significant architectural decision was made, an ADR was created in `docs/adr/`
---
@ -206,11 +208,87 @@ Update `README.md` (or the relevant doc file) if the PR includes any of the foll
- Internal implementation changes with no impact on setup, usage, or API
- Fixes that do not alter any documented behavior
### Documentation files in this repository
| File | Purpose |
|---|---|
| `README.md` | Setup guide, env vars reference, quick start |
| `CONTRIBUTING.md` | Contribution standards (this file) |
| `SECURITY.md` | Security policy and vulnerability reporting |
| `docs/ARCHITECTURE.md` | Deep technical architecture reference |
| `docs/API_REFERENCE.md` | Complete gRPC API contract and examples |
| `docs/RUNBOOK.md` | Operational playbooks and incident response |
| `docs/AVAP_CHUNKER_CONFIG.md` | `avap_config.json` reference — blocks, statements, semantic tags |
| `docs/adr/` | Architecture Decision Records |
> **PRs that change user-facing behavior or setup without updating documentation will be rejected.**
---
## 9. Incident & Blockage Reporting
## 9. Architecture Decision Records (ADRs)
Architecture Decision Records document **significant technical decisions** — choices that have lasting consequences on the codebase, infrastructure, or development process.
### When to write an ADR
Write an ADR when a PR introduces or changes:
- A fundamental technology choice (communication protocol, storage backend, framework)
- A design pattern that other components will follow
- A deliberate trade-off with known consequences
- A decision that future engineers might otherwise reverse without understanding the rationale
### When NOT to write an ADR
- Implementation details within a single module
- Bug fixes
- Dependency version bumps
- Configuration changes
### ADR format
ADRs live in `docs/adr/` and follow this naming convention:
```
ADR-XXXX-short-title.md
```
Where `XXXX` is a zero-padded sequential number (e.g., `ADR-0005-new-decision.md`).
Each ADR must contain:
```markdown
# ADR-XXXX: Title
**Date:** YYYY-MM-DD
**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-YYYY
**Deciders:** Names or roles
## Context
What problem are we solving? What forces are at play?
## Decision
What did we decide?
## Rationale
Why this option over alternatives? Include a trade-off analysis.
## Consequences
What are the positive and negative results of this decision?
```
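
Since `XXXX` is sequential, the next filename can be derived from the existing files in `docs/adr/`. A minimal sketch, assuming the helper `next_adr_name` (hypothetical, not part of this repository):

```python
import re
from pathlib import Path

def next_adr_name(adr_dir: Path, short_title: str) -> str:
    """Return the next zero-padded ADR filename, e.g. 'ADR-0005-new-decision.md'."""
    pattern = re.compile(r"ADR-(\d{4})-.*\.md$")
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("ADR-*.md")
        if (m := pattern.match(p.name))
    ]
    next_number = max(numbers, default=0) + 1
    # Slugify the title: lowercase, non-alphanumerics collapsed to hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", short_title.lower()).strip("-")
    return f"ADR-{next_number:04d}-{slug}.md"
```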
### Existing ADRs
| ADR | Title | Status |
|---|---|---|
| [ADR-0001](docs/adr/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
| [ADR-0002](docs/adr/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
| [ADR-0003](docs/adr/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
| [ADR-0004](docs/adr/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
---
## 10. Incident & Blockage Reporting
If you encounter a technical blockage (connection timeouts, service downtime, tunnel failures):
@ -221,6 +299,8 @@ If you encounter a technical blockage (connection timeouts, service downtime, tu
- Current status of all `kubectl` tunnels
3. **Resolution** — If the error is not reproducible by the CTO/DevOps team, a 5-minute live debugging session will be scheduled to identify local network or configuration issues.
See [`docs/RUNBOOK.md`](docs/RUNBOOK.md) for full incident playbooks and escalation paths.
---
*These standards exist to protect the integrity of the Brunix Assistance Engine and to ensure every member of the team can work confidently and efficiently. They are not bureaucratic overhead — they are the foundation of a reliable, scalable engineering practice.*

View File

@ -10,7 +10,7 @@ COPY ./requirements.txt .
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
curl \
protobuf-compiler \
protobuf-compiler \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir --upgrade pip
@ -25,6 +25,10 @@ RUN python -m grpc_tools.protoc \
--grpc_python_out=./src \
./protos/brunix.proto
EXPOSE 50051
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
CMD ["python", "src/server.py"]
EXPOSE 50051
EXPOSE 8000
ENTRYPOINT ["/entrypoint.sh"]

View File

@ -6,6 +6,7 @@ services:
container_name: brunix-assistance-engine
ports:
- "50052:50051"
- "8000:8000"
environment:
ELASTICSEARCH_URL: ${ELASTICSEARCH_URL}
ELASTICSEARCH_INDEX: ${ELASTICSEARCH_INDEX}
@ -16,6 +17,7 @@ services:
OLLAMA_URL: ${OLLAMA_URL}
OLLAMA_MODEL_NAME: ${OLLAMA_MODEL_NAME}
OLLAMA_EMB_MODEL_NAME: ${OLLAMA_EMB_MODEL_NAME}
PROXY_THREAD_WORKERS: 10
extra_hosts:
- "host.docker.internal:host-gateway"

30
Docker/entrypoint.sh Normal file
View File

@ -0,0 +1,30 @@
#!/bin/sh
set -e
echo "[entrypoint] Starting Brunix Engine (gRPC :50051)..."
python src/server.py &
ENGINE_PID=$!
echo "[entrypoint] Starting OpenAI Proxy (HTTP :8000)..."
uvicorn openai_proxy:app --host 0.0.0.0 --port 8000 --workers 4 --app-dir src &
PROXY_PID=$!
wait_any() {
while kill -0 $ENGINE_PID 2>/dev/null && kill -0 $PROXY_PID 2>/dev/null; do
sleep 2
done
if ! kill -0 $ENGINE_PID 2>/dev/null; then
echo "[entrypoint] Engine died — stopping proxy"
kill $PROXY_PID 2>/dev/null
exit 1
fi
if ! kill -0 $PROXY_PID 2>/dev/null; then
echo "[entrypoint] Proxy died — stopping engine"
kill $ENGINE_PID 2>/dev/null
exit 1
fi
}
wait_any

View File

@ -3,16 +3,60 @@ syntax = "proto3";
package brunix;
service AssistanceEngine {
rpc AskAgent (AgentRequest) returns (stream AgentResponse);
// Complete response, compatible with existing clients
rpc AskAgent (AgentRequest) returns (stream AgentResponse);
// Real token-by-token streaming from Ollama
rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
// RAGAS evaluation with Claude as judge
rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
}
// ---------------------------------------------------------------------------
// AskAgent / AskAgentStream: same messages, two behaviors
// ---------------------------------------------------------------------------
message AgentRequest {
string query = 1;
string session_id = 2;
string query = 1;
string session_id = 2;
}
message AgentResponse {
string text = 1;
string text = 1;
string avap_code = 2;
bool is_final = 3;
bool is_final = 3;
}
// ---------------------------------------------------------------------------
// EvaluateRAG
// ---------------------------------------------------------------------------
message EvalRequest {
string category = 1;
int32 limit = 2;
string index = 3;
}
message EvalResponse {
string status = 1;
int32 questions_evaluated = 2;
float elapsed_seconds = 3;
string judge_model = 4;
string index = 5;
float faithfulness = 6;
float answer_relevancy = 7;
float context_recall = 8;
float context_precision = 9;
float global_score = 10;
string verdict = 11;
repeated QuestionDetail details = 12;
}
message QuestionDetail {
string id = 1;
string category = 2;
string question = 3;
string answer_preview = 4;
int32 n_chunks = 5;
}

View File

@ -316,3 +316,10 @@ yarl==1.22.0
# via aiohttp
zstandard==0.25.0
# via langsmith
ragas
datasets
langchain-anthropic
fastapi>=0.111.0
uvicorn[standard]>=0.29.0

230
Docker/src/evaluate.py Normal file
View File

@ -0,0 +1,230 @@
import os
import time
import json
import logging
from collections import defaultdict
from pathlib import Path
from typing import Optional
from ragas import evaluate as ragas_evaluate
from ragas.metrics import (faithfulness, answer_relevancy, context_recall, context_precision)
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from datasets import Dataset
from langchain_anthropic import ChatAnthropic
logger = logging.getLogger(__name__)
GOLDEN_DATASET_PATH = Path(__file__).parent / "golden_dataset.json"
CLAUDE_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "")
K_RETRIEVE = 5
ANTHROPIC_AVAILABLE = True
from elasticsearch import Elasticsearch
from langchain_core.messages import SystemMessage, HumanMessage
def retrieve_context(es_client, embeddings, question, index, k=K_RETRIEVE):
query_vector = None
try:
query_vector = embeddings.embed_query(question)
except Exception as e:
logger.warning(f"[eval] embed_query fails: {e}")
bm25_hits = []
try:
resp = es_client.search(
index=index,
body={
"size": k,
"query": {
"multi_match": {
"query": question,
"fields": ["content^2", "text^2"],
"type": "best_fields",
"fuzziness": "AUTO",
}
},
"_source": {"excludes": ["embedding"]},
}
)
bm25_hits = resp["hits"]["hits"]
except Exception as e:
logger.warning(f"[eval] BM25 fails: {e}")
knn_hits = []
if query_vector:
try:
resp = es_client.search(
index=index,
body={
"size": k,
"knn": {
"field": "embedding",
"query_vector": query_vector,
"k": k,
"num_candidates": k * 5,
},
"_source": {"excludes": ["embedding"]},
}
)
knn_hits = resp["hits"]["hits"]
except Exception as e:
logger.warning(f"[eval] kNN fails: {e}")
rrf_scores: dict[str, float] = defaultdict(float)
hit_by_id: dict[str, dict] = {}
for rank, hit in enumerate(bm25_hits):
doc_id = hit["_id"]
rrf_scores[doc_id] += 1.0 / (rank + 60)
hit_by_id[doc_id] = hit
for rank, hit in enumerate(knn_hits):
doc_id = hit["_id"]
rrf_scores[doc_id] += 1.0 / (rank + 60)
if doc_id not in hit_by_id:
hit_by_id[doc_id] = hit
ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:k]
return [
hit_by_id[doc_id]["_source"].get("content")
or hit_by_id[doc_id]["_source"].get("text", "")
for doc_id, _ in ranked
if (
hit_by_id[doc_id]["_source"].get("content")
or hit_by_id[doc_id]["_source"].get("text", "")
).strip()
]
def generate_answer(llm, question: str, contexts: list[str]) -> str:
try:
from prompts import GENERATE_PROMPT
context_text = "\n\n".join(
f"[{i+1}] {ctx}" for i, ctx in enumerate(contexts)
)
prompt = SystemMessage(
content=GENERATE_PROMPT.content.format(context=context_text)
)
resp = llm.invoke([prompt, HumanMessage(content=question)])
return resp.content.strip()
except Exception as e:
logger.warning(f"[eval] generate_answer fails: {e}")
return ""
def run_evaluation(es_client, llm, embeddings, index_name, category=None, limit=None):
if not ANTHROPIC_AVAILABLE:
return {"error": "langchain-anthropic not installed. pip install langchain-anthropic"}
if not ANTHROPIC_API_KEY:
return {"error": "ANTHROPIC_API_KEY not set in .env"}
if not GOLDEN_DATASET_PATH.exists():
return {"error": f"Golden dataset not found at {GOLDEN_DATASET_PATH}"}
questions = json.loads(GOLDEN_DATASET_PATH.read_text(encoding="utf-8"))
if category:
questions = [q for q in questions if q.get("category") == category]
if limit:
questions = questions[:limit]
if not questions:
return {"error": "NO QUESTIONS MATCH THESE FILTERS"}
logger.info(f"[eval] evaluating {len(questions)} questions, index={index_name}")
claude_judge = ChatAnthropic(
model=CLAUDE_MODEL,
api_key=ANTHROPIC_API_KEY,
temperature=0,
max_tokens=2048,
)
rows = {"question": [], "answer": [], "contexts": [], "ground_truth": []}
details = []
t_start = time.time()
for item in questions:
q_id = item["id"]
question = item["question"]
gt = item["ground_truth"]
logger.info(f"[eval] {q_id}: {question[:60]}")
contexts = retrieve_context(es_client, embeddings, question, index_name)
if not contexts:
logger.warning(f"[eval] No context for {q_id} — skipping")
continue
answer = generate_answer(llm, question, contexts)
if not answer:
logger.warning(f"[eval] No answer for {q_id} — skipping")
continue
rows["question"].append(question)
rows["answer"].append(answer)
rows["contexts"].append(contexts)
rows["ground_truth"].append(gt)
details.append({
"id": q_id,
"category": item.get("category", ""),
"question": question,
"answer_preview": answer[:300],
"n_chunks": len(contexts),
})
if not rows["question"]:
return {"error": "NO SAMPLES GENERATED"}
dataset = Dataset.from_dict(rows)
ragas_llm = LangchainLLMWrapper(claude_judge)
ragas_emb = LangchainEmbeddingsWrapper(embeddings)
metrics = [faithfulness, answer_relevancy, context_recall, context_precision]
for metric in metrics:
metric.llm = ragas_llm
if hasattr(metric, "embeddings"):
metric.embeddings = ragas_emb
logger.info("[eval] JUDGING BY CLAUDE...")
result = ragas_evaluate(dataset, metrics=metrics)
elapsed = time.time() - t_start
scores = {
"faithfulness": round(float(result.get("faithfulness", 0)), 4),
"answer_relevancy": round(float(result.get("answer_relevancy", 0)), 4),
"context_recall": round(float(result.get("context_recall", 0)), 4),
"context_precision": round(float(result.get("context_precision", 0)), 4),
}
valid_scores = [v for v in scores.values() if v > 0]
global_score = round(sum(valid_scores) / len(valid_scores), 4) if valid_scores else 0.0
verdict = (
"EXCELLENT" if global_score >= 0.8 else
"ACCEPTABLE" if global_score >= 0.6 else
"INSUFFICIENT"
)
logger.info(f"[eval] FINISHED — global={global_score} verdict={verdict} "
f"elapsed={elapsed:.0f}s")
return {
"status": "ok",
"questions_evaluated": len(rows["question"]),
"elapsed_seconds": round(elapsed, 1),
"judge_model": CLAUDE_MODEL,
"index": index_name,
"category_filter": category or "all",
"scores": scores,
"global_score": global_score,
"verdict": verdict,
"details": details,
}
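
Both `retrieve_context` above and `hybrid_search_native` in `graph.py` merge the BM25 and kNN hit lists with Reciprocal Rank Fusion: each document scores the sum of `1/(rank + 60)` over the lists it appears in. A minimal standalone sketch of that fusion step, using toy document IDs rather than real Elasticsearch hits:

```python
from collections import defaultdict

def rrf_fuse(bm25_ids: list[str], knn_ids: list[str], k: int = 5) -> list[str]:
    """Fuse two ranked ID lists with Reciprocal Rank Fusion (constant 60)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in (bm25_ids, knn_ids):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] += 1.0 / (rank + 60)
    # Highest fused score first, truncated to k results
    return [doc_id for doc_id, _ in
            sorted(scores.items(), key=lambda x: x[1], reverse=True)[:k]]
```

Because scores depend only on rank positions, a document present in both short lists always outranks one present in a single list, so no score normalization across the two retrievers is needed.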

View File

@ -1,60 +1,391 @@
# graph.py
import logging
from collections import defaultdict
from elasticsearch import Elasticsearch
from langchain_core.documents import Document
from langchain_core.messages import SystemMessage
from langchain_core.messages import AIMessage, SystemMessage, HumanMessage, BaseMessage
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from prompts import GENERATE_PROMPT, REFORMULATE_PROMPT
from prompts import (
CLASSIFY_PROMPT_TEMPLATE,
CODE_GENERATION_PROMPT,
CONVERSATIONAL_PROMPT,
GENERATE_PROMPT,
REFORMULATE_PROMPT,
)
from state import AgentState
logger = logging.getLogger(__name__)
session_store: dict[str, list] = defaultdict(list)
def format_context(docs: list[Document]) -> str:
def format_context(docs):
chunks = []
for i, doc in enumerate(docs, 1):
source = (doc.metadata or {}).get("source", "Untitled")
source_id = (doc.metadata or {}).get("id", f"chunk-{i}")
text = doc.page_content or ""
chunks.append(f"[{i}] id={source_id} source={source}\n{text}")
meta = doc.metadata or {}
chunk_id = meta.get("chunk_id", meta.get("id", f"chunk-{i}"))
source = meta.get("source_file", meta.get("source", "unknown"))
doc_type = meta.get("doc_type", "")
block_type = meta.get("block_type", "")
section = meta.get("section", "")
text = (doc.page_content or "").strip()
if not text:
text = meta.get("content") or meta.get("text") or ""
header_parts = [f"[{i}]", f"id={chunk_id}"]
if doc_type: header_parts.append(f"type={doc_type}")
if block_type: header_parts.append(f"block={block_type}")
if section: header_parts.append(f"section={section}")
header_parts.append(f"source={source}")
if doc_type in ("code", "code_example", "bnf") or \
block_type in ("function", "if", "startLoop", "try"):
header_parts.append("[AVAP CODE]")
chunks.append(" ".join(header_parts) + "\n" + text)
return "\n\n".join(chunks)
def build_graph(llm, vector_store) -> CompiledStateGraph:
def format_history_for_classify(messages):
lines = []
for msg in messages[-6:]:
if isinstance(msg, HumanMessage):
lines.append(f"User: {msg.content}")
elif isinstance(msg, AIMessage):
lines.append(f"Assistant: {msg.content[:300]}")
elif isinstance(msg, dict):
role = msg.get("role", "user")
content = msg.get("content", "")[:300]
lines.append(f"{role.capitalize()}: {content}")
return "\n".join(lines) if lines else "(no history)"
def hybrid_search_native(es_client, embeddings, query, index_name, k=8):
query_vector = None
try:
query_vector = embeddings.embed_query(query)
except Exception as e:
logger.warning(f"[hybrid] embed_query fails: {e}")
bm25_hits = []
try:
resp = es_client.search(
index=index_name,
body={
"size": k,
"query": {
"multi_match": {
"query": query,
"fields": ["content^2", "text^2"],
"type": "best_fields",
"fuzziness": "AUTO",
}
},
"_source": {"excludes": ["embedding"]},
}
)
bm25_hits = resp["hits"]["hits"]
logger.info(f"[hybrid] BM25 -> {len(bm25_hits)} hits")
except Exception as e:
logger.warning(f"[hybrid] BM25 fails: {e}")
knn_hits = []
if query_vector:
try:
resp = es_client.search(
index=index_name,
body={
"size": k,
"knn": {
"field": "embedding",
"query_vector": query_vector,
"k": k,
"num_candidates": k * 5,
},
"_source": {"excludes": ["embedding"]},
}
)
knn_hits = resp["hits"]["hits"]
logger.info(f"[hybrid] kNN -> {len(knn_hits)} hits")
except Exception as e:
logger.warning(f"[hybrid] kNN fails: {e}")
rrf_scores: dict[str, float] = defaultdict(float)
hit_by_id: dict[str, dict] = {}
for rank, hit in enumerate(bm25_hits):
doc_id = hit["_id"]
rrf_scores[doc_id] += 1.0 / (rank + 60)
hit_by_id[doc_id] = hit
for rank, hit in enumerate(knn_hits):
doc_id = hit["_id"]
rrf_scores[doc_id] += 1.0 / (rank + 60)
if doc_id not in hit_by_id:
hit_by_id[doc_id] = hit
ranked = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:k]
docs = []
for doc_id, score in ranked:
src = hit_by_id[doc_id]["_source"]
text = src.get("content") or src.get("text") or ""
meta = {k: v for k, v in src.items()
if k not in ("content", "text", "embedding")}
meta["id"] = doc_id
meta["rrf_score"] = score
docs.append(Document(page_content=text, metadata=meta))
logger.info(f"[hybrid] RRF -> {len(docs)} final docs")
return docs
def build_graph(llm, embeddings, es_client, index_name):
def _persist(state: AgentState, response: BaseMessage):
session_id = state.get("session_id", "")
if session_id:
session_store[session_id] = list(state["messages"]) + [response]
def classify(state):
messages = state["messages"]
user_msg = messages[-1]
question = getattr(user_msg, "content",
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
history_msgs = messages[:-1]
if not history_msgs:
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", "(no history)")
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[classify] no history raw='{raw}' -> {query_type}")
return {"query_type": query_type}
history_text = format_history_for_classify(history_msgs)
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", history_text)
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[classify] raw='{raw}' -> {query_type}")
return {"query_type": query_type}
def _parse_query_type(raw: str) -> str:
if raw.startswith("CODE_GENERATION") or "CODE" in raw:
return "CODE_GENERATION"
if raw.startswith("CONVERSATIONAL"):
return "CONVERSATIONAL"
return "RETRIEVAL"
def reformulate(state: AgentState) -> AgentState:
user_msg = state["messages"][-1]
resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
reformulated = resp.content.strip()
logger.info(f"[reformulate] '{user_msg.content}' -> '{reformulated}'")
logger.info(f"[reformulate] -> '{reformulated}'")
return {"reformulated_query": reformulated}
def retrieve(state: AgentState) -> AgentState:
query = state["reformulated_query"]
docs = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 3},
).invoke(query)
docs = hybrid_search_native(
es_client=es_client,
embeddings=embeddings,
query=query,
index_name=index_name,
k=8,
)
context = format_context(docs)
logger.info(f"[retrieve] {len(docs)} docs fetched")
logger.info(context)
logger.info(f"[retrieve] {len(docs)} docs, context len={len(context)}")
return {"context": context}
def generate(state: AgentState) -> AgentState:
def generate(state):
prompt = SystemMessage(
content=GENERATE_PROMPT.content.format(context=state["context"])
)
resp = llm.invoke([prompt] + state["messages"])
logger.info(f"[generate] {len(resp.content)} chars")
_persist(state, resp)
return {"messages": [resp]}
def generate_code(state):
prompt = SystemMessage(
content=CODE_GENERATION_PROMPT.content.format(context=state["context"])
)
resp = llm.invoke([prompt] + state["messages"])
logger.info(f"[generate_code] {len(resp.content)} chars")
_persist(state, resp)
return {"messages": [resp]}
def respond_conversational(state):
resp = llm.invoke([CONVERSATIONAL_PROMPT] + state["messages"])
logger.info("[conversational] responding from conversation")
_persist(state, resp)
return {"messages": [resp]}
def route_by_type(state):
return state.get("query_type", "RETRIEVAL")
def route_after_retrieve(state):
qt = state.get("query_type", "RETRIEVAL")
return "generate_code" if qt == "CODE_GENERATION" else "generate"
graph_builder = StateGraph(AgentState)
graph_builder.add_node("classify", classify)
graph_builder.add_node("reformulate", reformulate)
graph_builder.add_node("retrieve", retrieve)
graph_builder.add_node("generate", generate)
graph_builder.add_node("generate_code", generate_code)
graph_builder.add_node("respond_conversational", respond_conversational)
graph_builder.set_entry_point("classify")
graph_builder.add_conditional_edges(
"classify",
route_by_type,
{
"RETRIEVAL": "reformulate",
"CODE_GENERATION": "reformulate",
"CONVERSATIONAL": "respond_conversational",
}
)
graph_builder.set_entry_point("reformulate")
graph_builder.add_edge("reformulate", "retrieve")
graph_builder.add_edge("retrieve", "generate")
graph_builder.add_conditional_edges(
"retrieve",
route_after_retrieve,
{
"generate": "generate",
"generate_code": "generate_code",
}
)
graph_builder.add_edge("generate", END)
graph_builder.add_edge("generate_code", END)
graph_builder.add_edge("respond_conversational", END)
return graph_builder.compile()
def build_prepare_graph(llm, embeddings, es_client, index_name):
def classify(state):
messages = state["messages"]
user_msg = messages[-1]
question = getattr(user_msg, "content",
user_msg.get("content", "")
if isinstance(user_msg, dict) else "")
history_msgs = messages[:-1]
if not history_msgs:
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", "(no history)")
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[prepare/classify] no history raw='{raw}' -> {query_type}")
return {"query_type": query_type}
history_text = format_history_for_classify(history_msgs)
prompt_content = (
CLASSIFY_PROMPT_TEMPLATE
.replace("{history}", history_text)
.replace("{message}", question)
)
resp = llm.invoke([SystemMessage(content=prompt_content)])
raw = resp.content.strip().upper()
query_type = _parse_query_type(raw)
logger.info(f"[prepare/classify] raw='{raw}' -> {query_type}")
return {"query_type": query_type}
def _parse_query_type(raw: str) -> str:
if raw.startswith("CODE_GENERATION") or "CODE" in raw:
return "CODE_GENERATION"
if raw.startswith("CONVERSATIONAL"):
return "CONVERSATIONAL"
return "RETRIEVAL"
def reformulate(state: AgentState) -> AgentState:
user_msg = state["messages"][-1]
resp = llm.invoke([REFORMULATE_PROMPT, user_msg])
reformulated = resp.content.strip()
logger.info(f"[prepare/reformulate] -> '{reformulated}'")
return {"reformulated_query": reformulated}
def retrieve(state: AgentState) -> AgentState:
query = state["reformulated_query"]
docs = hybrid_search_native(
es_client=es_client,
embeddings=embeddings,
query=query,
index_name=index_name,
k=8,
)
context = format_context(docs)
logger.info(f"[prepare/retrieve] {len(docs)} docs, context len={len(context)}")
return {"context": context}
def skip_retrieve(state: AgentState) -> AgentState:
return {"context": ""}
def route_by_type(state):
return state.get("query_type", "RETRIEVAL")
graph_builder = StateGraph(AgentState)
graph_builder.add_node("classify", classify)
graph_builder.add_node("reformulate", reformulate)
graph_builder.add_node("retrieve", retrieve)
graph_builder.add_node("skip_retrieve", skip_retrieve)
graph_builder.set_entry_point("classify")
graph_builder.add_conditional_edges(
"classify",
route_by_type,
{
"RETRIEVAL": "reformulate",
"CODE_GENERATION": "reformulate",
"CONVERSATIONAL": "skip_retrieve",
}
)
graph_builder.add_edge("reformulate", "retrieve")
graph_builder.add_edge("retrieve", END)
graph_builder.add_edge("skip_retrieve", END)
return graph_builder.compile()
def build_final_messages(state: AgentState) -> list:
query_type = state.get("query_type", "RETRIEVAL")
context = state.get("context", "")
messages = state.get("messages", [])
if query_type == "CONVERSATIONAL":
return [CONVERSATIONAL_PROMPT] + messages
if query_type == "CODE_GENERATION":
prompt = SystemMessage(
content=CODE_GENERATION_PROMPT.content.format(context=context)
)
else:
prompt = SystemMessage(
content=GENERATE_PROMPT.content.format(context=context)
)
return [prompt] + messages

420
Docker/src/openai_proxy.py Normal file
View File

@ -0,0 +1,420 @@
import json
import os
import time
import uuid
import logging
import asyncio
import concurrent.futures
from typing import AsyncIterator, Optional, Any, Literal, Union
import grpc
import brunix_pb2
import brunix_pb2_grpc
from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("openai-proxy")
_thread_pool = concurrent.futures.ThreadPoolExecutor(
max_workers=int(os.getenv("PROXY_THREAD_WORKERS", "20"))
)
GRPC_TARGET = os.getenv("BRUNIX_GRPC_TARGET", "localhost:50051")
PROXY_MODEL = os.getenv("PROXY_MODEL_ID", "brunix")
_channel: Optional[grpc.Channel] = None
_stub: Optional[brunix_pb2_grpc.AssistanceEngineStub] = None
def get_stub() -> brunix_pb2_grpc.AssistanceEngineStub:
global _channel, _stub
if _stub is None:
_channel = grpc.insecure_channel(GRPC_TARGET)
_stub = brunix_pb2_grpc.AssistanceEngineStub(_channel)
logger.info(f"[gRPC] connected to {GRPC_TARGET}")
return _stub
app = FastAPI(
title="Brunix OpenAI-Compatible Proxy",
version="2.0.0",
description="stream:false → AskAgent | stream:true → AskAgentStream",
)
class ChatMessage(BaseModel):
role: Literal["system", "user", "assistant", "function"] = "user"
content: str = ""
name: Optional[str] = None
class ChatCompletionRequest(BaseModel):
model: str = PROXY_MODEL
messages: list[ChatMessage]
stream: bool = False
temperature: Optional[float] = None
max_tokens: Optional[int] = None
session_id: Optional[str] = None # Brunix extension
top_p: Optional[float] = None
n: Optional[int] = 1
stop: Optional[Any] = None
presence_penalty: Optional[float] = None
frequency_penalty: Optional[float] = None
user: Optional[str] = None
class CompletionRequest(BaseModel):
model: str = PROXY_MODEL
prompt: Union[str, list[str]] = ""
stream: bool = False
temperature: Optional[float] = None
max_tokens: Optional[int] = None
session_id: Optional[str] = None
suffix: Optional[str] = None
top_p: Optional[float] = None
n: Optional[int] = 1
stop: Optional[Any] = None
user: Optional[str] = None
# Ollama schemas
class OllamaChatMessage(BaseModel):
role: str = "user"
content: str = ""
class OllamaChatRequest(BaseModel):
model: str = PROXY_MODEL
messages: list[OllamaChatMessage]
stream: bool = True # Ollama streams by default
session_id: Optional[str] = None
class OllamaGenerateRequest(BaseModel):
model: str = PROXY_MODEL
prompt: str = ""
stream: bool = True
session_id: Optional[str] = None
def _ts() -> int:
return int(time.time())
def _chat_response(content: str, req_id: str) -> dict:
return {
"id": req_id, "object": "chat.completion", "created": _ts(),
"model": PROXY_MODEL,
"choices": [{"index": 0, "message": {"role": "assistant", "content": content}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}
def _completion_response(text: str, req_id: str) -> dict:
return {
"id": req_id, "object": "text_completion", "created": _ts(),
"model": PROXY_MODEL,
"choices": [{"text": text, "index": 0, "logprobs": None, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}
def _chat_chunk(delta: str, req_id: str, finish: Optional[str] = None) -> dict:
return {
"id": req_id, "object": "chat.completion.chunk", "created": _ts(),
"model": PROXY_MODEL,
"choices": [{"index": 0,
"delta": {"role": "assistant", "content": delta} if delta else {},
"finish_reason": finish}],
}
def _completion_chunk(text: str, req_id: str, finish: Optional[str] = None) -> dict:
return {
"id": req_id, "object": "text_completion", "created": _ts(),
"model": PROXY_MODEL,
"choices": [{"text": text, "index": 0, "logprobs": None, "finish_reason": finish}],
}
def _sse(data: dict) -> str:
return f"data: {json.dumps(data)}\n\n"
def _sse_done() -> str:
return "data: [DONE]\n\n"
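The `_sse` helpers frame each JSON chunk in Server-Sent Events format: a `data: ` prefix, the JSON payload, and a blank-line separator, with a literal `[DONE]` sentinel closing the stream. A standalone sketch of the framing (the `sse_frame` helper and sample chunk are illustrative, not part of the proxy):

```python
import json

def sse_frame(payload: dict) -> str:
    # One SSE event: "data: " prefix, JSON payload, blank-line separator.
    return f"data: {json.dumps(payload)}\n\n"

chunk = {"object": "chat.completion.chunk",
         "choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}]}
frame = sse_frame(chunk)
# A client splits the stream on "\n\n", strips the "data: " prefix,
# and stops when the remaining payload is the literal string "[DONE]".
```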
def _query_from_messages(messages: list[ChatMessage]) -> str:
for m in reversed(messages):
if m.role == "user":
return m.content
return ""
async def _invoke_blocking(query: str, session_id: str) -> str:
loop = asyncio.get_event_loop()
def _call():
stub = get_stub()
req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
parts = []
for resp in stub.AskAgent(req):
if resp.text:
parts.append(resp.text)
return "".join(parts)
return await loop.run_in_executor(_thread_pool, _call)
async def _iter_stream(query: str, session_id: str) -> AsyncIterator[brunix_pb2.AgentResponse]:
loop = asyncio.get_event_loop()
queue: asyncio.Queue = asyncio.Queue()
def _producer():
try:
stub = get_stub()
req = brunix_pb2.AgentRequest(query=query, session_id=session_id)
for resp in stub.AskAgentStream(req): # ← AskAgentStream
asyncio.run_coroutine_threadsafe(queue.put(resp), loop).result()
except Exception as e:
asyncio.run_coroutine_threadsafe(queue.put(e), loop).result()
finally:
asyncio.run_coroutine_threadsafe(queue.put(None), loop).result() # sentinel
_thread_pool.submit(_producer)
while True:
item = await queue.get()
if item is None:
break
if isinstance(item, Exception):
raise item
yield item
async def _stream_chat(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
try:
async for resp in _iter_stream(query, session_id):
if resp.is_final:
yield _sse(_chat_chunk("", req_id, finish="stop"))
break
if resp.text:
yield _sse(_chat_chunk(resp.text, req_id))
except Exception as e:
logger.error(f"[stream_chat] error: {e}")
yield _sse(_chat_chunk(f"[Error: {e}]", req_id, finish="stop"))
yield _sse_done()
async def _stream_completion(query: str, session_id: str, req_id: str) -> AsyncIterator[str]:
try:
async for resp in _iter_stream(query, session_id):
if resp.is_final:
yield _sse(_completion_chunk("", req_id, finish="stop"))
break
if resp.text:
yield _sse(_completion_chunk(resp.text, req_id))
except Exception as e:
logger.error(f"[stream_completion] error: {e}")
yield _sse(_completion_chunk(f"[Error: {e}]", req_id, finish="stop"))
yield _sse_done()
def _ollama_chat_chunk(token: str, done: bool) -> str:
return json.dumps({
"model": PROXY_MODEL,
"created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"message": {"role": "assistant", "content": token},
"done": done,
}) + "\n"
def _ollama_generate_chunk(token: str, done: bool) -> str:
return json.dumps({
"model": PROXY_MODEL,
"created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"response": token,
"done": done,
}) + "\n"
async def _stream_ollama_chat(query: str, session_id: str) -> AsyncIterator[str]:
try:
async for resp in _iter_stream(query, session_id):
if resp.is_final:
yield _ollama_chat_chunk("", done=True)
break
if resp.text:
yield _ollama_chat_chunk(resp.text, done=False)
except Exception as e:
logger.error(f"[ollama_chat] error: {e}")
yield _ollama_chat_chunk(f"[Error: {e}]", done=True)
async def _stream_ollama_generate(query: str, session_id: str) -> AsyncIterator[str]:
try:
async for resp in _iter_stream(query, session_id):
if resp.is_final:
yield _ollama_generate_chunk("", done=True)
break
if resp.text:
yield _ollama_generate_chunk(resp.text, done=False)
except Exception as e:
logger.error(f"[ollama_generate] error: {e}")
yield _ollama_generate_chunk(f"[Error: {e}]", done=True)
@app.get("/v1/models")
async def list_models():
return {
"object": "list",
"data": [{
"id": PROXY_MODEL, "object": "model", "created": 1700000000,
"owned_by": "brunix", "permission": [], "root": PROXY_MODEL, "parent": None,
}],
}
@app.post("/v1/chat/completions")
async def chat_completions(req: ChatCompletionRequest):
query = _query_from_messages(req.messages)
session_id = req.session_id or req.user or "default"
req_id = f"chatcmpl-{uuid.uuid4().hex}"
logger.info(f"[chat] session={session_id} stream={req.stream} query='{query[:80]}'")
if not query:
raise HTTPException(status_code=400, detail="No user message found in messages.")
if req.stream:
return StreamingResponse(
_stream_chat(query, session_id, req_id),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
try:
text = await _invoke_blocking(query, session_id)
except grpc.RpcError as e:
raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
return JSONResponse(_chat_response(text, req_id))
@app.post("/v1/completions")
async def completions(req: CompletionRequest):
query = req.prompt if isinstance(req.prompt, str) else " ".join(req.prompt)
session_id = req.session_id or req.user or "default"
req_id = f"cmpl-{uuid.uuid4().hex}"
logger.info(f"[completion] session={session_id} stream={req.stream} prompt='{query[:80]}'")
if not query:
raise HTTPException(status_code=400, detail="prompt is required.")
if req.stream:
return StreamingResponse(
_stream_completion(query, session_id, req_id),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
try:
text = await _invoke_blocking(query, session_id)
except grpc.RpcError as e:
raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
return JSONResponse(_completion_response(text, req_id))
@app.get("/health")
async def health():
return {"status": "ok", "grpc_target": GRPC_TARGET}
@app.get("/api/tags")
async def ollama_tags():
    return {
        "models": [{
            "name": PROXY_MODEL,
            "model": PROXY_MODEL,
            "modified_at": "2024-01-01T00:00:00Z",
            "size": 0,
            "digest": "brunix",
            "details": {
                "format": "gguf",
                "family": "brunix",
                "parameter_size": "unknown",
                "quantization_level": "unknown",
            },
        }]
    }
@app.post("/api/chat")
async def ollama_chat(req: OllamaChatRequest):
query = next((m.content for m in reversed(req.messages) if m.role == "user"), "")
session_id = req.session_id or "default"
logger.info(f"[ollama/chat] session={session_id} stream={req.stream} query='{query[:80]}'")
if not query:
raise HTTPException(status_code=400, detail="No user message found.")
if req.stream:
return StreamingResponse(
_stream_ollama_chat(query, session_id),
media_type="application/x-ndjson",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
try:
text = await _invoke_blocking(query, session_id)
except grpc.RpcError as e:
raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
return JSONResponse({
"model": PROXY_MODEL,
"created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"message": {"role": "assistant", "content": text},
"done": True,
})
@app.post("/api/generate")
async def ollama_generate(req: OllamaGenerateRequest):
session_id = req.session_id or "default"
logger.info(f"[ollama/generate] session={session_id} stream={req.stream} prompt='{req.prompt[:80]}'")
if not req.prompt:
raise HTTPException(status_code=400, detail="prompt is required.")
if req.stream:
return StreamingResponse(
_stream_ollama_generate(req.prompt, session_id),
media_type="application/x-ndjson",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
try:
text = await _invoke_blocking(req.prompt, session_id)
except grpc.RpcError as e:
raise HTTPException(status_code=502, detail=f"gRPC error: {e.details()}")
return JSONResponse({
"model": PROXY_MODEL,
"created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
"response": text,
"done": True,
})
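`_iter_stream` above bridges a blocking gRPC iterator into an async generator through a queue. The same pattern, reduced to a dependency-free sketch (the `blocking_source` generator is a hypothetical stand-in for `stub.AskAgentStream(...)`):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_source():
    # Hypothetical stand-in for the blocking gRPC stream iterator
    yield from ["Hello", " ", "world"]

async def iter_from_thread(pool):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def producer():
        try:
            for item in blocking_source():
                asyncio.run_coroutine_threadsafe(queue.put(item), loop).result()
        finally:
            asyncio.run_coroutine_threadsafe(queue.put(None), loop).result()  # sentinel

    pool.submit(producer)
    while True:
        item = await queue.get()
        if item is None:
            break
        yield item

async def main():
    with ThreadPoolExecutor(max_workers=1) as pool:
        return [tok async for tok in iter_from_thread(pool)]

tokens = asyncio.run(main())
print("".join(tokens))  # Hello world
```

The `.result()` call gives a degree of backpressure: the producer thread blocks until the event loop has accepted each item before pulling the next one from the gRPC stream.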


@ -1,89 +1,250 @@
from langchain_core.messages import SystemMessage
CLASSIFY_PROMPT_TEMPLATE = (
"<role>\n"
"You are a query classifier for an AVAP language assistant. "
"Your only job is to classify the user message into one of three categories.\n"
"</role>\n\n"
"<categories>\n"
"RETRIEVAL — the user is asking about AVAP concepts, documentation, syntax rules, "
"or how something works. They want an explanation, not code.\n"
"Examples: 'What is addVar?', 'How does registerEndpoint work?', "
"'What is the difference between if() modes?'\n\n"
"CODE_GENERATION — the user is asking to generate, write, create, build, or show "
"an example of an AVAP script, function, API, or code snippet. "
"They want working code as output.\n"
"Examples: 'Write an API that returns hello world', "
"'Generate a function that queries the DB', "
"'Show me how to create an endpoint', "
"'dame un ejemplo de codigo', 'escribeme un script', "
"'dime como seria un API', 'genera un API', 'como haria'\n\n"
"CONVERSATIONAL — the user is following up on the previous answer. "
"They want a reformulation, summary, or elaboration of what was already said.\n"
"Examples: 'can you explain that?', 'en menos palabras', "
"'describe it in your own words', 'what did you mean?'\n"
"</categories>\n\n"
"<output_rule>\n"
"Your entire response must be exactly one word: "
"RETRIEVAL, CODE_GENERATION, or CONVERSATIONAL. Nothing else.\n"
"</output_rule>\n\n"
"<conversation_history>\n"
"{history}\n"
"</conversation_history>\n\n"
"<user_message>{message}</user_message>"
)
REFORMULATE_PROMPT = SystemMessage(
    content=(
        "<role>\n"
        "You are a deterministic query rewriter whose sole purpose is to prepare "
        "user questions for vector similarity retrieval against an AVAP language "
        "knowledge base. You do not answer questions. You only transform phrasing "
        "into keyword queries that will find the right AVAP documentation chunks.\n"
        "</role>\n\n"
        "<task>\n"
        "Rewrite the user message into a compact keyword query for semantic search.\n\n"
        "SPECIAL RULE for code generation requests:\n"
        "When the user asks to generate/create/build/show AVAP code, expand the query "
        "with the AVAP commands typically needed. Use this mapping:\n\n"
        "- API / endpoint / route / HTTP response\n"
        "  expand to: AVAP registerEndpoint addResult _status\n\n"
        "- Read input / parameter\n"
        "  expand to: AVAP addParam getQueryParamList\n\n"
        "- Database / ORM / query\n"
        "  expand to: AVAP ormAccessSelect ormAccessInsert avapConnector\n\n"
        "- Error handling\n"
        "  expand to: AVAP try exception end\n\n"
        "- Loop / iterate\n"
        "  expand to: AVAP startLoop endLoop itemFromList getListLen\n\n"
        "- HTTP request / call external\n"
        "  expand to: AVAP RequestPost RequestGet\n"
        "</task>\n\n"
        "<rules>\n"
        "- Preserve all AVAP identifiers verbatim.\n"
        "- Remove filler words.\n"
        "- Output a single line.\n"
        "- Never answer the question.\n"
        "- If the message looks like code, do not rewrite it; return it verbatim.\n"
        "</rules>\n\n"
        "<examples>\n"
        "<example>\n"
        "<input>What does AVAP stand for?</input>\n"
        "<o>AVAP stand for</o>\n"
        "</example>\n\n"
        "<example>\n"
        "<input>dime como seria un API que devuelva hello world con AVAP</input>\n"
        "<o>AVAP registerEndpoint addResult _status hello world example</o>\n"
        "</example>\n\n"
        "<example>\n"
        "<input>generate an AVAP script that reads a parameter and queries the DB</input>\n"
        "<o>AVAP addParam ormAccessSelect avapConnector registerEndpoint addResult</o>\n"
        "</example>\n"
        "</examples>\n\n"
        "Return only the rewritten query. No labels, no prefixes, no explanation."
    )
)
CONFIDENCE_PROMPT_TEMPLATE = (
"<role>\n"
"You are a relevance evaluator. Decide whether the context contains "
"useful information to address the user question.\n"
"</role>\n\n"
"<task>\n"
"Answer YES if the context contains at least one relevant passage. "
"Answer NO only if context is empty or completely unrelated.\n"
"</task>\n\n"
"<output_rule>\n"
"Exactly one word: YES or NO.\n"
"</output_rule>\n\n"
"<question>{question}</question>\n\n"
"<context>{context}</context>"
)
CODE_GENERATION_PROMPT = SystemMessage(
content=(
"<role>\n"
"You are an expert AVAP programmer. AVAP (Advanced Virtual API Programming) "
"is a domain-specific language for orchestrating microservices and HTTP I/O. "
"Write correct, minimal, working AVAP code.\n"
"</role>\n\n"
"<critical_rules>\n"
"1. AVAP is line-oriented: every statement on a single line.\n"
"2. Use ONLY commands from <avap_syntax_reminder> or explicitly described in <context>.\n"
"3. Do NOT copy code examples from <context> that solve a DIFFERENT problem. "
"Context examples are syntax references only — ignore them if unrelated.\n"
"4. Write the MINIMUM code needed. No extra connectors, no unrelated variables.\n"
"5. Add brief inline comments explaining each part.\n"
"6. Answer in the same language the user used.\n"
"</critical_rules>\n\n"
"<avap_syntax_reminder>\n"
"// Register an HTTP endpoint\n"
"registerEndpoint(\"GET\", \"/path\", [], \"scope\", handlerFn, \"\")\n\n"
"// Declare a function — uses curly braces, NOT end()\n"
"function handlerFn() {{\n"
" msg = \"Hello World\"\n"
" addResult(msg)\n"
"}}\n\n"
"// Assign a value to a variable\n"
"addVar(varName, \"value\") // or: varName = \"value\"\n\n"
"// Add variable to HTTP JSON response body\n"
"addResult(varName)\n\n"
"// Set HTTP response status code\n"
"_status = 200 // or: addVar(_status, 200)\n\n"
"// Read a request parameter (URL, body, or form)\n"
"addParam(\"paramName\", targetVar)\n\n"
"// Conditional\n"
"if(var, value, \"==\")\n"
" // ...\n"
"end()\n\n"
"// Loop\n"
"startLoop(i, 0, length)\n"
" // ...\n"
"endLoop()\n\n"
"// Error handling\n"
"try()\n"
" // ...\n"
"exception(errVar)\n"
" // handle\n"
"end()\n"
"</avap_syntax_reminder>\n\n"
"<task>\n"
"Generate a minimal, complete AVAP example for the user's request.\n\n"
"Structure:\n"
"1. One sentence describing what the code does.\n"
"2. The AVAP code block — clean, minimal, with inline comments.\n"
"3. Two or three lines explaining the key commands used.\n"
"</task>\n\n"
"<context>\n"
"{context}\n"
"</context>"
)
)
CONVERSATIONAL_PROMPT = SystemMessage(
content=(
"<role>\n"
"You are a helpful AVAP assistant continuing an ongoing conversation.\n"
"</role>\n\n"
"<task>\n"
"The user is following up on something already discussed. "
"Rephrase, summarize, or elaborate using the conversation history.\n"
"</task>\n\n"
"<rules>\n"
"- Base your answer on the conversation history.\n"
"- Do not introduce new AVAP facts not in the history.\n"
"- Keep the same language the user is using.\n"
"- No Answer/Evidence format. Just answer naturally.\n"
"</rules>"
)
)
GENERATE_PROMPT = SystemMessage(
    content=(
        "<role>\n"
        "You are a precise, retrieval-grounded assistant specialized in AVAP. "
        "Answers are honest, calibrated to evidence, and clearly structured.\n"
        "</role>\n\n"
        "<critical_constraint>\n"
        "AVAP is a new proprietary language. Use ONLY content inside <context>. "
        "Treat any AVAP knowledge outside <context> as unreliable.\n"
        "</critical_constraint>\n\n"
        "<task>\n"
        "Answer using exclusively the information in <context>.\n"
        "</task>\n\n"
        "<thinking_steps>\n"
        "Step 1 — Find relevant passages in <context>.\n"
        "Step 2 — Assess if the question can be fully or partially answered.\n"
        "Step 3 — Write a clear answer backed by those passages.\n"
        "Step 4 — If context contains relevant AVAP code, include it exactly.\n"
        "</thinking_steps>\n\n"
        "<output_format>\n"
        "Answer:\n"
        "<direct answer; include code blocks if context has relevant code>\n\n"
        "Evidence:\n"
        "- \"<exact quote from context>\"\n"
        "(only quotes you actually used)\n\n"
        "If the context has no relevant information, reply with exactly:\n"
        "\"I don't have enough information in the provided context to answer that.\"\n"
        "</output_format>\n\n"
        "<context>\n"
        "{context}\n"
        "</context>"
    )
)
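CLASSIFY_PROMPT_TEMPLATE's one-word contract still deserves defensive parsing at the call site, since models occasionally add whitespace, casing drift, or extra words. A minimal sketch (the `parse_classification` helper is illustrative and not part of this module):

```python
VALID_LABELS = {"RETRIEVAL", "CODE_GENERATION", "CONVERSATIONAL"}

def parse_classification(raw: str, default: str = "RETRIEVAL") -> str:
    # Normalize whitespace/casing and drop a trailing period before validating
    token = raw.strip().upper().rstrip(".")
    return token if token in VALID_LABELS else default

print(parse_classification(" code_generation \n"))  # CODE_GENERATION
print(parse_classification("maybe retrieval?"))     # RETRIEVAL (fallback)
```

Falling back to RETRIEVAL on an unparsable reply is an assumption here; any safe default route in the graph would do.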


@ -8,18 +8,28 @@ import brunix_pb2
import brunix_pb2_grpc
import grpc
from grpc_reflection.v1alpha import reflection
from elasticsearch import Elasticsearch
from langchain_core.messages import AIMessage
from utils.llm_factory import create_chat_model
from utils.emb_factory import create_embedding_model
from graph import build_graph, build_prepare_graph, build_final_messages, session_store
from evaluate import run_evaluation
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("brunix-engine")
class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
def __init__(self):
es_url = os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")
es_user = os.getenv("ELASTICSEARCH_USER")
es_pass = os.getenv("ELASTICSEARCH_PASSWORD")
es_apikey = os.getenv("ELASTICSEARCH_API_KEY")
index = os.getenv("ELASTICSEARCH_INDEX", "avap-knowledge-v1")
self.llm = create_chat_model(
provider="ollama",
model=os.getenv("OLLAMA_MODEL_NAME"),
@ -27,56 +37,194 @@ class BrunixEngine(brunix_pb2_grpc.AssistanceEngineServicer):
temperature=0,
validate_model_on_init=True,
)
self.embeddings = create_embedding_model(
provider="ollama",
model=os.getenv("OLLAMA_EMB_MODEL_NAME"),
base_url=os.getenv("OLLAMA_URL"),
)
        es_kwargs: dict = {"hosts": [es_url], "request_timeout": 60}
        if es_apikey:
            es_kwargs["api_key"] = es_apikey
        elif es_user and es_pass:
            es_kwargs["basic_auth"] = (es_user, es_pass)
        self.es_client = Elasticsearch(**es_kwargs)
        self.index_name = index
        if self.es_client.ping():
            info = self.es_client.info()
            logger.info(f"[ESEARCH] Connected: {info['version']['number']} — index: {index}")
        else:
            logger.error("[ESEARCH] Can't connect.")
        self.graph = build_graph(
            llm=self.llm,
            embeddings=self.embeddings,
            es_client=self.es_client,
            index_name=self.index_name,
        )
        self.prepare_graph = build_prepare_graph(
            llm=self.llm,
            embeddings=self.embeddings,
            es_client=self.es_client,
            index_name=self.index_name,
        )
        logger.info("Brunix Engine initialized.")
    def AskAgent(self, request, context):
        session_id = request.session_id or "default"
        query = request.query
        logger.info(f"[AskAgent] session={session_id} query='{query[:80]}'")
        try:
            history = list(session_store.get(session_id, []))
            logger.info(f"[AskAgent] conversation: {len(history)} previous messages.")
            initial_state = {
                "messages": history + [{"role": "user", "content": query}],
                "session_id": session_id,
                "reformulated_query": "",
                "context": "",
                "query_type": "",
            }
            final_state = self.graph.invoke(initial_state)
            messages = final_state.get("messages", [])
            last_msg = messages[-1] if messages else None
            result_text = getattr(last_msg, "content", str(last_msg)) if last_msg else ""
            logger.info(f"[AskAgent] query_type={final_state.get('query_type')} "
                        f"answer='{result_text[:100]}'")
            yield brunix_pb2.AgentResponse(
                text=result_text,
                avap_code="AVAP-2026",
                is_final=True,
            )
        except Exception as e:
            logger.error(f"[AskAgent] Error: {e}", exc_info=True)
            yield brunix_pb2.AgentResponse(
                text=f"[ENG] Error: {str(e)}",
                is_final=True,
            )
def AskAgentStream(self, request, context):
session_id = request.session_id or "default"
query = request.query
logger.info(f"[AskAgentStream] session={session_id} query='{query[:80]}'")
try:
history = list(session_store.get(session_id, []))
logger.info(f"[AskAgentStream] conversation: {len(history)} previous messages.")
initial_state = {
"messages": history + [{"role": "user", "content": query}],
"session_id": session_id,
"reformulated_query": "",
"context": "",
"query_type": "",
}
prepared = self.prepare_graph.invoke(initial_state)
logger.info(
f"[AskAgentStream] query_type={prepared.get('query_type')} "
f"context_len={len(prepared.get('context', ''))}"
)
final_messages = build_final_messages(prepared)
full_response = []
for chunk in self.llm.stream(final_messages):
token = chunk.content
if token:
full_response.append(token)
                    yield brunix_pb2.AgentResponse(
                        text=token,
                        is_final=False,
                    )
complete_text = "".join(full_response)
if session_id:
session_store[session_id] = (
list(prepared["messages"]) + [AIMessage(content=complete_text)]
)
logger.info(
f"[AskAgentStream] done — "
f"chunks={len(full_response)} total_chars={len(complete_text)}"
)
yield brunix_pb2.AgentResponse(text="", is_final=True)
except Exception as e:
logger.error(f"[AskAgentStream] Error: {e}", exc_info=True)
            yield brunix_pb2.AgentResponse(
                text=f"[ENG] Error: {str(e)}",
                is_final=True,
            )
def EvaluateRAG(self, request, context):
category = request.category or None
limit = request.limit or None
index = request.index or self.index_name
logger.info(f"[EvaluateRAG] category={category} limit={limit} index={index}")
try:
            result = run_evaluation(
                es_client=self.es_client,
                llm=self.llm,
                embeddings=self.embeddings,
                index_name=index,
                category=category,
                limit=limit,
            )
except Exception as e:
logger.error(f"[EvaluateRAG] Error: {e}", exc_info=True)
return brunix_pb2.EvalResponse(status=f"error: {e}")
if result.get("status") != "ok":
return brunix_pb2.EvalResponse(status=result.get("error", "unknown error"))
        details = [
            brunix_pb2.QuestionDetail(
                id=d["id"],
                category=d["category"],
                question=d["question"],
                answer_preview=d["answer_preview"],
                n_chunks=d["n_chunks"],
            )
            for d in result.get("details", [])
        ]
scores = result["scores"]
        return brunix_pb2.EvalResponse(
            status="ok",
            questions_evaluated=result["questions_evaluated"],
            elapsed_seconds=result["elapsed_seconds"],
            judge_model=result["judge_model"],
            index=result["index"],
            faithfulness=scores["faithfulness"],
            answer_relevancy=scores["answer_relevancy"],
            context_recall=scores["context_recall"],
            context_precision=scores["context_precision"],
            global_score=result["global_score"],
            verdict=result["verdict"],
            details=details,
        )
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
brunix_pb2_grpc.add_AssistanceEngineServicer_to_server(BrunixEngine(), server)
SERVICE_NAMES = (
@ -86,7 +234,7 @@ def serve():
reflection.enable_server_reflection(SERVICE_NAMES, server)
server.add_insecure_port("[::]:50051")
    logger.info("[ENGINE] listening on 50051 (gRPC)")
server.start()
server.wait_for_termination()
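`AskAgentStream`'s second phase (stream tokens, then emit an empty final frame) can be exercised without gRPC or a real model. A sketch with a hypothetical `FakeLLM` standing in for `self.llm`:

```python
from types import SimpleNamespace

class FakeLLM:
    # Hypothetical stand-in for self.llm: yields token chunks like LangChain's .stream()
    def stream(self, messages):
        for tok in ["AVAP ", "is ", "line-oriented."]:
            yield SimpleNamespace(content=tok)

def stream_answer(llm, messages):
    # Mirrors phase two of AskAgentStream: forward tokens, then an empty final frame
    for chunk in llm.stream(messages):
        if chunk.content:
            yield {"text": chunk.content, "is_final": False}
    yield {"text": "", "is_final": True}

frames = list(stream_answer(FakeLLM(), []))
answer = "".join(f["text"] for f in frames)
print(answer)                  # AVAP is line-oriented.
print(frames[-1]["is_final"])  # True
```

The empty `is_final` frame matters: clients such as the HTTP proxy's `_stream_chat` use it as the stop signal rather than relying on stream exhaustion.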


@ -1,9 +1,11 @@
# state.py
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    reformulated_query: str
    context: str
    query_type: str
    session_id: str
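The state contract above can be checked in isolation; the sketch below uses a simplified stand-in for langgraph's `add_messages` reducer (the real one also handles message IDs and deduplication):

```python
from typing import Annotated, TypedDict

def add_messages(left, right):
    # Simplified stand-in for langgraph's reducer: plain append semantics
    return list(left) + list(right)

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    reformulated_query: str
    context: str
    query_type: str
    session_id: str

state: AgentState = {
    "messages": [{"role": "user", "content": "What is addVar?"}],
    "reformulated_query": "",
    "context": "",
    "query_type": "",
    "session_id": "demo",
}
merged = add_messages(state["messages"], [{"role": "assistant", "content": "..."}])
print(len(merged))  # 2
```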

README.md

@ -42,37 +42,75 @@ graph TD
## Project Structure
```text
├── README.md                    # Setup guide & dev reference (this file)
├── CONTRIBUTING.md              # Contribution standards, GitFlow, PR process
├── SECURITY.md                  # Security policy and vulnerability reporting
├── changelog                    # Version tracking and release history
├── pyproject.toml               # Python project configuration (uv)
├── uv.lock                      # Locked dependency graph
├── Docker/                      # Production container
│   ├── protos/
│   │   └── brunix.proto         # gRPC API contract (source of truth)
│   ├── src/
│   │   ├── server.py            # gRPC server — AskAgent, AskAgentStream, EvaluateRAG
│   │   ├── openai_proxy.py      # OpenAI & Ollama-compatible HTTP proxy (port 8000)
│   │   ├── graph.py             # LangGraph orchestration — build_graph, build_prepare_graph
│   │   ├── prompts.py           # Centralized prompt definitions (CLASSIFY, GENERATE, etc.)
│   │   ├── state.py             # AgentState TypedDict (shared across graph nodes)
│   │   ├── evaluate.py          # RAGAS evaluation pipeline (Claude as judge)
│   │   ├── golden_dataset.json  # Ground-truth Q&A dataset for EvaluateRAG
│   │   └── utils/
│   │       ├── emb_factory.py   # Provider-agnostic embedding model factory
│   │       └── llm_factory.py   # Provider-agnostic LLM factory
│   ├── Dockerfile               # Multi-stage container build
│   ├── docker-compose.yaml      # Local dev orchestration
│   ├── entrypoint.sh            # Starts gRPC server + HTTP proxy in parallel
│   ├── requirements.txt         # Pinned production dependencies (exported by uv)
│   ├── .env                     # Local secrets (never commit — see .gitignore)
│   └── .dockerignore            # Excludes dev artifacts from image build context
├── docs/                        # Knowledge base & project documentation
│   ├── ARCHITECTURE.md          # Deep technical architecture reference
│   ├── API_REFERENCE.md         # Complete gRPC & HTTP API contract with examples
│   ├── RUNBOOK.md               # Operational playbooks and incident response
│   ├── AVAP_CHUNKER_CONFIG.md   # avap_config.json reference — blocks, statements, semantic tags
│   ├── adr/                     # Architecture Decision Records
│   │   ├── ADR-0001-grpc-primary-interface.md
│   │   ├── ADR-0002-two-phase-streaming.md
│   │   ├── ADR-0003-hybrid-retrieval-rrf.md
│   │   └── ADR-0004-claude-eval-judge.md
│   ├── avap_language_github_docs/    # AVAP language reference docs (GitHub source)
│   ├── developer.avapframework.com/  # AVAP developer portal docs
│   ├── LRM/
│   │   └── avap.md              # AVAP Language Reference Manual (LRM)
│   └── samples/                 # AVAP code samples (.avap) used for ingestion
├── ingestion/
│   └── chunks.json              # Last export of ingested chunks (ES bulk output)
├── scripts/
│   └── pipelines/
│       │
│       ├── flows/               # Executable pipeline entry points (Typer CLI)
│       │   ├── elasticsearch_ingestion.py  # [PIPELINE A] Chonkie-based ingestion flow
│       │   ├── generate_mbap.py            # Synthetic MBPP-AVAP dataset generator (Claude)
│       │   └── translate_mbpp.py           # MBPP→AVAP dataset translation pipeline
│       │
│       ├── tasks/               # Reusable task modules for Pipeline A
│       │   ├── chunk.py         # Document fetching, Chonkie chunking & ES bulk write
│       │   ├── embeddings.py    # OllamaEmbeddings adapter (Chonkie-compatible)
│       │   └── prompts.py       # Prompt templates for pipeline LLM calls
│       │
│       └── ingestion/           # [PIPELINE B] AVAP-native classic ingestion
│           ├── avap_chunker.py  # Custom AVAP lexer + chunker (MinHash dedup, overlaps)
│           ├── avap_ingestor.py # Async ES ingestor with DLQ (producer/consumer pattern)
│           ├── avap_config.json # AVAP language config (blocks, statements, semantic tags)
│           └── ingestion/
│               └── chunks.jsonl # JSONL output from avap_chunker.py
└── src/                         # Shared library (used by both Docker and scripts)
    ├── config.py                # Pydantic settings — reads all environment variables
    └── utils/
        ├── emb_factory.py       # Embedding model factory
        └── llm_factory.py       # LLM model factory
@ -114,6 +152,146 @@ sequenceDiagram
---
## Knowledge Base Ingestion
The Elasticsearch vector index is populated via one of two independent pipelines. Both pipelines require the Elasticsearch tunnel to be active (`localhost:9200`) and the Ollama embedding model (`OLLAMA_EMB_MODEL_NAME`) to be available.
### Pipeline A — Chonkie (recommended for markdown + .avap)
Uses the [Chonkie](https://github.com/chonkie-ai/chonkie) library for semantic chunking. Supports `.md` (via `MarkdownChef`) and `.avap` (via `TextChef` + `TokenChunker`). Chunks are embedded with Ollama and bulk-indexed into Elasticsearch via `ElasticHandshakeWithMetadata`.
**Entry point:** `scripts/pipelines/flows/elasticsearch_ingestion.py`
```bash
# Index all markdown and AVAP files from docs/LRM
python -m scripts.pipelines.flows.elasticsearch_ingestion \
--docs-folder-path docs/LRM \
--output ingestion/chunks.json \
--docs-extension .md .avap \
--es-index avap-docs-test \
--delete-es-index
# Index the AVAP code samples
python -m scripts.pipelines.flows.elasticsearch_ingestion \
--docs-folder-path docs/samples \
--output ingestion/chunks.json \
--docs-extension .avap \
--es-index avap-docs-test
```
**How it works:**
```
docs/**/*.md + docs/**/*.avap
▼ FileFetcher (Chonkie)
├─ .md → MarkdownChef → merge code blocks + tables into chunks
│ ↓
│ TokenChunker (HuggingFace tokenizer: HF_EMB_MODEL_NAME)
└─ .avap → TextChef → TokenChunker
▼ OllamaEmbeddings.embed_batch() (OLLAMA_EMB_MODEL_NAME)
▼ ElasticHandshakeWithMetadata.write()
bulk index → {text, embedding, file, start_index, end_index, token_count}
▼ export_documents() → ingestion/chunks.json
```
| Chunk field | Source |
|---|---|
| `text` | Raw chunk text |
| `embedding` | Ollama dense vector |
| `start_index` / `end_index` | Character offsets in source file |
| `token_count` | HuggingFace tokenizer count |
| `file` | Source filename |
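An indexed document from Pipeline A therefore looks like the sketch below (values are invented; the 1024-dim embedding is an assumption that depends on `OLLAMA_EMB_MODEL_NAME`):

```python
# Invented values, shaped like one Pipeline A document
chunk = {
    "text": 'addVar(msg, "Hello World")',
    "embedding": [0.0] * 1024,  # dimension is an assumption, set by the embedding model
    "file": "docs/LRM/avap.md",
    "start_index": 120,
    "end_index": 146,
    "token_count": 9,
}

REQUIRED_FIELDS = {"text", "embedding", "file", "start_index", "end_index", "token_count"}

def validate_chunk(doc: dict) -> bool:
    # Well-formed if every field exists, offsets are ordered, and text is non-empty
    return (REQUIRED_FIELDS <= doc.keys()
            and doc["start_index"] < doc["end_index"]
            and bool(doc["text"]))

print(validate_chunk(chunk))  # True
```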
---
### Pipeline B — AVAP Native (classic, for .avap files with full semantic analysis)
A custom lexer-based chunker purpose-built for the AVAP language using `avap_config.json` as its grammar definition. Produces richer metadata (block type, section, semantic tags, complexity score) and includes **MinHash LSH deduplication** and **semantic overlap** between chunks.
**Step 1 — Chunk:** `scripts/pipelines/ingestion/avap_chunker.py`
**Grammar config:** `scripts/pipelines/ingestion/avap_config.json` — see [`docs/AVAP_CHUNKER_CONFIG.md`](./docs/AVAP_CHUNKER_CONFIG.md) for the full reference on blocks, statements, semantic tags, and how to extend the grammar.
```bash
python scripts/pipelines/ingestion/avap_chunker.py \
--lang-config scripts/pipelines/ingestion/avap_config.json \
--docs-path docs/samples \
--output scripts/pipelines/ingestion/ingestion/chunks.jsonl \
--workers 4
```
**Step 2 — Ingest:** `scripts/pipelines/ingestion/avap_ingestor.py`
```bash
# Ingest from existing JSONL
python scripts/pipelines/ingestion/avap_ingestor.py \
--chunks scripts/pipelines/ingestion/ingestion/chunks.jsonl \
--index avap-knowledge-v1 \
--delete
# Check model embedding dimensions first
python scripts/pipelines/ingestion/avap_ingestor.py --probe-dim
```
**How it works:**
```
docs/**/*.avap + docs/**/*.md
▼ avap_chunker.py (GenericLexer + LanguageConfig)
│ ├─ .avap: block detection (function/if/startLoop/try), statement classification
│ │ semantic tags enrichment, function signature extraction
│ │ semantic overlap injection (OVERLAP_LINES=3)
│ └─ .md: H1/H2/H3 sectioning, fenced code extraction, table isolation,
│ narrative split by token budget (MAX_NARRATIVE_TOKENS=400)
│ ├─ MinHash LSH deduplication (threshold=0.85, 128 permutations)
│ └─ parallel workers (ProcessPoolExecutor)
▼ chunks.jsonl (one JSON per line)
▼ avap_ingestor.py (async producer/consumer)
│ ├─ OllamaAsyncEmbedder — batch embed (BATCH_SIZE_EMBED=8)
│ ├─ asyncio.Queue (backpressure, QUEUE_MAXSIZE=5)
│ ├─ ES async_bulk (BATCH_SIZE_ES=50)
│ └─ DeadLetterQueue — failed chunks saved to failed_chunks_<ts>.jsonl
▼ Elasticsearch index
{chunk_id, content, embedding, doc_type, block_type, section,
source_file, start_line, end_line, token_estimate, metadata{...}}
```
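The MinHash LSH deduplication step above can be approximated from first principles. The real pipeline uses an LSH index for sub-linear candidate lookup; the pairwise sketch below is illustrative only, showing how the 128-permutation signatures and the 0.85 threshold interact (helper names and the shingling scheme are assumptions, not the chunker's actual code):

```python
import hashlib

NUM_PERM = 128          # matches the pipeline's 128 permutations
DEDUP_THRESHOLD = 0.85  # matches the pipeline's LSH threshold

def shingles(text: str, n: int = 3) -> set[str]:
    """Word n-gram shingles of a chunk (scheme chosen for illustration)."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

def minhash_signature(items: set[str]) -> list[int]:
    # One hash family per "permutation": salt each item with the permutation index.
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in items)
        for seed in range(NUM_PERM)
    ]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # Fraction of agreeing signature slots estimates Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_PERM

def is_duplicate(text_a: str, text_b: str) -> bool:
    sig_a = minhash_signature(shingles(text_a))
    sig_b = minhash_signature(shingles(text_b))
    return estimated_jaccard(sig_a, sig_b) >= DEDUP_THRESHOLD
```

Chunks whose estimated Jaccard similarity meets the threshold are treated as near-duplicates and only one survives into `chunks.jsonl`.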
**Chunk types produced:**
| `doc_type` | `block_type` | Description |
|---|---|---|
| `code` | `function` | Complete AVAP function block |
| `code` | `if` / `startLoop` / `try` | Control flow blocks |
| `function_signature` | `function_signature` | Extracted function signature only (for fast lookup) |
| `code` | `registerEndpoint` / `addVar` / … | Statement-level chunks by AVAP command category |
| `spec` | `narrative` | Markdown prose sections |
| `code_example` | language tag | Fenced code blocks from markdown |
| `bnf` | `bnf` | BNF grammar blocks from markdown |
| `spec` | `table` | Markdown tables |
**Semantic tags** (automatically detected, stored in `metadata`):
`uses_orm` · `uses_http` · `uses_connector` · `uses_async` · `uses_crypto` · `uses_auth` · `uses_error_handling` · `uses_loop` · `uses_json` · `uses_list` · `uses_regex` · `uses_datetime` · `returns_result` · `registers_endpoint`
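Tag detection is pattern-based. The sketch below shows the general shape with a handful of hypothetical regexes — the authoritative patterns live in `avap_config.json` (see `docs/AVAP_CHUNKER_CONFIG.md`):

```python
import re

# Illustrative patterns only — the real definitions live in avap_config.json.
TAG_PATTERNS = {
    "uses_http": re.compile(r"\b(httpGet|httpPost|http_request)\b"),
    "uses_loop": re.compile(r"\bstartLoop\b"),
    "registers_endpoint": re.compile(r"\bregisterEndpoint\b"),
    "uses_error_handling": re.compile(r"\btry\b"),
}

def detect_tags(chunk_text: str) -> list[str]:
    """Return every semantic tag whose pattern matches the chunk."""
    return [tag for tag, pat in TAG_PATTERNS.items() if pat.search(chunk_text)]
```

The resulting tag list is stored under `metadata` on each chunk, so retrieval can filter or boost by capability (e.g. only chunks that register an endpoint).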
**Ingestor environment variables:**
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama base URL for embeddings |
| `OLLAMA_MODEL` | `qwen3-0.6B-emb:latest` | Embedding model name |
| `OLLAMA_EMBEDDING_DIM` | `1024` | Expected embedding dimension (must match model) |
---
## Development Setup
### 1. Prerequisites
@ -138,6 +316,9 @@ PYTHONPATH=${PYTHONPATH}:/home/...
ELASTICSEARCH_URL=http://host.docker.internal:9200
ELASTICSEARCH_LOCAL_URL=http://localhost:9200
ELASTICSEARCH_INDEX=avap-docs-test
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_API_KEY=
POSTGRES_URL=postgresql://postgres:postgres@localhost:5432/langfuse
LANGFUSE_HOST=http://45.77.119.180
LANGFUSE_PUBLIC_KEY=pk-lf-...
@ -148,6 +329,8 @@ OLLAMA_MODEL_NAME=qwen2.5:1.5b
OLLAMA_EMB_MODEL_NAME=qwen3-0.6B-emb:latest
HF_TOKEN=hf_...
HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```
| Variable | Required | Description | Example |
@ -156,6 +339,9 @@ HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
| `ELASTICSEARCH_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval in Docker | `http://host.docker.internal:9200` |
| `ELASTICSEARCH_LOCAL_URL` | Yes | Elasticsearch endpoint used for vector/context retrieval in local | `http://localhost:9200` |
| `ELASTICSEARCH_INDEX` | Yes | Elasticsearch index name used by the engine | `avap-docs-test` |
| `ELASTICSEARCH_USER` | No | Elasticsearch username (used when API key is not set) | `elastic` |
| `ELASTICSEARCH_PASSWORD` | No | Elasticsearch password (used when API key is not set) | `changeme` |
| `ELASTICSEARCH_API_KEY` | No | Elasticsearch API key (takes precedence over user/password auth) | `abc123...` |
| `POSTGRES_URL` | Yes | PostgreSQL connection string used by the service | `postgresql://postgres:postgres@localhost:5432/langfuse` |
| `LANGFUSE_HOST` | Yes | Langfuse server endpoint (Devaron Cluster) | `http://45.77.119.180` |
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing and observability | `pk-lf-...` |
@ -164,8 +350,10 @@ HF_EMB_MODEL_NAME=Qwen/Qwen3-Embedding-0.6B
| `OLLAMA_LOCAL_URL` | Yes | Ollama endpoint used for text generation/embeddings in local | `http://localhost:11434` |
| `OLLAMA_MODEL_NAME` | Yes | Ollama model name for generation | `qwen2.5:1.5b` |
| `OLLAMA_EMB_MODEL_NAME` | Yes | Ollama embeddings model name | `qwen3-0.6B-emb:latest` |
| `HF_TOKEN` | Yes | HuggingFace secret token | `hf_...` |
| `HF_EMB_MODEL_NAME` | Yes | HuggingFace embeddings model name | `Qwen/Qwen3-Embedding-0.6B` |
| `ANTHROPIC_API_KEY` | Yes* | Anthropic API key — required for the `EvaluateRAG` endpoint | `sk-ant-...` |
| `ANTHROPIC_MODEL` | No | Claude model used by the RAG evaluation suite | `claude-sonnet-4-20250514` |
> Never commit real secret values. Use placeholder values when sharing configuration examples.
@ -192,25 +380,186 @@ docker-compose up -d --build
## Testing & Debugging
The gRPC service is exposed on port `50052` with **gRPC Reflection** enabled — introspect it at any time without needing the `.proto` file.
```bash
# List available services
grpcurl -plaintext localhost:50052 list
# Describe the full service contract
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```
### `AskAgent` — complete response (non-streaming)
Returns the full answer as a single message with `is_final: true`. Suitable for clients that do not support streaming.
```bash
grpcurl -plaintext \
-d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
```
Expected response:
```json
{
"text": "addVar is an AVAP command used to declare a variable...",
"avap_code": "AVAP-2026",
"is_final": true
}
```
### `AskAgentStream` — real token streaming
Emits one `AgentResponse` per token from Ollama. The final message has `is_final: true` and empty `text` — it is a termination signal, not part of the answer.
```bash
grpcurl -plaintext \
-d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgentStream
```
Expected response stream:
```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
...
{"text": "", "is_final": true}
```
**Multi-turn conversation:** send subsequent requests with the same `session_id` to maintain context.
```bash
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — engine has Turn 1 history
grpcurl -plaintext \
-d '{"query": "Show me a code example", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
```
### `EvaluateRAG` — quality evaluation
Runs the RAGAS evaluation pipeline against the golden dataset using Claude as the judge. Requires `ANTHROPIC_API_KEY` to be set.
```bash
# Full evaluation
grpcurl -plaintext -d '{}' localhost:50052 brunix.AssistanceEngine/EvaluateRAG
# Filtered: first 10 questions of category "core_syntax"
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10, "index": "avap-docs-test"}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
```
Expected response:
```json
{
"status": "ok",
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
"context_precision": 0.6891,
"global_score": 0.7615,
"verdict": "ACCEPTABLE"
}
```
Verdict thresholds: `EXCELLENT` ≥ 0.80 · `ACCEPTABLE` ≥ 0.60 · `INSUFFICIENT` < 0.60
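These thresholds are a simple cascade over `global_score`. As a sketch (the function name is illustrative, not the engine's actual helper):

```python
def verdict(global_score: float) -> str:
    """Map a RAGAS global score to the engine's verdict label."""
    if global_score >= 0.80:
        return "EXCELLENT"
    if global_score >= 0.60:
        return "ACCEPTABLE"
    return "INSUFFICIENT"
```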
---
## HTTP Proxy (OpenAI & Ollama Compatible)
The container also runs an **OpenAI-compatible HTTP proxy** on port `8000` (`openai_proxy.py`). It wraps the gRPC engine transparently — `stream: false` routes to `AskAgent`, `stream: true` routes to `AskAgentStream`.
This enables integration with any tool that supports the OpenAI or Ollama API (continue.dev, LiteLLM, Open WebUI, etc.) without code changes.
### OpenAI endpoints
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v1/models` | List available models |
| `POST` | `/v1/chat/completions` | Chat completion — streaming and non-streaming |
| `POST` | `/v1/completions` | Legacy text completion — streaming and non-streaming |
| `GET` | `/health` | Health check — returns gRPC target and status |
**Non-streaming chat:**
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "What is AVAP?"}],
"stream": false
}'
```
**Streaming chat (SSE):**
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Write an AVAP hello world API"}],
"stream": true,
"session_id": "user-xyz"
}'
```
> **Brunix extension:** `session_id` is a non-standard field added to the OpenAI schema. Use it to maintain multi-turn conversation context across HTTP requests. If omitted, all requests share the `"default"` session.
### Ollama endpoints
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/tags` | List models (Ollama format) |
| `POST` | `/api/chat` | Chat — NDJSON stream, `stream: true` by default |
| `POST` | `/api/generate` | Text generation — NDJSON stream, `stream: true` by default |
```bash
curl http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Explain AVAP loops"}],
"stream": true
}'
```
### Proxy environment variables
| Variable | Default | Description |
|---|---|---|
| `BRUNIX_GRPC_TARGET` | `localhost:50051` | gRPC engine address the proxy connects to |
| `PROXY_MODEL_ID` | `brunix` | Model name returned in API responses |
| `PROXY_THREAD_WORKERS` | `20` | Thread pool size for concurrent gRPC calls |
---
## API Contract (Protobuf)
The source of truth for the gRPC interface is `Docker/protos/brunix.proto`. After modifying it, regenerate the stubs:
```bash
python -m grpc_tools.protoc \
-I./Docker/protos \
--python_out=./Docker/src \
--grpc_python_out=./Docker/src \
./Docker/protos/brunix.proto
```
For the full API reference — message types, field descriptions, error handling, and all client examples — see [`docs/API_REFERENCE.md`](./docs/API_REFERENCE.md).
---
## Dataset Generation & Evaluation
@ -218,7 +567,7 @@ python -m grpc_tools.protoc -I./protos --python_out=./src --grpc_python_out=./sr
The engine includes a specialized benchmarking suite to evaluate the model's proficiency in **AVAP syntax**. This is achieved through a synthetic data generator that creates problems in the MBPP (Mostly Basic Python Problems) style, but tailored for the AVAP Language Reference Manual (LRM).
### 1. Synthetic Data Generator
The script `scripts/pipelines/flows/generate_mbap.py` leverages Claude to produce high-quality, executable code examples and validation tests.
**Key Features:**
* **LRM Grounding:** Uses the provided `avap.md` as the source of truth for syntax and logic.
@ -236,8 +585,8 @@ export ANTHROPIC_API_KEY="your-sk-ant-key"
Run the generator specifying the path to your LRM and the desired output:
```bash
python scripts/pipelines/flows/generate_mbap.py \
--lrm docs/LRM/avap.md \
--output evaluation/mbpp_avap.json \
--problems 300
```
@ -275,6 +624,21 @@ For the full set of contribution standards, see [CONTRIBUTING.md](./CONTRIBUTING
---
## Documentation Index
| Document | Purpose |
|---|---|
| [README.md](./README.md) | Setup guide, env vars reference, quick start (this file) |
| [CONTRIBUTING.md](./CONTRIBUTING.md) | Contribution standards, GitFlow, PR process |
| [SECURITY.md](./SECURITY.md) | Security policy, vulnerability reporting, known limitations |
| [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) | Deep technical architecture, component inventory, data flows |
| [docs/API_REFERENCE.md](./docs/API_REFERENCE.md) | Complete gRPC API contract, message types, client examples |
| [docs/RUNBOOK.md](./docs/RUNBOOK.md) | Operational playbooks, health checks, incident response |
| [docs/AVAP_CHUNKER_CONFIG.md](./docs/AVAP_CHUNKER_CONFIG.md) | `avap_config.json` reference — blocks, statements, semantic tags, how to extend |
| [docs/adr/](./docs/adr/) | Architecture Decision Records |
---
## Security & Intellectual Property
* **Data Privacy:** All LLM processing and vector searches are conducted within a private Kubernetes environment.
* **Proprietary Technology:** This repository contains the **AVAP Technology** stack (101OBEX) and specialized training logic (MrHouston). Unauthorized distribution is prohibited.
@ -4,6 +4,28 @@ All notable changes to the **Brunix Assistance Engine** will be documented in th
---
## [1.5.1] - 2026-03-18
### Added
- DOCS: Created `docs/ARCHITECTURE.md` — full technical architecture reference covering component inventory, request lifecycle, LangGraph workflow, hybrid RAG pipeline, streaming design, evaluation pipeline, infrastructure layout, session memory, observability, and security boundaries.
- DOCS: Created `docs/API_REFERENCE.md` — complete gRPC API contract documentation with method descriptions, message type tables, error handling, and `grpcurl` client examples for all three RPCs (`AskAgent`, `AskAgentStream`, `EvaluateRAG`).
- DOCS: Created `docs/RUNBOOK.md` — operational playbook with health checks, startup/shutdown procedures, tunnel management, and incident playbooks for all known failure modes.
- DOCS: Created `SECURITY.md` — security policy covering transport security, authentication, secrets management, container security, data privacy, known limitations table, and vulnerability reporting process.
- DOCS: Created `docs/AVAP_CHUNKER_CONFIG.md` — full reference for `avap_config.json`: lexer fields, all 4 block definitions with regex breakdown, all 10 statement categories with ordering rationale, all 14 semantic tags with detection patterns, a worked example showing chunks produced from real AVAP code, and a step-by-step guide for adding new constructs.
### Changed
- DOCS: Fully rewrote `README.md` project structure tree — now reflects all files accurately including `openai_proxy.py`, `entrypoint.sh`, `golden_dataset.json`, `SECURITY.md`, `docs/ARCHITECTURE.md`, `docs/API_REFERENCE.md`, `docs/RUNBOOK.md`, `docs/adr/`, `avap_chunker.py`, `avap_config.json`, `ingestion/chunks.jsonl`, and `src/config.py`.
- DOCS: Added `Knowledge Base Ingestion` section to `README.md` documenting both ingestion pipelines in full: Pipeline A (Chonkie — `elasticsearch_ingestion.py`) with flow diagram, CLI usage, and chunk field table; Pipeline B (AVAP Native — `avap_chunker.py` + `avap_ingestor.py`) with flow diagram, chunk type table, semantic tags reference, and ingestor env vars.
- DOCS: Replaced minimal `Testing & Debugging` section with complete documentation of all three gRPC methods (`AskAgent`, `AskAgentStream`, `EvaluateRAG`) including expected responses, multi-turn example, and verdict thresholds.
- DOCS: Added `HTTP Proxy` section documenting all 7 HTTP endpoints (4 OpenAI + 3 Ollama), streaming vs non-streaming routing, `session_id` extension, and proxy env vars table.
- DOCS: Fixed `API Contract (Protobuf)` section — corrected `grpc_tools.protoc` paths and added reference to `docs/API_REFERENCE.md`.
- DOCS: Fixed remaining stale reference to `scripts/generate_mbpp_avap.py` in Dataset Generation section.
- DOCS: Added Documentation Index table to `README.md` linking all documentation files.
- DOCS: Updated `CONTRIBUTING.md` — added Section 9 (Architecture Decision Records) and updated PR checklist and doc policy table.
- ENV: Added missing variable documentation to `README.md`: `ELASTICSEARCH_USER`, `ELASTICSEARCH_PASSWORD`, `ELASTICSEARCH_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`.
---
## [1.5.0] - 2026-03-12
### Added
@ -0,0 +1,54 @@
# ADR-0001: gRPC as the Primary Communication Interface
**Date:** 2026-02-09
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO, AVAP Technology), MrHouston Engineering
---
## Context
The Brunix Assistance Engine needs a communication protocol to serve AI completions from internal backend services and client applications. The primary requirement is **real-time token streaming** — the engine must forward Ollama's token output to clients with minimal latency, not buffer the full response.
Secondary requirements:
- Strict API contract enforcement (no schema drift)
- High throughput for potential multi-client scenarios
- Easy introspection and testing in development
Candidates evaluated: REST/HTTP+JSON, gRPC, WebSockets, GraphQL subscriptions.
---
## Decision
Use **gRPC with Protocol Buffers (proto3)** as the primary interface, exposed on port `50051` (container) / `50052` (host).
The API contract is defined in a single source of truth: `Docker/protos/brunix.proto`.
An **OpenAI-compatible HTTP proxy** (`openai_proxy.py`, port `8000`) is provided as a secondary interface to enable integration with standard tooling (continue.dev, LiteLLM, etc.) without modifying the core engine.
---
## Rationale
| Criterion | REST+JSON | **gRPC** | WebSockets |
|---|---|---|---|
| Streaming support | Requires SSE or chunked | ✅ Native server-side streaming | ✅ Bidirectional |
| Schema enforcement | ❌ Optional (OpenAPI) | ✅ Enforced by protobuf | ❌ None |
| Code generation | Manual or OpenAPI tooling | ✅ Automatic stub generation | Manual |
| Performance | Good | ✅ Better (binary framing) | Good |
| Dev tooling | Excellent | Good (`grpcurl`, reflection) | Limited |
| Browser-native | ✅ Yes | ❌ Requires grpc-web proxy | ✅ Yes |
gRPC was chosen because: (1) streaming is a first-class citizen, not bolted on; (2) the proto contract makes API evolution explicit and breaking changes detectable at compile time; (3) stub generation eliminates a class of integration bugs.
The lack of browser-native support is not a concern — all current clients are server-side services or CLI tools.
---
## Consequences
- All API changes require modifying `brunix.proto` and regenerating stubs (`grpc_tools.protoc`).
- Client libraries must use the generated stubs or `grpcurl` — no curl-based ad-hoc testing of the main API.
- The OpenAI proxy adds a second entry point that must be kept in sync with the gRPC interface behavior.
- gRPC reflection is enabled in development. It should be evaluated for disabling in production to reduce the attack surface.
@ -0,0 +1,61 @@
# ADR-0002: Two-Phase Streaming Design for `AskAgentStream`
**Date:** 2026-03-05
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
---
## Context
The initial `AskAgent` implementation calls `graph.invoke()` — LangGraph's synchronous execution — and returns the complete answer as a single gRPC message. This blocks the gRPC connection for the full generation time (typically 3–15 seconds) with no intermediate feedback to the client.
A streaming variant is required that forwards Ollama's token output to the client as tokens are produced, enabling real-time rendering in client UIs.
The straightforward approach would be to use LangGraph's own `graph.stream()` method.
---
## Decision
Implement `AskAgentStream` using a **two-phase design**:
**Phase 1 — Graph-managed preparation:**
Run `build_prepare_graph()` (classify → reformulate → retrieve) via `prepare_graph.invoke()`. This phase runs synchronously and produces the full classified, reformulated query and retrieved context. It does **not** call the LLM for generation.
**Phase 2 — Manual LLM streaming:**
Call `build_final_messages()` to reconstruct the exact prompt that the full graph would have used, then call `llm.stream(final_messages)` directly. Each token chunk is yielded immediately as an `AgentResponse`.
A separate `build_prepare_graph()` function mirrors the routing logic of `build_graph()` but terminates at `END` before any generation node. A `build_final_messages()` function replicates the prompt-building logic of `generate`, `generate_code`, and `respond_conversational`.
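A minimal sketch of the two-phase shape, with stand-ins for the prepare graph and the LLM (every name below is illustrative scaffolding, not the engine's actual API):

```python
from typing import Iterator

class StubLLM:
    """Stand-in for the real Ollama-backed LLM client."""
    def stream(self, messages: list[dict]) -> Iterator[str]:
        for token in ["addVar", " declares", " a", " variable", "."]:
            yield token

def prepare(query: str) -> dict:
    # Phase 1 stand-in: classify -> reformulate -> retrieve. No generation.
    return {"query": query, "route": "qa", "context": ["<retrieved chunk>"]}

def build_final_messages(state: dict) -> list[dict]:
    # Rebuild the exact prompt the full graph would have used.
    return [
        {"role": "system", "content": f"Context: {' '.join(state['context'])}"},
        {"role": "user", "content": state["query"]},
    ]

def ask_agent_stream(llm: StubLLM, query: str) -> Iterator[dict]:
    state = prepare(query)  # Phase 1: graph-managed preparation
    for token in llm.stream(build_final_messages(state)):  # Phase 2: manual streaming
        yield {"text": token, "is_final": False}
    yield {"text": "", "is_final": True}  # termination signal, not answer content
```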
---
## Rationale
### Why not use `graph.stream()`?
LangGraph's `stream()` yields **state snapshots** at node boundaries, not LLM tokens. When using `llm.invoke()` inside a graph node, the invocation is atomic — there are no intermediate yields. To get per-token streaming from `llm.stream()`, the call must happen outside the graph.
### Why not inline the streaming call inside a graph node?
Yielding from inside a LangGraph node to an outer generator is architecturally complex and not idiomatic to LangGraph. It requires either a callback mechanism or breaking the node abstraction.
### Trade-offs
| Concern | Two-phase design | Alternative (streaming inside graph) |
|---|---|---|
| Code duplication | Medium — routing logic exists in both graphs | Low |
| Architectural clarity | High — phases are clearly separated | Low |
| LangGraph compatibility | High — standard usage | Low — requires framework internals |
| Maintainability | Requires keeping `build_prepare_graph` and `build_final_messages` in sync with `build_graph` | Single source of routing truth |
The duplication risk is accepted because: (1) the routing logic is simple (3 branches), (2) the prepare graph is strictly a subset of the full graph, and (3) both are tested via the same integration test queries.
---
## Consequences
- `graph.py` now exports three functions: `build_graph`, `build_prepare_graph`, `build_final_messages`.
- Any change to query routing logic in `build_graph` must be mirrored in `build_prepare_graph`.
- Any change to prompt selection in `generate` / `generate_code` / `respond_conversational` must be mirrored in `build_final_messages`.
- Session history persistence happens **after the stream ends**, not mid-stream. A client that disconnects early will cause history to not be saved for that turn.
@ -0,0 +1,63 @@
# ADR-0003: Hybrid Retrieval (BM25 + kNN) with RRF Fusion
**Date:** 2026-03-05
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
---
## Context
The RAG pipeline needs a retrieval strategy for finding relevant AVAP documentation chunks from Elasticsearch. The knowledge base contains a mix of:
- **Prose documentation** (explanations of AVAP concepts, commands, parameters) — benefits from semantic (dense) retrieval
- **Code examples and BNF grammar** (exact syntax patterns, function signatures) — benefits from lexical (sparse) retrieval, where exact token matches are critical
A single retrieval strategy will underperform for one of these document types.
---
## Decision
Implement **hybrid retrieval** combining:
- **BM25** (Elasticsearch `multi_match` on `content^2` and `text^2` fields) for lexical relevance
- **kNN** (Elasticsearch `knn` on the `embedding` field) for semantic relevance
- **RRF (Reciprocal Rank Fusion)** with constant `k=60` to fuse rankings from both systems
The fused top-8 documents are passed to the generation node as context.
Query reformulation (`reformulate` node) runs before retrieval and rewrites the user query into keyword-optimized form to improve BM25 recall for AVAP-specific terminology.
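A sketch of the two request bodies, using the Elasticsearch 8.x search API shapes named above (exact construction inside the engine may differ):

```python
K = 8  # fused top-k passed to the generation node

def bm25_query(query_text: str) -> dict:
    # Lexical leg: multi_match over boosted content/text fields, AUTO fuzziness.
    return {
        "size": K,
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["content^2", "text^2"],
                "fuzziness": "AUTO",
            }
        },
    }

def knn_query(query_vector: list[float]) -> dict:
    # Semantic leg: approximate kNN over the dense embedding field.
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": K,
            "num_candidates": K * 5,
        }
    }
```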
---
## Rationale
### Why hybrid over pure semantic?
AVAP is a domain-specific language with precise, non-negotiable syntax. For queries like "how does `addVar` work", exact lexical matching on the function name `addVar` is more reliable than semantic similarity, which may confuse similar-sounding functions or return contextually related but syntactically different commands.
### Why hybrid over pure BM25?
Conversational queries ("explain how loops work in AVAP", "what's the difference between addVar and setVar") benefit from semantic search that captures meaning beyond exact keyword overlap.
### Why RRF over score normalization?
BM25 and kNN scores are on different scales and distributions. Normalizing them requires careful calibration per index. RRF operates on ranks — not scores — making it robust to distribution differences and requiring no per-deployment tuning. The `k=60` constant is the standard literature value.
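RRF itself is only a few lines. A sketch over ranked doc-id lists:

```python
RRF_K = 60  # standard literature constant

def rrf_fuse(rankings: list[list[str]], top_n: int = 8) -> list[str]:
    """Fuse ranked doc-id lists by Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (RRF_K + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Because RRF operates on ranks, an empty ranking (one retrieval leg failed) simply contributes nothing — the graceful degradation described in Consequences falls out of the formula.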
### Retrieval parameters
| Parameter | Value | Rationale |
|---|---|---|
| `k` (top documents) | 8 | Balances context richness vs. context window length |
| `num_candidates` (kNN) | `k × 5 = 40` | Standard ES kNN oversampling ratio |
| BM25 fields | `content^2, text^2` | Boost content/text fields; `^2` emphasizes them over metadata |
| Fuzziness (BM25) | `AUTO` | Handles minor typos in AVAP function names |
---
## Consequences
- Retrieval requires two ES queries per request (BM25 + kNN). This is acceptable given the tunnel latency baseline already incurred.
- If either BM25 or kNN fails (e.g., embedding model unavailable), the system degrades gracefully: the failing component logs a warning and returns an empty list; RRF fusion proceeds with the available rankings.
- Context length grows with `k`. At `k=8` with typical chunk sizes (~300 tokens each), context is ~2400 tokens — within the `qwen2.5:1.5b` context window.
- Changing `k` has a direct impact on both retrieval quality and generation latency. Any change must be evaluated with `EvaluateRAG` before merging.
@ -0,0 +1,54 @@
# ADR-0004: Claude as the RAGAS Evaluation Judge
**Date:** 2026-03-10
**Status:** Accepted
**Deciders:** Rafael Ruiz (CTO), MrHouston Engineering
---
## Context
The `EvaluateRAG` endpoint runs RAGAS metrics to measure the quality of the RAG pipeline. RAGAS metrics (`faithfulness`, `answer_relevancy`, `context_recall`, `context_precision`) require an LLM judge to score answers against ground truth and context.
The production LLM is Ollama `qwen2.5:1.5b` — a small, locally-hosted model optimized for AVAP code generation speed. Using it as the evaluation judge creates a conflict of interest (measuring a system with the same model that produces it) and a quality concern (small models produce unreliable evaluation scores).
---
## Decision
Use **Claude (`claude-sonnet-4-20250514`) as the RAGAS evaluation judge**, accessed via the Anthropic API.
The production Ollama LLM is still used for **answer generation** during evaluation (to measure real-world pipeline quality). Only the scoring step uses Claude.
This requires `ANTHROPIC_API_KEY` to be set. The `EvaluateRAG` endpoint fails with an explicit error if the key is missing.
---
## Rationale
### Separation of generation and evaluation
Using a different model for generation and evaluation is standard practice in LLM system evaluation. The evaluation judge must be:
1. **Independent** — not the same model being measured
2. **High-capability** — capable of nuanced faithfulness and relevancy judgements
3. **Deterministic** — consistent scores across runs (achieved via `temperature=0`)
### Why Claude specifically?
- Claude Sonnet-class models score among the highest on LLM-as-judge benchmarks for English and multilingual evaluation tasks
- The AVAP knowledge base contains bilingual content (Spanish + English); Claude handles both reliably
- The Anthropic SDK is already available in the dependency stack (`langchain-anthropic`)
### Cost implications
Claude is called only during explicit `EvaluateRAG` invocations, not during production queries. Cost per evaluation run depends on dataset size. For 50 questions at standard RAGAS prompt lengths, estimated cost is < $0.50 using Sonnet pricing.
---
## Consequences
- `ANTHROPIC_API_KEY` and `ANTHROPIC_MODEL` become required configuration for the evaluation feature.
- Evaluation runs incur external API costs. This should be factored into the evaluation cadence policy.
- The `judge_model` field in `EvalResponse` records which Claude version was used, enabling score comparisons across model versions over time.
- If Anthropic's API is unreachable or rate-limited, `EvaluateRAG` will fail. This is acceptable since evaluation is a batch operation, not a real-time user-facing feature.
- Any change to `ANTHROPIC_MODEL` may alter scoring distributions. Historical eval scores are only comparable when the same judge model was used.
docs/API_REFERENCE.md
@ -0,0 +1,339 @@
# Brunix Assistance Engine — API Reference
> **Protocol:** gRPC (proto3)
> **Port:** `50052` (host) → `50051` (container)
> **Reflection:** Enabled — service introspection available via `grpcurl`
> **Source of truth:** `Docker/protos/brunix.proto`
---
## Table of Contents
1. [Service Definition](#1-service-definition)
2. [Methods](#2-methods)
- [AskAgent](#21-askagent)
- [AskAgentStream](#22-askagentstream)
- [EvaluateRAG](#23-evaluaterag)
3. [Message Types](#3-message-types)
4. [Error Handling](#4-error-handling)
5. [Client Examples](#5-client-examples)
6. [OpenAI-Compatible Proxy](#6-openai-compatible-proxy)
---
## 1. Service Definition
```protobuf
package brunix;
service AssistanceEngine {
rpc AskAgent (AgentRequest) returns (stream AgentResponse);
rpc AskAgentStream (AgentRequest) returns (stream AgentResponse);
rpc EvaluateRAG (EvalRequest) returns (EvalResponse);
}
```
Both `AskAgent` and `AskAgentStream` return a **server-side stream** of `AgentResponse` messages. They differ in how they produce and deliver the response — see [§2.1](#21-askagent) and [§2.2](#22-askagentstream).
---
## 2. Methods
### 2.1 `AskAgent`
**Behaviour:** Runs the full LangGraph pipeline (classify → reformulate → retrieve → generate) using `llm.invoke()`. Returns the complete answer as a **single** `AgentResponse` message with `is_final = true`.
**Use case:** Clients that do not support streaming or need a single atomic response.
**Request:**
```protobuf
message AgentRequest {
string query = 1; // The user's question. Required. Max recommended: 4096 chars.
string session_id = 2; // Conversation session identifier. Optional.
// If empty, defaults to "default" (shared session).
// Use a UUID per user/conversation for isolation.
}
```
**Response stream:**
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1 (only) | Full answer text | `"AVAP-2026"` | `true` |
**Latency characteristics:** Depends on LLM generation time (non-streaming). Typically 3–15 seconds for `qwen2.5:1.5b` on the Devaron cluster.
---
### 2.2 `AskAgentStream`
**Behaviour:** Runs `prepare_graph` (classify → reformulate → retrieve), then calls `llm.stream()` directly. Emits one `AgentResponse` per token from Ollama, followed by a terminal message.
**Use case:** Interactive clients (chat UIs, terminal tools) that need progressive rendering.
**Request:** Same `AgentRequest` as `AskAgent`.
**Response stream:**
| Message # | `text` | `avap_code` | `is_final` |
|---|---|---|---|
| 1…N | Single token | `""` | `false` |
| N+1 (final) | `""` | `""` | `true` |
**Client contract:**
- Accumulate `text` from all messages where `is_final == false` to reconstruct the full answer.
- The `is_final == true` message signals end-of-stream. Its `text` is always empty and should be discarded.
- Do not close the stream early — the engine will fail to persist conversation history if the stream is interrupted.
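The accumulation contract, as a minimal client-side sketch (messages shown as dicts for brevity; a real client reads the `text` / `is_final` fields of `AgentResponse` protobuf objects):

```python
def collect_answer(stream) -> str:
    """Accumulate AskAgentStream messages into the full answer text."""
    parts = []
    for msg in stream:
        if msg["is_final"]:
            break  # terminal signal — its text is always empty, discard it
        parts.append(msg["text"])
    return "".join(parts)
```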
---
### 2.3 `EvaluateRAG`
**Behaviour:** Runs the RAGAS evaluation pipeline against the golden dataset. Uses the production Ollama LLM for answer generation and Claude as the evaluation judge.
> **Requirement:** `ANTHROPIC_API_KEY` must be configured in the environment. This endpoint will return an error response if it is missing.
**Request:**
```protobuf
message EvalRequest {
string category = 1; // Optional. Filter golden dataset by category name.
// If empty, all categories are evaluated.
int32 limit = 2; // Optional. Evaluate only the first N questions.
// If 0, all matching questions are evaluated.
string index = 3; // Optional. Elasticsearch index to evaluate against.
// If empty, uses the server's configured ELASTICSEARCH_INDEX.
}
```
**Response (single, non-streaming):**
```protobuf
message EvalResponse {
string status = 1; // "ok" or error description
int32 questions_evaluated = 2; // Number of questions actually processed
float elapsed_seconds = 3; // Total wall-clock time
string judge_model = 4; // Claude model used as judge
string index = 5; // Elasticsearch index evaluated
  // RAGAS metric scores (0.0 – 1.0)
float faithfulness = 6;
float answer_relevancy = 7;
float context_recall = 8;
float context_precision = 9;
float global_score = 10; // Mean of non-zero metric scores
string verdict = 11; // "EXCELLENT" | "ACCEPTABLE" | "INSUFFICIENT"
repeated QuestionDetail details = 12;
}
message QuestionDetail {
string id = 1; // Question ID from golden dataset
string category = 2; // Question category
string question = 3; // Question text
string answer_preview = 4; // First 300 chars of generated answer
int32 n_chunks = 5; // Number of context chunks retrieved
}
```
**Verdict thresholds:**
| Score | Verdict |
|---|---|
| ≥ 0.80 | `EXCELLENT` |
| ≥ 0.60 | `ACCEPTABLE` |
| < 0.60 | `INSUFFICIENT` |
---
## 3. Message Types
### `AgentRequest`
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | `string` | Yes | User's natural language question |
| `session_id` | `string` | No | Conversation identifier for multi-turn context. Use a stable UUID per user session. |
### `AgentResponse`
| Field | Type | Description |
|---|---|---|
| `text` | `string` | Token text (streaming) or full answer text (non-streaming) |
| `avap_code` | `string` | Currently always `"AVAP-2026"` in non-streaming mode, empty in streaming |
| `is_final` | `bool` | `true` only on the last message of the stream |
### `EvalRequest`
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `category` | `string` | No | `""` (all) | Filter golden dataset by category |
| `limit` | `int32` | No | `0` (all) | Max questions to evaluate |
| `index` | `string` | No | `$ELASTICSEARCH_INDEX` | ES index to evaluate |
### `EvalResponse`
See full definition in [§2.3](#23-evaluaterag).
---
## 4. Error Handling
The engine catches all exceptions and returns them as terminal `AgentResponse` messages rather than gRPC status errors. This means:
- The stream will **not** be terminated with a non-OK gRPC status code on application-level errors.
- Check for error strings in the `text` field that begin with `[ENG] Error:`.
- The stream will still end with `is_final = true`.
**Example error response:**
```json
{"text": "[ENG] Error: Connection refused connecting to Ollama", "is_final": true}
```
**`EvaluateRAG` error response:**
Returned as a single `EvalResponse` with `status` set to the error description:
```json
{"status": "ANTHROPIC_API_KEY no configurada en .env", ...}
```
---
## 5. Client Examples
### Introspect the service
```bash
grpcurl -plaintext localhost:50052 list
# Output: brunix.AssistanceEngine
grpcurl -plaintext localhost:50052 describe brunix.AssistanceEngine
```
### `AskAgent` — full response
```bash
grpcurl -plaintext \
-d '{"query": "What is addVar in AVAP?", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgent
```
Expected response:
```json
{
"text": "addVar is an AVAP command that declares a new variable...",
"avap_code": "AVAP-2026",
"is_final": true
}
```
### `AskAgentStream` — token streaming
```bash
grpcurl -plaintext \
-d '{"query": "Write an AVAP API that returns hello world", "session_id": "dev-001"}' \
localhost:50052 \
brunix.AssistanceEngine/AskAgentStream
```
Expected response (truncated):
```json
{"text": "Here", "is_final": false}
{"text": " is", "is_final": false}
{"text": " a", "is_final": false}
...
{"text": "", "is_final": true}
```
### `EvaluateRAG` — run evaluation
```bash
# Evaluate first 10 questions from the "core_syntax" category
grpcurl -plaintext \
-d '{"category": "core_syntax", "limit": 10}' \
localhost:50052 \
brunix.AssistanceEngine/EvaluateRAG
```
Expected response:
```json
{
"status": "ok",
"questions_evaluated": 10,
"elapsed_seconds": 142.3,
"judge_model": "claude-sonnet-4-20250514",
"index": "avap-docs-test",
"faithfulness": 0.8421,
"answer_relevancy": 0.7913,
"context_recall": 0.7234,
"context_precision": 0.6891,
"global_score": 0.7615,
"verdict": "ACCEPTABLE",
"details": [...]
}
```
### Multi-turn conversation example
```bash
# Turn 1
grpcurl -plaintext \
-d '{"query": "What is registerEndpoint?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
# Turn 2 — the engine has history from Turn 1
grpcurl -plaintext \
-d '{"query": "Can you show me an example?", "session_id": "user-abc"}' \
localhost:50052 brunix.AssistanceEngine/AskAgentStream
```
### Regenerate gRPC stubs after modifying `brunix.proto`
```bash
python -m grpc_tools.protoc \
-I./Docker/protos \
--python_out=./Docker/src \
--grpc_python_out=./Docker/src \
./Docker/protos/brunix.proto
```
---
## 6. OpenAI-Compatible Proxy
The container also exposes an HTTP server on port `8000` (`openai_proxy.py`) that wraps `AskAgentStream` under an OpenAI-compatible endpoint. This allows integration with any tool that supports the OpenAI Chat Completions API.
**Base URL:** `http://localhost:8000`
### `POST /v1/chat/completions`
**Request body:**
```json
{
"model": "brunix",
"messages": [
{"role": "user", "content": "What is addVar in AVAP?"}
],
"stream": true
}
```
**Notes:**
- The `model` field is ignored; the engine always uses the configured `OLLAMA_MODEL_NAME`.
- Session management is handled internally by the proxy. Conversation continuity across separate HTTP requests is not guaranteed.
- Only `stream: true` is fully supported. Non-streaming mode may be available but is not the primary use case.
**Example with curl:**
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "brunix",
"messages": [{"role": "user", "content": "Explain AVAP loops"}],
"stream": true
}'
```
docs/ARCHITECTURE.md (new file)
# Brunix Assistance Engine — Architecture Reference
> **Audience:** Engineers contributing to this repository, architects reviewing the system design, and operators responsible for its deployment.
> **Last updated:** 2026-03-18
> **Version:** 1.5.x
---
## Table of Contents
1. [System Overview](#1-system-overview)
2. [Component Inventory](#2-component-inventory)
3. [Request Lifecycle](#3-request-lifecycle)
4. [LangGraph Workflow](#4-langgraph-workflow)
5. [RAG Pipeline — Hybrid Search](#5-rag-pipeline--hybrid-search)
6. [Streaming Architecture (AskAgentStream)](#6-streaming-architecture-askagentstream)
7. [Evaluation Pipeline (EvaluateRAG)](#7-evaluation-pipeline-evaluaterag)
8. [Data Ingestion Pipeline](#8-data-ingestion-pipeline)
9. [Infrastructure Layout](#9-infrastructure-layout)
10. [Session State & Conversation Memory](#10-session-state--conversation-memory)
11. [Observability Stack](#11-observability-stack)
12. [Security Boundaries](#12-security-boundaries)
13. [Known Limitations & Future Work](#13-known-limitations--future-work)
---
## 1. System Overview
The **Brunix Assistance Engine** is a stateful, streaming-capable AI service that answers questions about the AVAP programming language. It combines:
- **gRPC** as the primary communication interface (port `50051` inside container, `50052` on host)
- **LangGraph** for deterministic, multi-step agentic orchestration
- **Hybrid RAG** (BM25 + kNN with RRF fusion) over an Elasticsearch vector index
- **Ollama** as the local LLM and embedding backend
- **RAGAS + Claude** as the automated evaluation judge
A secondary **OpenAI-compatible HTTP proxy** (port `8000`) is served via FastAPI/Uvicorn, enabling integration with tools that expect the OpenAI API format.
```
┌─────────────────────────────────────────────────────────────┐
│ External Clients │
│ grpcurl / App SDK │ OpenAI-compatible client │
└────────────┬────────────────┴──────────────┬────────────────┘
│ gRPC :50052 │ HTTP :8000
▼ ▼
┌────────────────────────────────────────────────────────────┐
│ Docker Container │
│ │
│ ┌─────────────────────┐ ┌──────────────────────────┐ │
│ │ server.py (gRPC) │ │ openai_proxy.py (HTTP) │ │
│ │ BrunixEngine │ │ FastAPI / Uvicorn │ │
│ └──────────┬──────────┘ └──────────────────────────┘ │
│ │ │
│ ┌──────────▼──────────────────────────────────────────┐ │
│ │ LangGraph Orchestration │ │
│ │ classify → reformulate → retrieve → generate │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌───────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ Ollama (LLM) Ollama (Embed) Elasticsearch │
│ via tunnel via tunnel via tunnel │
└────────────────────────────────────────────────────────────┘
│ kubectl port-forward tunnels │
▼ ▼
Devaron Cluster (Vultr Kubernetes)
ollama-light-service:11434 brunix-vector-db:9200
brunix-postgres:5432 Langfuse UI
```
---
## 2. Component Inventory
| Component | File / Service | Responsibility |
|---|---|---|
| **gRPC Server** | `Docker/src/server.py` | Entry point. Implements the `AssistanceEngine` servicer. Initializes LLM, embeddings, ES client, and both graphs. |
| **Full Graph** | `Docker/src/graph.py` → `build_graph()` | Complete workflow: classify → reformulate → retrieve → generate. Used by `AskAgent` and `EvaluateRAG`. |
| **Prepare Graph** | `Docker/src/graph.py` → `build_prepare_graph()` | Partial workflow: classify → reformulate → retrieve. Does **not** call the LLM for generation. Used by `AskAgentStream` to enable manual token streaming. |
| **Message Builder** | `Docker/src/graph.py` → `build_final_messages()` | Reconstructs the final prompt list from prepared state for `llm.stream()`. |
| **Prompt Library** | `Docker/src/prompts.py` | Centralized definitions for `CLASSIFY`, `REFORMULATE`, `GENERATE`, `CODE_GENERATION`, and `CONVERSATIONAL` prompts. |
| **Agent State** | `Docker/src/state.py` | `AgentState` TypedDict shared across all graph nodes. |
| **Evaluation Suite** | `Docker/src/evaluate.py` | RAGAS-based pipeline. Uses the production retriever + Ollama LLM for generation, and Claude as the impartial judge. |
| **OpenAI Proxy** | `Docker/src/openai_proxy.py` | FastAPI application that wraps `AskAgentStream` under an `/v1/chat/completions` endpoint. |
| **LLM Factory** | `Docker/src/utils/llm_factory.py` | Provider-agnostic factory for chat models (Ollama, AWS Bedrock). |
| **Embedding Factory** | `Docker/src/utils/emb_factory.py` | Provider-agnostic factory for embedding models (Ollama, HuggingFace). |
| **Ingestion Pipeline** | `scripts/pipelines/flows/elasticsearch_ingestion.py` | Chunks and ingests AVAP documents into Elasticsearch with embeddings. |
| **Dataset Generator** | `scripts/pipelines/flows/generate_mbap.py` | Generates synthetic MBPP-style AVAP problems using Claude. |
| **MBPP Translator** | `scripts/pipelines/flows/translate_mbpp.py` | Translates MBPP Python dataset into AVAP equivalents. |
---
## 3. Request Lifecycle
### 3.1 `AskAgent` (non-streaming)
```
Client → gRPC AgentRequest{query, session_id}
├─ Load conversation history from session_store[session_id]
├─ Build initial_state = {messages: history + [user_msg], ...}
└─ graph.invoke(initial_state)
├─ classify → query_type ∈ {RETRIEVAL, CODE_GENERATION, CONVERSATIONAL}
├─ reformulate → reformulated_query (keyword-optimized for semantic search)
├─ retrieve → context (top-8 hybrid RRF chunks from Elasticsearch)
└─ generate → final AIMessage (llm.invoke)
├─ Persist updated history to session_store[session_id]
└─ yield AgentResponse{text, avap_code="AVAP-2026", is_final=True}
```
### 3.2 `AskAgentStream` (token streaming)
```
Client → gRPC AgentRequest{query, session_id}
├─ Load history from session_store[session_id]
├─ Build initial_state
├─ prepare_graph.invoke(initial_state) ← Phase 1: no LLM generation
│ ├─ classify
│ ├─ reformulate
│ └─ retrieve (or skip_retrieve if CONVERSATIONAL)
├─ build_final_messages(prepared_state) ← Reconstruct prompt list
└─ for chunk in llm.stream(final_messages):
└─ yield AgentResponse{text=token, is_final=False}
├─ Persist full assembled response to session_store
└─ yield AgentResponse{text="", is_final=True}
```
### 3.3 `EvaluateRAG`
```
Client → gRPC EvalRequest{category?, limit?, index?}
└─ evaluate.run_evaluation(...)
├─ Load golden_dataset.json
├─ Filter by category / limit
├─ For each question:
│ ├─ retrieve_context (hybrid BM25+kNN, same as production)
│ └─ generate_answer (Ollama LLM + GENERATE_PROMPT)
├─ Build RAGAS Dataset
├─ Run RAGAS metrics with Claude as judge:
│ faithfulness / answer_relevancy / context_recall / context_precision
└─ Compute global_score + verdict (EXCELLENT / ACCEPTABLE / INSUFFICIENT)
└─ return EvalResponse{scores, global_score, verdict, details[]}
```
---
## 4. LangGraph Workflow
### 4.1 Full Graph (`build_graph`)
```
┌─────────────┐
│ classify │
└──────┬──────┘
┌────────────────┼──────────────────┐
▼ ▼ ▼
RETRIEVAL CODE_GENERATION CONVERSATIONAL
│ │ │
└────────┬───────┘ │
▼ ▼
┌──────────────┐ ┌────────────────────────┐
│ reformulate │ │ respond_conversational │
└──────┬───────┘ └───────────┬────────────┘
▼ │
┌──────────────┐ │
│ retrieve │ │
└──────┬───────┘ │
│ │
┌────────┴───────────┐ │
▼ ▼ │
┌──────────┐ ┌───────────────┐ │
│ generate │ │ generate_code │ │
└────┬─────┘ └───────┬───────┘ │
│ │ │
└────────────────────┴────────────────┘
END
```
### 4.2 Prepare Graph (`build_prepare_graph`)
Identical routing for classify, but generation nodes are replaced by `END`. The `CONVERSATIONAL` branch uses `skip_retrieve` (returns empty context without querying Elasticsearch).
### 4.3 Query Type Routing
| `query_type` | Triggers retrieve? | Generation prompt |
|---|---|---|
| `RETRIEVAL` | Yes | `GENERATE_PROMPT` (explanation-focused) |
| `CODE_GENERATION` | Yes | `CODE_GENERATION_PROMPT` (code-focused, returns AVAP blocks) |
| `CONVERSATIONAL` | No | `CONVERSATIONAL_PROMPT` (reformulation of prior answer) |
---
## 5. RAG Pipeline — Hybrid Search
The retrieval system (`hybrid_search_native`) fuses BM25 lexical search and kNN dense vector search using **Reciprocal Rank Fusion (RRF)**.
```
User query
├─ embeddings.embed_query(query) → query_vector [768-dim]
├─ ES multi_match (BM25) on fields [content^2, text^2]
│ └─ top-k BM25 hits
└─ ES knn on field [embedding], num_candidates = k×5
└─ top-k kNN hits
├─ RRF fusion: score(doc) = Σ 1/(rank + 60)
└─ Top-8 documents → format_context() → context string
```
**RRF constant:** `60` (standard value; prevents high-rank documents from dominating while still rewarding consensus between both retrieval modes).
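The fusion step can be sketched as follows — a minimal stand-in for the scoring inside `hybrid_search_native`, assuming 1-based ranks; the document IDs and hit lists are illustrative:

```python
def rrf_fuse(bm25_hits, knn_hits, k=60, top_n=8):
    """Reciprocal Rank Fusion: score(doc) = sum over both hit lists of
    1 / (rank + k), where rank is the 1-based position in each list."""
    scores = {}
    for hits in (bm25_hits, knn_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest fused score first; top_n mirrors the production top-8 cut
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "c" appears in both lists, so consensus lifts it above each list's leader
print(rrf_fuse(["a", "c", "b"], ["c", "d"]))  # → ['c', 'a', 'd', 'b']
```

Note how with `k=60` the per-rank score differences are small, so a document ranked in both lists ("c") reliably outscores a document ranked first in only one.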
**Chunk metadata** attached to each retrieved document:
| Field | Description |
|---|---|
| `chunk_id` | Unique identifier within the index |
| `source_file` | Origin document filename |
| `doc_type` | `prose`, `code`, `code_example`, `bnf` |
| `block_type` | AVAP block type: `function`, `if`, `startLoop`, `try` |
| `section` | Document section/chapter heading |
Documents of type `code`, `code_example`, `bnf`, or block type `function / if / startLoop / try` are tagged as `[AVAP CODE]` in the formatted context, signaling the LLM to treat them as executable syntax rather than prose.
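A minimal sketch of how that tagging could look. This is illustrative only — it is not the actual `format_context()` implementation — but the field names follow the metadata table above:

```python
CODE_DOC_TYPES = {"code", "code_example", "bnf"}
CODE_BLOCK_TYPES = {"function", "if", "startLoop", "try"}

def format_context(docs):
    """Join retrieved chunks, prefixing code-like chunks with [AVAP CODE]."""
    parts = []
    for doc in docs:
        is_code = (doc.get("doc_type") in CODE_DOC_TYPES
                   or doc.get("block_type") in CODE_BLOCK_TYPES)
        tag = "[AVAP CODE] " if is_code else ""
        parts.append(f"{tag}({doc['source_file']}) {doc['content']}")
    return "\n\n".join(parts)
```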
---
## 6. Streaming Architecture (AskAgentStream)
The two-phase streaming design is critical to understand:
**Why not stream through LangGraph?**
LangGraph's `stream()` method yields full state snapshots per node, not individual tokens. To achieve true per-token streaming to the gRPC client, the generation step is deliberately extracted from the graph and called directly via `llm.stream()`.
**Phase 1 — Deterministic preparation (graph-managed):**
- Classification, query reformulation, and retrieval run through `prepare_graph.invoke()`.
- This phase runs synchronously and produces the complete context before any token is emitted to the client.
**Phase 2 — Token streaming (manual):**
- `build_final_messages()` reconstructs the exact prompt that `generate` / `generate_code` / `respond_conversational` would have used.
- `llm.stream(final_messages)` yields one `AIMessageChunk` per token from Ollama.
- Each token is immediately forwarded to the gRPC client as `AgentResponse{text=token, is_final=False}`.
- After the stream ends, the full assembled text is persisted to `session_store`.
**Backpressure:** gRPC streaming is flow-controlled by the client. If the client stops reading, the Ollama token stream will block at the `yield` point. No explicit buffer overflow protection is implemented (acceptable for the current single-client dev mode).
---
## 7. Evaluation Pipeline (EvaluateRAG)
The evaluation suite implements an **offline RAG evaluation** pattern using RAGAS metrics.
### Judge model separation
The production LLM (Ollama `qwen2.5:1.5b`) is used for **answer generation** — the same pipeline as production to measure real-world quality. Claude (`claude-sonnet-4-20250514`) is used as the **evaluation judge** — an independent, high-capability model that scores the generated answers against ground truth.
### RAGAS metrics
| Metric | Measures | Input |
|---|---|---|
| `faithfulness` | Are claims in the answer supported by the retrieved context? | answer + contexts |
| `answer_relevancy` | Is the answer relevant to the question? | answer + question |
| `context_recall` | Does the retrieved context cover the ground truth? | contexts + ground_truth |
| `context_precision` | Are the retrieved chunks useful (signal-to-noise)? | contexts + ground_truth |
### Global score & verdict
```
global_score = mean(non-zero metric scores)
verdict:
≥ 0.80 → EXCELLENT
≥ 0.60 → ACCEPTABLE
  < 0.60 → INSUFFICIENT
```
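As a sketch (metric names as in `EvalResponse`; the zero-exclusion mirrors the formula above):

```python
def score_and_verdict(metrics):
    """global_score = mean of non-zero metric values; verdict per thresholds."""
    nonzero = [v for v in metrics.values() if v > 0.0]
    global_score = sum(nonzero) / len(nonzero) if nonzero else 0.0
    if global_score >= 0.80:
        verdict = "EXCELLENT"
    elif global_score >= 0.60:
        verdict = "ACCEPTABLE"
    else:
        verdict = "INSUFFICIENT"
    return global_score, verdict

print(score_and_verdict({
    "faithfulness": 0.84, "answer_relevancy": 0.79,
    "context_recall": 0.72, "context_precision": 0.0,  # 0.0 excluded from mean
}))
```

The zero-exclusion matters in practice: a metric that fails to compute (and reports `0.0`) would otherwise drag the mean below a threshold even when the remaining metrics are healthy.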
### Golden dataset
Located at `Docker/src/golden_dataset.json`. Each entry follows this schema:
```json
{
"id": "avap-001",
"category": "core_syntax",
"question": "How do you declare a variable in AVAP?",
"ground_truth": "Use addVar to declare a variable..."
}
```
---
## 8. Data Ingestion Pipeline
Documents flow into the Elasticsearch index through two paths:
### Path A — AVAP documentation (structured markdown)
```
docs/LRM/avap.md
docs/avap_language_github_docs/*.md
docs/developer.avapframework.com/*.md
scripts/pipelines/flows/elasticsearch_ingestion.py
├─ Load markdown files
├─ Chunk using scripts/pipelines/tasks/chunk.py
│ (semantic chunking via Chonkie library)
├─ Generate embeddings via scripts/pipelines/tasks/embeddings.py
│ (Ollama or HuggingFace embedding model)
└─ Bulk index into Elasticsearch
index: avap-docs-* (configurable via ELASTICSEARCH_INDEX)
mapping: {content, embedding, source_file, doc_type, section, ...}
```
### Path B — Synthetic AVAP code samples
```
docs/samples/*.avap
scripts/pipelines/flows/generate_mbap.py
├─ Read AVAP LRM (docs/LRM/avap.md)
├─ Call Claude API to generate MBPP-style problems
└─ Output synthetic_datasets/mbpp_avap.json
(used for fine-tuning and few-shot examples)
```
---
## 9. Infrastructure Layout
### Devaron Cluster (Vultr Kubernetes)
| Service | K8s Name | Port | Purpose |
|---|---|---|---|
| LLM inference | `ollama-light-service` | `11434` | Text generation + embeddings |
| Vector database | `brunix-vector-db` | `9200` | Elasticsearch 8.x |
| Observability DB | `brunix-postgres` | `5432` | PostgreSQL for Langfuse |
| Langfuse UI | — | `80` | `http://45.77.119.180` |
### Kubernetes tunnel commands
```bash
# Terminal 1 — LLM
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
# Terminal 2 — Elasticsearch
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
# Terminal 3 — PostgreSQL (Langfuse)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
```
### Port map summary
| Port | Protocol | Service | Scope |
|---|---|---|---|
| `50051` | gRPC | Brunix Engine (inside container) | Internal |
| `50052` | gRPC | Brunix Engine (host-mapped) | External |
| `8000` | HTTP | OpenAI proxy | External |
| `11434` | HTTP | Ollama (via tunnel) | Tunnel |
| `9200` | HTTP | Elasticsearch (via tunnel) | Tunnel |
| `5432` | TCP | PostgreSQL/Langfuse (via tunnel) | Tunnel |
---
## 10. Session State & Conversation Memory
Conversation history is managed via an in-process dictionary:
```python
session_store: dict[str, list] = defaultdict(list)
# key: session_id (string, provided by client)
# value: list of LangChain BaseMessage objects
```
**Characteristics:**
- **In-memory only.** History is lost on container restart.
- **No TTL or eviction.** Sessions grow unbounded for the lifetime of the process.
- **Thread safety:** Python's GIL provides basic safety for the `ThreadPoolExecutor(max_workers=10)` gRPC server, but concurrent writes to the same `session_id` from two simultaneous requests are not explicitly protected.
- **History window:** `format_history_for_classify()` uses only the last 6 messages for query classification to keep the classify prompt short and deterministic.
> **Future work:** Replace `session_store` with a Redis-backed persistent store to survive restarts and support horizontal scaling.
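In the meantime, one minimal way to close the concurrent-write gap would be a lock around the read-modify-write cycle. This is a sketch of a possible hardening, not the current implementation:

```python
import threading
from collections import defaultdict

class SessionStore:
    """defaultdict(list)-style store with a lock guarding history updates."""

    def __init__(self):
        self._store = defaultdict(list)
        self._lock = threading.Lock()

    def append(self, session_id, messages):
        # Extend under the lock so two concurrent requests for the same
        # session_id cannot interleave their history writes.
        with self._lock:
            self._store[session_id].extend(messages)

    def history(self, session_id):
        with self._lock:
            return list(self._store[session_id])  # copy: safe to iterate
```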
---
## 11. Observability Stack
### Langfuse tracing
The server integrates Langfuse for end-to-end LLM tracing. Every `AskAgent` / `AskAgentStream` request creates a trace that captures:
- Input query and session ID
- Each LangGraph node execution (classify, reformulate, retrieve, generate)
- LLM token counts, latency, and cost
- Final response
**Access:** `http://45.77.119.180` — requires a project API key configured via `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
### Logging
Structured logging via Python's `logging` module, configured at `INFO` level. Log format:
```
[MODULE] context_info — key=value key=value
```
Key log markers:
| Marker | Module | Meaning |
|---|---|---|
| `[ESEARCH]` | `server.py` | Elasticsearch connection status |
| `[classify]` | `graph.py` | Query type decision + raw LLM output |
| `[reformulate]` | `graph.py` | Reformulated query string |
| `[hybrid]` | `graph.py` | BM25 / kNN hit counts and RRF result count |
| `[retrieve]` | `graph.py` | Number of docs retrieved and context length |
| `[generate]` | `graph.py` | Response character count |
| `[AskAgentStream]` | `server.py` | Token count and total chars per stream |
| `[eval]` | `evaluate.py` | Per-question retrieval and generation status |
---
## 12. Security Boundaries
| Boundary | Current state | Risk |
|---|---|---|
| gRPC transport | **Insecure** (`add_insecure_port`) | Network interception possible. Acceptable in dev/tunnel setup; requires mTLS for production. |
| Elasticsearch auth | Optional (user/pass or API key via env vars) | Index is accessible without auth if `ELASTICSEARCH_USER` and `ELASTICSEARCH_API_KEY` are unset. |
| Container user | Non-root (`python:3.11-slim` default) | Low risk. Do not override with `root`. |
| Secrets in env | Via `.env` / `docker-compose` env injection | Never commit real values. See [CONTRIBUTING.md](../CONTRIBUTING.md#6-environment-variables-policy). |
| Session store | In-memory, no auth | Any caller with access to the gRPC port can read/write any session by guessing its ID. |
| Kubeconfig | `./kubernetes/kubeconfig.yaml` (local only) | Grants cluster access. Never commit. Listed in `.gitignore`. |
---
## 13. Known Limitations & Future Work
| Area | Limitation | Proposed solution |
|---|---|---|
| Session persistence | In-memory, lost on restart | Redis-backed `session_store` |
| Horizontal scaling | `session_store` is per-process | Sticky sessions or external session store |
| gRPC security | Insecure port | Add TLS + optional mTLS |
| Elasticsearch auth | Not enforced if vars unset | Make auth required; fail-fast on startup |
| Context window | Full history passed to generate; no truncation | Sliding window or summarization for long sessions |
| Evaluation | Golden dataset must be manually maintained | Automated golden dataset refresh pipeline |
| Rate limiting | None on gRPC server | Add interceptor-based rate limiter |
| Health check | No gRPC health protocol | Implement `grpc.health.v1` |
docs/AVAP_CHUNKER_CONFIG.md (new file)
# AVAP Chunker — Language Configuration Reference
> **File:** `scripts/pipelines/ingestion/avap_config.json`
> **Used by:** `avap_chunker.py` (Pipeline B)
> **Last updated:** 2026-03-18
This file is the **grammar definition** for the AVAP language chunker. It tells `avap_chunker.py` how to tokenize, parse, and semantically classify `.avap` source files before they are embedded and ingested into Elasticsearch. Modifying this file changes what the chunker recognises as a block, a statement, or a semantic feature — and therefore what metadata every chunk in the knowledge base carries.
---
## Table of Contents
1. [Top-Level Fields](#1-top-level-fields)
2. [Lexer](#2-lexer)
3. [Blocks](#3-blocks)
4. [Statements](#4-statements)
5. [Semantic Tags](#5-semantic-tags)
6. [How They Work Together](#6-how-they-work-together)
7. [Adding New Constructs](#7-adding-new-constructs)
8. [Full Annotated Example](#8-full-annotated-example)
---
## 1. Top-Level Fields
```json
{
"language": "avap",
"version": "1.0",
"file_extensions": [".avap"]
}
```
| Field | Type | Description |
|---|---|---|
| `language` | string | Human-readable language name. Used in chunker progress reports. |
| `version` | string | Config schema version. Increment when making breaking changes. |
| `file_extensions` | array of strings | File extensions the chunker will process. `.md` files are always processed regardless of this setting. |
---
## 2. Lexer
The lexer section controls how raw source lines are stripped of comments and string literals before pattern matching is applied.
```json
"lexer": {
"string_delimiters": ["\"", "'"],
"escape_char": "\\",
"comment_line": ["///", "//"],
"comment_block": { "open": "/*", "close": "*/" },
"line_oriented": true
}
```
| Field | Type | Description |
|---|---|---|
| `string_delimiters` | array of strings | Characters that open and close string literals. Content inside strings is ignored during pattern matching. |
| `escape_char` | string | Character used to escape the next character inside a string. Prevents `\"` from closing the string. |
| `comment_line` | array of strings | Line comment prefixes, evaluated longest-first. Everything after the matched prefix is stripped. AVAP supports both `///` (documentation comments) and `//` (inline comments). |
| `comment_block.open` | string | Block comment opening delimiter. |
| `comment_block.close` | string | Block comment closing delimiter. Content between `/*` and `*/` is stripped before pattern matching. |
| `line_oriented` | bool | When `true`, the lexer processes one line at a time. Should always be `true` for AVAP. |
**Important:** Comment stripping and string boundary detection happen before any block or statement pattern is evaluated. A keyword inside a string literal or a comment will never trigger a block or statement match.
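A simplified illustration of that ordering — string-aware line-comment stripping only; the real lexer also handles block comments, and the helper below is illustrative:

```python
def strip_line_comments(line, delimiters=("\"", "'"), prefixes=("///", "//")):
    """Drop everything after a comment prefix that is not inside a string."""
    in_string = None
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":        # escape: skip the next character
                i += 2
                continue
            if ch == in_string:   # closing string delimiter
                in_string = None
        elif ch in delimiters:
            in_string = ch
        elif any(line.startswith(p, i) for p in prefixes):
            return line[:i]       # real comment: cut the rest of the line
        i += 1
    return line

# A "//" inside a string literal never terminates the line:
print(strip_line_comments('addVar(msg, "// not a comment") // real comment'))
```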
---
## 3. Blocks
Blocks are **multi-line constructs** with a defined opener and closer. The chunker tracks nesting depth — each opener increments depth, each closer decrements it, and the block ends when depth returns to zero. This correctly handles nested `if()` inside `function{}` and similar cases.
Each block definition produces a chunk with `doc_type` as specified and `block_type` equal to the block `name`.
```json
"blocks": [
{
"name": "function",
"doc_type": "code",
"opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
"closer_pattern": "^\\s*\\}\\s*$",
"extract_signature": true,
"signature_template": "function {group1}({group2})"
},
...
]
```
### Block fields
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Identifier for this block type. Used as `block_type` in the chunk metadata and in the `semantic_overlap` context header. |
| `doc_type` | string | Yes | Elasticsearch `doc_type` field value for chunks from this block. |
| `opener_pattern` | regex string | Yes | Pattern matched against the clean (comment-stripped) line to detect the start of this block. Must be anchored at the start (`^`). |
| `closer_pattern` | regex string | Yes | Pattern matched to detect the end of this block. Checked at every line after the opener. |
| `extract_signature` | bool | No (default: `false`) | When `true`, the chunker extracts a compact signature string from the opener line using capture groups, and creates an additional `function_signature` chunk alongside the full block chunk. |
| `signature_template` | string | No | Template for the signature string. Uses `{group1}`, `{group2}`, etc. as placeholders for the regex capture groups from `opener_pattern`. |
### Current block definitions
#### `function`
```
opener: ^\\s*function\\s+(\\w+)\\s*\\(([^)]*)
closer: ^\\s*\\}\\s*$
```
Matches any top-level or nested AVAP function declaration. The two capture groups extract the function name (`group1`) and parameter list (`group2`), which are combined into the signature template `function {group1}({group2})`.
Because `extract_signature: true`, every function produces **two chunks**:
1. A `doc_type: "code"`, `block_type: "function"` chunk containing the full function body.
2. A `doc_type: "function_signature"`, `block_type: "function_signature"` chunk containing only the signature string (e.g. `function validateAccess(userId, token)`). This lightweight chunk is indexed separately to enable fast function-name lookup without retrieving the entire body.
Additionally, the function signature is registered in the `SemanticOverlapBuffer`. Subsequent non-function chunks in the same file will receive the current function signature prepended as a context comment (`// contexto: function validateAccess(userId, token)`), keeping the surrounding code semantically grounded.
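The signature extraction can be reproduced directly from the opener regex and template above. A sketch — the `{group1}`/`{group2}` substitution follows the documented template semantics, while the helper itself is illustrative:

```python
import re

OPENER = r"^\s*function\s+(\w+)\s*\(([^)]*)"
TEMPLATE = "function {group1}({group2})"

def extract_signature(opener_line):
    """Fill the signature template with the opener regex's capture groups."""
    m = re.match(OPENER, opener_line)
    if not m:
        return None
    groups = {f"group{i}": g for i, g in enumerate(m.groups(), start=1)}
    return TEMPLATE.format(**groups)

print(extract_signature("function validateAccess(userId, token) {"))
# function validateAccess(userId, token)
```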
#### `if`
```
opener: ^\\s*if\\s*\\(
closer: ^\\s*end\\s*\\(\\s*\\)
```
Matches AVAP conditional blocks. Note: AVAP uses `end()` as the closer, not `}`.
#### `startLoop`
```
opener: ^\\s*startLoop\\s*\\(
closer: ^\\s*endLoop\\s*\\(\\s*\\)
```
Matches AVAP iteration blocks. The closer is `endLoop()`.
#### `try`
```
opener: ^\\s*try\\s*\\(\\s*\\)
closer: ^\\s*end\\s*\\(\\s*\\)
```
Matches AVAP error-handling blocks (`try()` … `end()`).
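Putting the opener/closer patterns above together, the nesting rule from the start of this section can be sketched like this (illustrative — the real chunker in `avap_chunker.py` also attaches metadata and signatures):

```python
import re

def extract_blocks(lines, opener, closer):
    """Collect top-level blocks: each opener raises depth, each closer lowers
    it; a block ends when depth returns to zero, so nested openers stay
    inside the enclosing block."""
    blocks, current, depth = [], [], 0
    for line in lines:
        if re.match(opener, line):
            depth += 1
        if depth > 0:
            current.append(line)
        if depth > 0 and re.match(closer, line):
            depth -= 1
            if depth == 0:
                blocks.append("\n".join(current))
                current = []
    return blocks

src = [
    "if (a)",
    "  if (b)",   # nested opener: depth 2, stays inside the outer block
    "  end()",
    "end()",
    "if (c)",
    "end()",
]
blocks = extract_blocks(src, r"^\s*if\s*\(", r"^\s*end\s*\(\s*\)")
print(len(blocks))  # 2
```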
---
## 4. Statements
Statements are **single-line constructs**. Lines that are not part of any block opener or closer are classified against the statement patterns in order. The first match wins. If no pattern matches, the statement is classified as `"statement"` (the fallback).
Consecutive lines with the same statement type are **grouped into a single chunk**, keeping semantically related statements together. When the statement type changes, the current group is flushed as a chunk.
```json
"statements": [
{ "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
{ "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
...
]
```
### Statement fields
| Field | Type | Description |
|---|---|---|
| `name` | string | Used as `block_type` in the chunk metadata. |
| `pattern` | regex string | Matched against the clean line. First match wins — order matters. |
### Current statement definitions
| Name | Matches | AVAP commands |
|---|---|---|
| `registerEndpoint` | API route registration | `registerEndpoint(...)` |
| `addVar` | Variable declaration | `addVar(...)` |
| `io_command` | Input/output operations | `addParam`, `getListLen`, `addResult`, `getQueryParamList` |
| `http_command` | HTTP client calls | `RequestPost`, `RequestGet` |
| `orm_command` | Database ORM operations | `ormDirect`, `ormCheckTable`, `ormCreateTable`, `ormAccessSelect`, `ormAccessInsert`, `ormAccessUpdate` |
| `util_command` | Utility and helper functions | `variableToList`, `itemFromList`, `variableFromJSON`, `AddVariableToJSON`, `encodeSHA256`, `encodeMD5`, `getRegex`, `getDateTime`, `stampToDatetime`, `getTimeStamp`, `randomString`, `replace` |
| `async_command` | Concurrency primitives | `x = go funcName(`, `gather(` |
| `connector` | External service connector | `x = avapConnector(` |
| `modularity` | Module imports | `import`, `include` |
| `assignment` | Variable assignment (catch-all before fallback) | `x = ...` |
**Ordering note:** `registerEndpoint`, `addVar`, and the specific command categories are listed before `assignment` intentionally. `assignment` would match many of them (they all contain `=` or are function calls that could follow an assignment), so the more specific patterns must come first.
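First-match-wins classification with `assignment` last can be sketched as follows. The pattern list here is abbreviated and the `orm_command` regex is illustrative, not copied from `avap_config.json`:

```python
import re

# Abbreviated pattern list; order matters, as explained above
STATEMENTS = [
    ("registerEndpoint", r"^\s*registerEndpoint\s*\("),
    ("addVar",           r"^\s*addVar\s*\("),
    ("orm_command",      r"^\s*\w+\s*=?\s*ormDirect\s*\("),
    ("assignment",       r"^\s*\w+\s*="),   # catch-all: must come last
]

def classify(line):
    for name, pattern in STATEMENTS:
        if re.search(pattern, line):
            return name
    return "statement"  # fallback

print(classify("result = ormDirect(query)"))  # orm_command, not assignment
print(classify("x = 5"))                      # assignment
print(classify("addResult(x)"))               # statement (with this short list)
```

Swapping `assignment` to the top of the list would silently reclassify `result = ormDirect(query)` as a plain assignment — exactly the failure mode the ordering note warns about.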
---
## 5. Semantic Tags
Semantic tags are **boolean metadata flags** applied to every chunk (both blocks and statements) by scanning the entire chunk content with a regex. A chunk can have multiple tags simultaneously.
The `complexity` field is automatically computed as the count of `true` tags in a chunk's metadata, providing a rough signal of how much AVAP functionality a given chunk exercises.
```json
"semantic_tags": [
{ "tag": "uses_orm", "pattern": "\\b(ormDirect|ormAccessSelect|...)\\s*\\(" },
...
]
```
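Tag application and the derived `complexity` value can be sketched as follows (illustrative, not the actual implementation; only a subset of the tags is shown):

```python
import re

# Illustrative subset of "semantic_tags" — searched across the full chunk text.
SEMANTIC_TAGS = [
    ("uses_orm", re.compile(r"\b(ormDirect|ormAccessSelect)\s*\(")),
    ("uses_loop", re.compile(r"\bstartLoop\s*\(")),
    ("returns_result", re.compile(r"\baddResult\s*\(")),
]

def tag_chunk(content):
    """Set every tag whose pattern is found anywhere in the chunk,
    then derive complexity as the count of true tags."""
    metadata = {tag: True for tag, pattern in SEMANTIC_TAGS
                if pattern.search(content)}
    metadata["complexity"] = sum(1 for v in metadata.values() if v is True)
    return metadata
```

A chunk that loops, queries the ORM, and returns a result would receive all three tags and `complexity: 3`.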
### Tag fields
| Field | Description |
|---|---|
| `tag` | Key name in the `metadata` object stored in Elasticsearch. Value is always `true` when present. |
| `pattern` | Regex searched (not matched) across the full chunk text. Uses `\b` word boundaries to avoid false positives. |
### Current semantic tags
| Tag | Detected when chunk contains |
|---|---|
| `uses_orm` | Any ORM command: `ormDirect`, `ormCheckTable`, `ormCreateTable`, `ormAccessSelect`, `ormAccessInsert`, `ormAccessUpdate` |
| `uses_http` | HTTP client calls: `RequestPost`, `RequestGet` |
| `uses_connector` | External connector: `avapConnector(` |
| `uses_async` | Concurrency: `go funcName(` or `gather(` |
| `uses_crypto` | Hashing/encoding: `encodeSHA256(`, `encodeMD5(` |
| `uses_auth` | Auth-related commands: `addParam`, `_status` |
| `uses_error_handling` | Error handling block: `try()` |
| `uses_loop` | Loop construct: `startLoop(` |
| `uses_json` | JSON operations: `variableFromJSON(`, `AddVariableToJSON(` |
| `uses_list` | List operations: `variableToList(`, `itemFromList(`, `getListLen(` |
| `uses_regex` | Regular expressions: `getRegex(` |
| `uses_datetime` | Date/time operations: `getDateTime(`, `getTimeStamp(`, `stampToDatetime(` |
| `returns_result` | Returns data to the API caller: `addResult(` |
| `registers_endpoint` | Defines an API route: `registerEndpoint(` |
**How tags are used at retrieval time:** The Elasticsearch mapping stores each tag as a `boolean` field under the `metadata` object. This enables filtered retrieval — for example, a future retrieval enhancement could boost chunks with `metadata.uses_orm: true` for queries that contain ORM-related keywords, improving precision for database-related questions.
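As an illustration of the kind of query the mapping enables (this boost is not implemented today), a retrieval call could combine full-text relevance with a boost on the boolean tag. Field names follow the mapping described above; the boost value is arbitrary.

```python
def orm_boosted_query(user_query):
    """Hypothetical Elasticsearch query body: full-text match on content,
    with chunks tagged metadata.uses_orm scored higher."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": user_query}}],
                "should": [
                    # Boolean term query with a boost — optional, so
                    # untagged chunks still match, just score lower.
                    {"term": {"metadata.uses_orm": {"value": True, "boost": 2.0}}}
                ],
            }
        }
    }
```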
---
## 6. How They Work Together
The following example shows how `avap_chunker.py` processes a real `.avap` file using this config:
```avap
// Validate user session
function validateAccess(userId, token) {
addVar(isValid = false)
addParam(userId)
try()
ormAccessSelect(users, id = userId)
addVar(isValid = true)
end()
addResult(isValid)
}
registerEndpoint(POST, /validate)
```
**Chunks produced:**
| # | `doc_type` | `block_type` | Content | Tags |
|---|---|---|---|---|
| 1 | `code` | `function` | Full function body (lines 2-10) | `uses_auth`, `uses_orm`, `uses_error_handling`, `returns_result` · `complexity: 4` |
| 2 | `function_signature` | `function_signature` | `function validateAccess(userId, token)` | — |
| 3 | `code` | `registerEndpoint` | `registerEndpoint(POST, /validate)` | `registers_endpoint` · `complexity: 1` |
Chunk 1 also receives the function signature as a semantic overlap header because the `SemanticOverlapBuffer` tracks `validateAccess` and injects it as context into any subsequent non-function chunks in the same file.
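That overlap mechanism can be sketched minimally as follows. This is an illustration only; the real `SemanticOverlapBuffer` lives in `avap_chunker.py` and may differ in detail.

```python
class OverlapBuffer:
    """Remembers the most recent function signature in a file and
    prepends it as a context header to later non-function chunks."""

    def __init__(self):
        self.last_signature = None

    def observe(self, block_type, signature):
        # Track the signature whenever a function block is emitted.
        if block_type == "function" and signature:
            self.last_signature = signature

    def decorate(self, block_type, content):
        # Inject the tracked signature into subsequent non-function chunks.
        if block_type != "function" and self.last_signature:
            return f"// context: {self.last_signature}\n{content}"
        return content
```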
---
## 7. Adding New Constructs
### Adding a new block type
1. Identify the opener and closer patterns from the AVAP LRM (`docs/LRM/avap.md`).
2. Add an entry to `"blocks"` in `avap_config.json`.
3. If the block introduces a named construct worth indexing independently (like functions), set `"extract_signature": true` and define a `"signature_template"`.
4. Run a smoke test on a representative `.avap` file:
```bash
python scripts/pipelines/ingestion/avap_chunker.py \
--lang-config scripts/pipelines/ingestion/avap_config.json \
--docs-path docs/samples \
--output /tmp/test_chunks.jsonl \
--no-dedup
```
5. Inspect `/tmp/test_chunks.jsonl` and verify the new `block_type` appears with the expected content.
6. Re-run the ingestion pipeline to rebuild the index.
### Adding a new statement category
1. Add an entry to `"statements"` **before** the `assignment` catch-all.
2. Use `^\\s*` to anchor the pattern at the start of the line.
3. Test as above — verify the new `block_type` appears in the JSONL output.
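A quick way to verify the ordering requirement is a unit-style check that a sample line hits the new pattern rather than falling through to the catch-all. The `cacheGet`/`cacheSet` commands below are hypothetical, used only to show the mechanics:

```python
import re

# Hypothetical new category placed BEFORE the existing assignment catch-all.
STATEMENTS = [
    ("cache_command", r"^\s*(cacheGet|cacheSet)\s*\("),  # new entry
    ("assignment", r"^\s*\w+\s*=\s*"),                    # existing catch-all
]

def first_match(line):
    for name, pattern in STATEMENTS:
        if re.search(pattern, line):
            return name
    return "statement"
```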
### Adding a new semantic tag
1. Add an entry to `"semantic_tags"`.
2. Use `\\b` word boundaries to prevent false positives on substrings.
3. Add the new tag as a `boolean` field to the Elasticsearch index mapping in `avap_ingestor.py` (`build_index_mapping()`).
4. **Re-index from scratch** — existing documents will not have the new tag unless the index is rebuilt (`--delete` flag).
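The word-boundary requirement in step 2 matters because without `\b` a tag pattern fires on substrings of longer command names. A hypothetical `batchreplace` command illustrates the difference:

```python
import re

loose = re.compile(r"replace\s*\(")      # missing \b — substring match
strict = re.compile(r"\breplace\s*\(")   # with \b, as the config requires

other = "batchreplace(data)"             # hypothetical command containing 'replace'
real = "replace(text, old, new)"         # the actual AVAP command
```

The loose pattern tags both lines; the strict pattern tags only the real `replace(` call.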
---
## 8. Full Annotated Example
```jsonc
{
// Identifies this config as the AVAP v1.0 grammar
"language": "avap",
"version": "1.0",
"file_extensions": [".avap"], // Only .avap files; .md is always included
"lexer": {
"string_delimiters": ["\"", "'"], // Both quote styles used in AVAP
"escape_char": "\\",
"comment_line": ["///", "//"], // /// first — longest match wins
"comment_block": { "open": "/*", "close": "*/" },
"line_oriented": true
},
"blocks": [
{
"name": "function",
"doc_type": "code",
// Captures: group1=name, group2=params
"opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
"closer_pattern": "^\\s*\\}\\s*$", // AVAP functions close with }
"extract_signature": true,
"signature_template": "function {group1}({group2})"
},
{
"name": "if",
"doc_type": "code",
"opener_pattern": "^\\s*if\\s*\\(",
"closer_pattern": "^\\s*end\\s*\\(\\s*\\)" // AVAP if closes with end()
},
{
"name": "startLoop",
"doc_type": "code",
"opener_pattern": "^\\s*startLoop\\s*\\(",
"closer_pattern": "^\\s*endLoop\\s*\\(\\s*\\)"
},
{
"name": "try",
"doc_type": "code",
"opener_pattern": "^\\s*try\\s*\\(\\s*\\)",
"closer_pattern": "^\\s*end\\s*\\(\\s*\\)" // try also closes with end()
}
],
"statements": [
// Specific patterns first — must come before the generic "assignment" catch-all
{ "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
{ "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
{ "name": "io_command", "pattern": "^\\s*(addParam|getListLen|addResult|getQueryParamList)\\s*\\(" },
{ "name": "http_command", "pattern": "^\\s*(RequestPost|RequestGet)\\s*\\(" },
{ "name": "orm_command", "pattern": "^\\s*(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
{ "name": "util_command", "pattern": "^\\s*(variableToList|itemFromList|variableFromJSON|AddVariableToJSON|encodeSHA256|encodeMD5|getRegex|getDateTime|stampToDatetime|getTimeStamp|randomString|replace)\\s*\\(" },
{ "name": "async_command", "pattern": "^\\s*(\\w+\\s*=\\s*go\\s+|gather\\s*\\()" },
{ "name": "connector", "pattern": "^\\s*\\w+\\s*=\\s*avapConnector\\s*\\(" },
{ "name": "modularity", "pattern": "^\\s*(import|include)\\s+" },
{ "name": "assignment", "pattern": "^\\s*\\w+\\s*=\\s*" } // catch-all
],
"semantic_tags": [
// Applied to every chunk by full-content regex search (not line-by-line)
{ "tag": "uses_orm", "pattern": "\\b(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
{ "tag": "uses_http", "pattern": "\\b(RequestPost|RequestGet)\\s*\\(" },
{ "tag": "uses_connector", "pattern": "\\bavapConnector\\s*\\(" },
{ "tag": "uses_async", "pattern": "\\bgo\\s+\\w+\\s*\\(|\\bgather\\s*\\(" },
{ "tag": "uses_crypto", "pattern": "\\b(encodeSHA256|encodeMD5)\\s*\\(" },
{ "tag": "uses_auth", "pattern": "\\b(addParam|_status)\\b" },
{ "tag": "uses_error_handling", "pattern": "\\btry\\s*\\(\\s*\\)" },
{ "tag": "uses_loop", "pattern": "\\bstartLoop\\s*\\(" },
{ "tag": "uses_json", "pattern": "\\b(variableFromJSON|AddVariableToJSON)\\s*\\(" },
{ "tag": "uses_list", "pattern": "\\b(variableToList|itemFromList|getListLen)\\s*\\(" },
{ "tag": "uses_regex", "pattern": "\\bgetRegex\\s*\\(" },
{ "tag": "uses_datetime", "pattern": "\\b(getDateTime|getTimeStamp|stampToDatetime)\\s*\\(" },
{ "tag": "returns_result", "pattern": "\\baddResult\\s*\\(" },
{ "tag": "registers_endpoint", "pattern": "\\bregisterEndpoint\\s*\\(" }
]
}
```

@@ -1,6 +1,6 @@
### Architectural Preface
**AVAP is a Turing-complete DSL (Domain-Specific Language), architecturally designed for the secure, concurrent, and deterministic orchestration of microservices and I/O.** It is not a general-purpose language; its hybrid engine and strict grammar are optimized for fast HTTP transaction processing, in-memory data manipulation, and persistence, minimizing unwanted side effects.
**AVAP (Advanced Virtual API Programming) is a Turing-complete DSL (Domain-Specific Language), architecturally designed for the secure, concurrent, and deterministic orchestration of microservices and I/O.** It is not a general-purpose language; its hybrid engine and strict grammar are optimized for fast HTTP transaction processing, in-memory data manipulation, and persistence, minimizing unwanted side effects.
---
@@ -388,7 +388,7 @@ AVAP provides three complementary commands to cover all possible conversions
/* Regular expressions */
<regex_cmd> ::= "getRegex(" <identifier> "," <expression> "," <identifier> ")"
/* Current date/time string */
/* Current date/time -> string */
<datetime_cmd> ::= "getDateTime(" <stringliteral> "," <expression> "," <stringliteral> "," <identifier> ")"
/* Arguments: formato_salida, timedelta, zona_horaria, destino */

docs/RUNBOOK.md
@@ -0,0 +1,389 @@
# Brunix Assistance Engine — Operations Runbook
> **Audience:** Engineers on-call, DevOps, and anyone debugging the Brunix Engine in a live environment.
> **Last updated:** 2026-03-18
---
## Table of Contents
1. [Health Checks](#1-health-checks)
2. [Starting the Engine](#2-starting-the-engine)
3. [Stopping & Restarting](#3-stopping--restarting)
4. [Tunnel Management](#4-tunnel-management)
5. [Incident Playbooks](#5-incident-playbooks)
- [Engine fails to start](#51-engine-fails-to-start)
- [Elasticsearch unreachable](#52-elasticsearch-unreachable)
- [Ollama unreachable / model not found](#53-ollama-unreachable--model-not-found)
- [AskAgent returns `[ENG] Error`](#54-askagent-returns-eng-error)
- [EvaluateRAG returns ANTHROPIC_API_KEY error](#55-evaluaterag-returns-anthropic_api_key-error)
- [Container memory / OOM](#56-container-memory--oom)
- [Session history not persisting between requests](#57-session-history-not-persisting-between-requests)
6. [Log Reference](#6-log-reference)
7. [Useful Commands](#7-useful-commands)
8. [Escalation Path](#8-escalation-path)
---
## 1. Health Checks
### Is the gRPC server up?
```bash
grpcurl -plaintext localhost:50052 list
# Expected: brunix.AssistanceEngine
```
If `grpcurl` hangs or returns a connection error, the container is not running or the port is not mapped.
### Is Elasticsearch reachable?
```bash
curl -s http://localhost:9200/_cluster/health | python3 -m json.tool
# Expected: "status": "green" or "yellow"
```
### Is Ollama reachable?
```bash
curl -s http://localhost:11434/api/tags | python3 -m json.tool
# Expected: list of available models including qwen2.5:1.5b
```
### Is the embedding model loaded?
```bash
curl -s http://localhost:11434/api/tags | grep qwen3-0.6B-emb
# Expected: model entry present
```
### Is Langfuse reachable?
```bash
curl -s http://45.77.119.180/api/public/health
# Expected: {"status":"ok"}
```
---
## 2. Starting the Engine
### Prerequisites checklist
- [ ] Kubeconfig present at `./kubernetes/kubeconfig.yaml`
- [ ] `.env` file populated with all required variables (see `README.md`)
- [ ] All three kubectl tunnels active (see [§4](#4-tunnel-management))
- [ ] Docker daemon running
### Start command
```bash
cd Docker/
docker-compose up -d --build
```
### Verify startup
```bash
# Watch logs until you see "Brunix Engine initialized."
docker logs -f brunix-assistance-engine
# Expected log sequence:
# [ESEARCH] Connected: 8.x.x — index: avap-docs-test
# [ENGINE] listen on 50051 (gRPC)
# Brunix Engine initialized.
# [entrypoint] Starting OpenAI Proxy (HTTP :8000)...
```
**Startup typically takes 20-60 seconds** depending on Ollama model loading time.
---
## 3. Stopping & Restarting
```bash
# Graceful stop
docker-compose down
# Hard stop (if container is unresponsive)
docker stop brunix-assistance-engine
docker rm brunix-assistance-engine
# Restart only the engine (no rebuild)
docker-compose restart brunix-engine
# Rebuild and restart (after code changes)
docker-compose up -d --build
```
> ⚠️ **Restart clears all in-memory session history.** All active conversations will lose context.
---
## 4. Tunnel Management
All three tunnels must be active for the engine to function. Run each in a separate terminal or as a background process.
```bash
# Tunnel 1 — Ollama (LLM + embeddings)
kubectl port-forward --address 0.0.0.0 svc/ollama-light-service 11434:11434 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
# Tunnel 2 — Elasticsearch (vector knowledge base)
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
# Tunnel 3 — PostgreSQL (Langfuse observability)
kubectl port-forward --address 0.0.0.0 svc/brunix-postgres 5432:5432 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
```
### Check tunnel status
```bash
# List active port-forwards
ps aux | grep "kubectl port-forward"
# Alternatively
lsof -i :11434
lsof -i :9200
lsof -i :5432
```
### Tunnel dropped?
kubectl tunnels drop silently. Symptoms:
- Elasticsearch: `[ESEARCH] Cant Connect` in engine logs
- Ollama: requests timeout or return connection errors
- Langfuse: tracing data stops appearing in the dashboard
**Fix:** Re-run the affected tunnel command. The engine will reconnect automatically on the next request.
---
## 5. Incident Playbooks
### 5.1 Engine fails to start
**Symptom:** `docker-compose up` exits immediately, or container restarts in a loop.
**Diagnosis:**
```bash
docker logs brunix-assistance-engine 2>&1 | head -50
```
**Common causes and fixes:**
| Log message | Cause | Fix |
|---|---|---|
| `Cannot connect to Ollama` | Ollama tunnel not running | Start Tunnel 1 |
| `model 'qwen2.5:1.5b' not found` | Model not loaded in Ollama | See [§5.3](#53-ollama-unreachable--model-not-found) |
| `ELASTICSEARCH_URL not set` | Missing `.env` | Check `.env` file exists and is complete |
| `No module named 'brunix_pb2'` | Proto stubs not generated | Run `docker-compose up --build` |
| `Port 50051 already in use` | Another instance running | `docker stop brunix-assistance-engine && docker rm brunix-assistance-engine` |
---
### 5.2 Elasticsearch unreachable
**Symptom:** Log shows `[ESEARCH] Cant Connect`. Queries return empty context.
**Step 1 — Verify tunnel:**
```bash
curl -s http://localhost:9200/_cluster/health
```
**Step 2 — Restart tunnel if down:**
```bash
kubectl port-forward --address 0.0.0.0 svc/brunix-vector-db 9200:9200 \
-n brunix --kubeconfig ./kubernetes/kubeconfig.yaml
```
**Step 3 — Check index exists:**
```bash
curl -s http://localhost:9200/_cat/indices?v | grep avap
```
If the index is missing, the knowledge base has not been ingested. Run:
```bash
cd scripts/pipelines/flows/
python elasticsearch_ingestion.py
```
**Step 4 — Verify authentication:**
If your cluster uses authentication, confirm `ELASTICSEARCH_USER` + `ELASTICSEARCH_PASSWORD` or `ELASTICSEARCH_API_KEY` are set in `.env`.
---
### 5.3 Ollama unreachable / model not found
**Symptom:** Engine logs show connection errors to `http://host.docker.internal:11434`, or `validate_model_on_init=True` raises a model-not-found error on startup.
**Step 1 — Verify Ollama tunnel is active:**
```bash
curl -s http://localhost:11434/api/tags
```
**Step 2 — List available models:**
```bash
curl -s http://localhost:11434/api/tags | python3 -c "
import json, sys
data = json.load(sys.stdin)
for m in data.get('models', []):
print(m['name'])
"
```
**Step 3 — Pull missing models if needed:**
```bash
# On the Devaron cluster (via kubectl exec or direct access):
ollama pull qwen2.5:1.5b
ollama pull qwen3-0.6B-emb:latest
```
**Step 4 — Restart engine** after models are available:
```bash
docker-compose restart brunix-engine
```
---
### 5.4 AskAgent returns `[ENG] Error`
**Symptom:** Client receives `{"text": "[ENG] Error: ...", "is_final": true}`.
**Diagnosis:**
```bash
docker logs brunix-assistance-engine 2>&1 | grep -A 10 "Error"
```
| Error substring | Cause | Fix |
|---|---|---|
| `Connection refused` to `11434` | Ollama tunnel down | Restart Tunnel 1 |
| `Connection refused` to `9200` | ES tunnel down | Restart Tunnel 2 |
| `Index not found` | ES index missing | Run ingestion pipeline |
| `context length exceeded` | Query + history too long for model | Reduce session history or use a larger context model |
| `Traceback` / `KeyError` | Code bug | Check full traceback, open GitHub Issue |
---
### 5.5 EvaluateRAG returns ANTHROPIC_API_KEY error
**Symptom:** `EvalResponse.status` = `"ANTHROPIC_API_KEY no configurada en .env"`.
**Fix:**
1. Add `ANTHROPIC_API_KEY=sk-ant-...` to your `.env` file.
2. Add `ANTHROPIC_MODEL=claude-sonnet-4-20250514` (optional, has default).
3. Restart the engine: `docker-compose restart brunix-engine`.
---
### 5.6 Container memory / OOM
**Symptom:** Container is killed by the OOM killer. `docker inspect brunix-assistance-engine` shows `OOMKilled: true`.
**Diagnosis:**
```bash
docker stats brunix-assistance-engine
```
**Common causes:**
- Large context window being passed to Ollama (many retrieved chunks × long document).
- Session history growing unbounded over a long-running session.
**Mitigation:**
- Set `mem_limit` in `docker-compose.yaml`:
```yaml
services:
brunix-engine:
mem_limit: 4g
```
- Restart the container to clear session store.
- Consider reducing `k` (currently 8) in `hybrid_search_native` to limit context size.
---
### 5.7 Session history not persisting between requests
**Expected behaviour:** Sending two requests with the same `session_id` should maintain context.
**If Turn 2 does not seem to know about Turn 1:**
1. Confirm both requests use **identical** `session_id` strings (case-sensitive, no trailing spaces).
2. Confirm the engine was **not restarted** between the two requests (restart wipes `session_store`).
3. Check logs for `[AskAgentStream] conversation: N previous messages.` — if `N=0` on Turn 2, the session was not found.
4. Confirm the stream for Turn 1 was **fully consumed** (client read all messages including `is_final=true`) — the engine only persists history after the stream ends.
---
## 6. Log Reference
| Log prefix | Module | What it means |
|---|---|---|
| `[ESEARCH] Connected` | `server.py` | Elasticsearch OK on startup |
| `[ESEARCH] Cant Connect` | `server.py` | Elasticsearch unreachable on startup |
| `[ENGINE] listen on 50051` | `server.py` | gRPC server ready |
| `[AskAgent] session=... query=...` | `server.py` | New non-streaming request |
| `[AskAgent] conversation: N messages` | `server.py` | History loaded for session |
| `[AskAgentStream] done — chunks=N` | `server.py` | Stream completed, history saved |
| `[classify] raw=... -> TYPE` | `graph.py` | Query classification result |
| `[reformulate] -> '...'` | `graph.py` | Reformulated query |
| `[hybrid] BM25 -> N hits` | `graph.py` | BM25 retrieval result |
| `[hybrid] kNN -> N hits` | `graph.py` | kNN retrieval result |
| `[hybrid] RRF -> N final docs` | `graph.py` | After RRF fusion |
| `[retrieve] N docs, context len=X` | `graph.py` | Context assembled |
| `[generate] X chars` | `graph.py` | Non-streaming answer generated |
| `[eval] Iniciando: N preguntas` | `evaluate.py` | Evaluation started |
| `[eval] Completado — global=X` | `evaluate.py` | Evaluation finished |
---
## 7. Useful Commands
```bash
# Real-time log streaming
docker logs -f brunix-assistance-engine
# Filter for errors only
docker logs brunix-assistance-engine 2>&1 | grep -i error
# Check container resource usage
docker stats brunix-assistance-engine --no-stream
# Enter container for debugging
docker exec -it brunix-assistance-engine /bin/bash
# Send a test query
grpcurl -plaintext \
-d '{"query": "What is AVAP?", "session_id": "test"}' \
localhost:50052 brunix.AssistanceEngine/AskAgent
# Check ES index document count
curl -s "http://localhost:9200/avap-docs-test/_count" | python3 -m json.tool
# Check ES index mapping
curl -s "http://localhost:9200/avap-docs-test/_mapping" | python3 -m json.tool
# List active containers
docker ps --filter name=brunix
# Check port bindings
docker port brunix-assistance-engine
```
---
## 8. Escalation Path
| Severity | Condition | Action |
|---|---|---|
| P1 | Engine completely down, not recoverable in 15 min | Notify via Slack `#brunix-incidents` immediately. Tag CTO. |
| P2 | Degraded quality (bad answers) or evaluation score drops below 0.60 | Open GitHub Issue with full log output and evaluation report. |
| P3 | Tunnel instability, intermittent errors | Report in daily standup. Document in GitHub Issue within 24h. |
| P4 | Documentation gap or non-critical config issue | Open GitHub Issue with label `documentation` or `improvement`. |
**For all P1/P2 incidents, the GitHub Issue must include:**
1. Exact command that triggered the failure
2. Full terminal output / error log
3. Status of all three kubectl tunnels at the time of failure
4. Docker container status (`docker inspect brunix-assistance-engine`)

docs/SECURITY.md
@@ -0,0 +1,102 @@
# Security Policy
## Supported Versions
| Version | Security patches |
|---|---|
| 1.5.x | ✅ Active |
| 1.4.x | ⚠️ Critical fixes only |
| < 1.4 | Not supported |
---
## Reporting a Vulnerability
**Do not open a public GitHub Issue for security vulnerabilities.**
Report security issues directly to the CTO via the private Slack channel `#brunix-security` or by email to the address on file. Include:
1. A clear description of the vulnerability and its potential impact.
2. Steps to reproduce (proof-of-concept if applicable).
3. Affected component(s) and version(s).
4. Suggested remediation, if known.
You will receive an acknowledgement within **48 hours** and a resolution timeline within **7 business days** for confirmed issues.
---
## Security Model
### Transport
The gRPC server currently runs with `add_insecure_port`: **there is no TLS in the current dev configuration.** This is intentional for the local development setup where all traffic flows through authenticated kubectl tunnels.
**For any production or internet-exposed deployment, TLS must be enabled.** See ADR-0003 for context.
### Authentication & Authorization
The current version has **no authentication layer** on the gRPC API. Any client with network access to port `50052` can call any RPC method and access any session by session ID.
Acceptable risk boundaries for the current deployment:
- Port `50052` must be accessible **only** to authorized developers via firewall rules or VPN.
- Do not expose port `50052` on a public IP without an authenticating reverse proxy.
### Secrets Management
All secrets (API keys, database credentials) are managed exclusively via environment variables. The following rules are enforced:
- **Never commit real secret values** to any branch, including feature branches.
- Use placeholder values (e.g., `sk-ant-...`, `pk-lf-...`) in documentation and examples.
- The `.env` file is listed in `.gitignore` and must never be committed.
- The `kubernetes/kubeconfig.yaml` file grants cluster-level access and must never be committed.
- PRs containing secrets or committed `.env` / kubeconfig files will be **immediately closed** and the committer will be required to rotate all exposed credentials before resubmission.
**Environment variables that contain secrets:**
| Variable | Type |
|---|---|
| `LANGFUSE_PUBLIC_KEY` | API key |
| `LANGFUSE_SECRET_KEY` | API key |
| `ANTHROPIC_API_KEY` | API key |
| `ELASTICSEARCH_PASSWORD` | Credential |
| `ELASTICSEARCH_API_KEY` | API key |
| `HF_TOKEN` | API key |
### Container Security
- The container runs as a **non-root user** (Python 3.11 slim base image default).
- Using `root` as the container user is explicitly prohibited (see `CONTRIBUTING.md` §3).
- The `/workspace` directory is deprecated. All application code runs from `/app`.
- The `.dockerignore` ensures that development artifacts (`.git`, `.env`, `tests/`, `docs/`) are excluded from the production image.
### Data Privacy
- All LLM inference (text generation and embeddings) is performed within the **private Devaron Kubernetes cluster** on Vultr infrastructure. No user query data is sent to external third-party APIs during normal operation.
- The exception is the `EvaluateRAG` endpoint, which sends **golden dataset questions and generated answers** to the Anthropic API (Claude) for evaluation scoring. No real user queries from production sessions are used in evaluation.
- Conversation history is stored **in-memory only** and is never persisted to disk or an external database.
### Dependency Security
- Dependencies are pinned via `uv.lock` and exported to `Docker/requirements.txt`.
- Dependency updates should be reviewed for security advisories before merging.
- Run `pip audit` or `safety check` against `Docker/requirements.txt` before major releases.
```bash
pip install pip-audit
pip-audit -r Docker/requirements.txt
```
---
## Known Security Limitations
These are acknowledged risks accepted for the current development phase. They must be addressed before any production internet-facing deployment.
| ID | Limitation | Risk | Mitigation required |
|---|---|---|---|
| SEC-001 | No gRPC TLS | Traffic interception | Enable TLS with server certificate |
| SEC-002 | No API authentication | Unauthorized access | Add JWT / mutual TLS authentication |
| SEC-003 | Session IDs are guessable | Session hijacking | Enforce UUIDs; validate ownership |
| SEC-004 | No rate limiting | DoS / cost amplification | Add gRPC interceptor rate limiter |
| SEC-005 | In-memory session store | Data loss on restart | Acceptable for dev; requires Redis for prod |
| SEC-006 | `ELASTICSEARCH_USER/PASS` optional | Unauthenticated ES access | Make auth required in prod; fail-fast if absent |

@@ -1,134 +0,0 @@
6. Expressions in AVAP
This chapter explains the meaning of expression elements in AVAP.
6.1. Arithmetic Conversions
When describing an arithmetic operator in AVAP and using the phrase "numeric arguments are converted to a common type," it means that the operator's implementation for built-in types works as follows:
If either of the arguments is a complex number, the other is converted to complex.
Otherwise, if either of the arguments is a floating-point number, the other is converted to floating-point.
Otherwise, both must be integers, and no conversion is needed.
Additional rules may apply for certain operators.
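Since these coercion rules mirror Python's numeric tower, they can be illustrated directly in Python (an illustration of the rules, not AVAP code):

```python
# Complex dominates float, which dominates int.
assert type(1 + 2j) is complex   # int + complex -> complex
assert type(1 + 2.0) is float    # int + float -> float
assert type(1 + 2) is int        # int + int -> no conversion needed
```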
6.2. Atoms
Atoms are the most basic elements of expressions in AVAP. The simplest atoms are identifiers or literals. Forms enclosed in parentheses, brackets, or braces are also syntactically categorized as atoms. The syntax for atoms is:
atom ::= identifier | literal | enclosure
enclosure ::= parenth_form | list_display | dict_display | set_display | generator_expression
6.2.1. Identifiers (Names)
An identifier that appears as an atom is a name. When the name is bound to an object, evaluating the atom yields that object. When a name is not bound, an attempt to evaluate it raises a NameError exception.
Private Name Mangling
When an identifier that occurs literally in a class definition begins with two or more underscores and does not end with two or more underscores, it is considered a private name of that class. Private names are transformed into a longer form before code is generated for them. The transformation inserts the class name, with the initial underscores removed and a single underscore inserted, in front of the name.
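The transformation can be observed in Python, whose mangling scheme this passage describes (illustrative):

```python
class Widget:
    __secret = 1                  # stored on the class as _Widget__secret

    def reveal(self):
        return self.__secret      # also mangled inside the class body
```

Outside the class body no mangling occurs, so the attribute is only reachable under its transformed name.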
6.2.2. Literals
AVAP supports string and bytes literals, as well as various numeric literals:
literal ::= stringliteral | bytesliteral | integer | floatnumber | imagnumber
Evaluating a literal produces an object of the given type (string, bytes, integer, floating-point number, complex number) with the given value. All literals correspond to immutable data types.
6.2.3. Parenthesized Forms
A parenthesized form is an optional list of expressions enclosed in parentheses:
parenth_form ::= "(" [starred_expression] ")"
A parenthesized expression produces whatever the expression list produces: if the list contains at least one comma, it produces a tuple; otherwise, it produces the single expression that makes up the list of expressions.
6.2.4. Comprehensions for Lists, Sets and Dictionaries
To construct a list, set, or dictionary, AVAP provides special syntax called "comprehension," each in two flavors:
The contents of the container are listed explicitly.
They are computed using a set of loop and filtering instructions, called a "comprehension."
Common syntax elements for comprehensions are:
comprehension ::= assignment_expression comp_for
comp_for ::= "for" target_list "in" or_test [comp_iter]
comp_iter ::= comp_for | comp_if
comp_if ::= "if" or_test [comp_iter]
A comprehension consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new container are those produced by considering each for or if clause as a block, nested from left to right, and evaluating the expression to produce an element each time the innermost block is reached.
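The left-to-right nesting rule can be rendered as explicit blocks (shown in Python for illustration):

```python
# Comprehension form and its equivalent nested-block expansion.
result = [x * y for x in (1, 2) for y in (10, 20) if y > 10]

expanded = []
for x in (1, 2):          # leftmost clause is the outermost block
    for y in (10, 20):    # next clause nests inside it
        if y > 10:        # filter is the innermost block
            expanded.append(x * y)
```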
6.2.5. List Displays
In AVAP, lists are generated and handled differently. To construct a list, the command variableToList(variable, list) is used, and an item from the list is retrieved with itemFromList(list, index, variable_to_store_item). To get the number of elements in the list, getListLen(list, var_to_store_list_length) is used.
The syntax for list displays is:
list_display ::= "[" [starred_list | comprehension] "]"
A list display produces a new list object, whose content is specified by a list of expressions or a comprehension. When a list of expressions is provided, its elements are evaluated from left to right and placed in the list object in that order.
6.2.6. Set Displays
A set display is denoted by curly braces and is distinguished from dictionary displays by the absence of colon characters separating keys and values:
set_display ::= "{" (starred_list | comprehension) "}"
A set display produces a new mutable set object, whose content is specified by a sequence of expressions or a comprehension.
6.2.7. Dictionary Displays
In AVAP, objects are created and managed using specific commands. An object is created with AddVariableToJSON(key, value, object_variable), and a key from the object is retrieved with variableFromJSON(object_variable, key, var_to_store_key_value).
The syntax for dictionary displays is:
dict_display ::= "{" [dict_item_list | dict_comprehension] "}"
dict_item_list ::= dict_item ("," dict_item)* [","]
dict_item ::= expression ":" expression | "**" or_expr
dict_comprehension ::= expression ":" expression comp_for
A dictionary display produces a new dictionary object. If a comma-separated sequence of dictionary items is provided, they are evaluated from left to right to define the dictionary entries.
Slices
A slice selects a range of elements in a sequence object (e.g., a string, tuple, or list). Slices can be used as expressions or as targets in assignments or statements. The syntax for a slice is as follows:
slicing ::= primary "[" slice_list "]"
slice_list ::= slice_item ("," slice_item)* [","]
slice_item ::= expression | proper_slice
proper_slice ::= [lower_bound] ":" [upper_bound] [ ":" [stride] ]
lower_bound ::= expression
upper_bound ::= expression
stride ::= expression
There is ambiguity in the formal syntax here: anything that looks like a list expression also looks like a list slice, so any subscription might be interpreted as a slice. Instead of complicating the syntax further, this is disambiguated by defining that in this case, the interpretation as a subscription takes precedence over the interpretation as a slice (this is the case if the list slice does not contain a proper slice).
The semantics for a slice are as follows. The primary is indexed (using the same __getitem__() method as in a normal subscription) with a key constructed from the slice list, as follows. If the slice list contains at least one comma, the key is a tuple that contains the conversion of the slice elements; otherwise, the conversion of the single slice element is the key. The conversion of a slice element that is an expression is that expression. The conversion of a proper slice is a slice object whose start, stop, and step attributes are the values of the expressions given as the lower bound, upper bound, and step, respectively, substituting None for missing expressions.
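Since the slicing grammar above mirrors Python's, the key construction can be verified in Python: a proper slice becomes a slice object whose missing parts are None, and indexing with that object is equivalent to the slice notation.

```python
s = "abcdefg"

a = s[1:5:2]            # indexed with slice(1, 5, 2)
b = s[slice(1, 5, 2)]   # the explicit equivalent
c = s[:3]               # indexed with slice(None, 3, None)
```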
Calls
A call invokes a callable object (e.g., a function) with a possibly empty series of arguments:
call ::= primary "(" [argument_list [","] | comprehension] ")"
argument_list ::= positional_arguments ["," starred_and_keywords]
["," keywords_arguments]
| starred_and_keywords ["," keywords_arguments]
| keywords_arguments
positional_arguments ::= positional_item ("," positional_item)*
positional_item ::= assignment_expression | "*" expression
starred_and_keywords ::= ("*" expression | keyword_item)
("," "*" expression | "," keyword_item)*
keywords_arguments ::= (keyword_item | "**" expression)
("," keyword_item | "," "**" expression)*
keyword_item ::= identifier "=" expression
An optional trailing comma may be present after positional and keyword arguments but does not affect the semantics.
The primary must evaluate to a callable object (user-defined functions, built-in functions, built-in object methods, class objects, class instance methods, and any object with a __call__() method are callable). All argument expressions are evaluated before attempting the call. Please refer to the Function Definitions section for the syntax of formal parameter lists.
If keyword arguments are present, they are first converted into positional arguments as follows. First, a list of unfilled slots is created for the formal parameters. If there are N positional arguments, they are placed in the first N slots. Then, for each keyword argument, the identifier is used to determine the corresponding slot. If the slot is already filled, a TypeError exception is raised. Otherwise, the argument is placed in the slot, filling it (even if the expression is None, it fills the slot). When all arguments have been processed, any slots that are still empty are filled with the default value from the function definition. If there are unfilled slots for which no default value is specified, a TypeError exception is raised. Otherwise, the list of filled slots is used as the argument list for the call.
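The slot-filling procedure can be illustrated with a small Python sketch (the function and argument names here are made up for the example): a keyword argument fills its named slot, an unfilled slot takes the default, and filling an already-filled slot raises TypeError.

```python
def greet(name, greeting="hello"):
    return f"{greeting}, {name}"

ok = greet("ada", greeting="hi")   # the keyword fills its named slot
fallback = greet("ada")            # the empty slot takes the default value

try:
    greet("ada", name="ada")       # slot already filled positionally
    duplicate = False
except TypeError:
    duplicate = True               # duplicate slot raises TypeError
```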
Implementation Details in AVAP
In AVAP, variables are stored as strings, and lists and objects are managed using specific commands:
Lists: To generate a list, use variableToList(variable, list). To retrieve an item from the list, use itemFromList(list, index, variable_to_store_item). To get the number of items in the list, use getListLen(list, var_to_store_list_length).
Objects (dictionaries): An object is created with AddvariableToJSON(key, value, object_variable). To retrieve a key from the object, use variableFromJSON(object_variable, key, var_to_store_key_value).
Usage Example
Creation and management of lists:
// Creating a list
variableToList("item1", "myList")
variableToList("item2", "myList")
variableToList("item3", "myList")
// Retrieving an item from the list
itemFromList("myList", 1, "myVariable")
// Getting the length of the list
getListLen("myList", "listLength")
Creation and management of objects (dictionaries):
// Creating an object
AddvariableToJSON("key1", "value1", "myObject")
AddvariableToJSON("key2", "value2", "myObject")
// Retrieving a value by key from the object
variableFromJSON("myObject", "key1", "myVariable")
In this way, lists and objects in AVAP can be manipulated using the specific functions provided for working with variables stored as strings.
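As an illustration only, the list commands above can be emulated in Python. The function names mirror the AVAP commands, but the comma-separated storage format and zero-based indexing are assumptions made for this sketch, not the documented implementation.

```python
# Hypothetical emulation of AVAP's string-backed list commands.
store = {}  # all variables are stored as strings, keyed by name

def variableToList(item, list_name):
    current = store.get(list_name, "")
    store[list_name] = item if not current else current + "," + item

def itemFromList(list_name, index, target):
    store[target] = store[list_name].split(",")[index]

def getListLen(list_name, target):
    store[target] = str(len(store[list_name].split(",")))

variableToList("item1", "myList")
variableToList("item2", "myList")
variableToList("item3", "myList")
itemFromList("myList", 1, "myVariable")
getListLen("myList", "listLength")
```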
Binary Arithmetic Operations
Binary arithmetic operations have the conventional levels of precedence. Some of these operations also apply to certain non-numeric types. Aside from the exponentiation operator, there are two levels: one for multiplicative operators and another for additive ones:
m_expr ::= u_expr | m_expr "*" u_expr | m_expr "@" m_expr |
m_expr "//" u_expr | m_expr "/" u_expr |
m_expr "%" u_expr
a_expr ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr
The * (multiplication) operator produces the product of its arguments. The arguments can both be numbers, or one argument must be an integer and the other a sequence. In the first case, the numbers are converted to a common type and then multiplied. In the second case, sequence repetition occurs; a negative repetition factor produces an empty sequence.
The @ (matrix multiplication) operator is intended for matrix multiplication. No built-in type in Python implements this operator.
The / (division) and // (floor division) operators produce the quotient of their arguments. Numeric arguments are converted to a common type. Division between integers produces a floating-point number, while floor division between integers results in an integer; the result is that of a mathematical division with the “floor” function applied to the result. Division by zero raises a ZeroDivisionError.
The % (modulus) operator produces the remainder of the division of the first argument by the second. Numeric arguments are converted to a common type. A zero argument on the right raises a ZeroDivisionError. Arguments can be floating-point numbers, e.g., 3.14 % 0.7 is equal to 0.34 (since 3.14 is equal to 4 * 0.7 + 0.34). The modulus operator always produces a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand.
The floor division and modulus operators are connected by the following identity: x == (x // y) * y + (x % y). Floor division and modulus are also connected by the built-in function divmod(): divmod(x, y) == (x // y, x % y).
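Both identities can be checked directly in Python, along with the rule that the sign of % follows the second operand:

```python
pairs = [(7, 3), (-7, 3), (7, -3)]

remainders = [x % y for x, y in pairs]   # sign follows the second operand
identity = all(x == (x // y) * y + (x % y) for x, y in pairs)
paired = divmod(-7, 3)                   # (quotient, remainder) in one call
```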
In addition to performing the modulus operation on numbers, the % operator is also overloaded by string objects for old-style string formatting (also known as interpolation). The syntax for string formatting is described in the Python Library Reference, section Old-Style String Formatting.
The floor division operator, the modulus operator, and the divmod() function are not defined for complex numbers. Instead, convert to a floating-point number using the abs() function if appropriate.
The + (addition) operator produces the sum of its arguments. The arguments must both be numbers or both be sequences of the same type. In the first case, the numbers are converted to a common type and then added. In the second case, the sequences are concatenated.
The - (subtraction) operator produces the difference between its arguments. Numeric arguments are converted to a common type.
Shift Operations
Shift operations have lower precedence than arithmetic operations:
shift_expr ::= a_expr | shift_expr ("<<" | ">>") a_expr
These operators accept integers as arguments. They shift the first argument left or right by the number of bits specified by the second argument.
A right shift by n bits is defined as an integer floor division by pow(2, n). A left shift by n bits is defined as a multiplication by pow(2, n).
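A quick Python check of these definitions:

```python
left = 5 << 2     # same as 5 * 2**2
right = -7 >> 1   # floor division: same as -7 // 2
```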
Binary Bitwise Operations
Each of the three binary bitwise operations has a different level of precedence:
and_expr ::= shift_expr | and_expr "&" shift_expr
xor_expr ::= and_expr | xor_expr "^" and_expr
or_expr ::= xor_expr | or_expr "|" xor_expr
* The & operator produces the bitwise AND of its arguments, which must be integers.
* The ^ operator produces the bitwise XOR (exclusive OR) of its arguments, which must be integers.
* The | operator produces the bitwise OR (inclusive OR) of its arguments, which must be integers.
Comparisons
Unlike C, all comparison operations in Python have the same priority, which is lower than any arithmetic, shift, or bitwise operation. Also, unlike C, expressions like a < b < c have the conventional mathematical interpretation:
comparison ::= or_expr (comp_operator or_expr)*
comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="
| "is" ["not"] | ["not"] "in"
Comparisons produce boolean values: True or False.
Comparisons can be arbitrarily chained, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once.
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ..., opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.
Note that a op1 b op2 c does not imply any comparison between a and c, so, for example, x < y > z is perfectly legal.
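A minimal Python sketch of the chaining semantics, using a helper function (its name is illustrative) to show that the middle expression is evaluated only once:

```python
calls = []

def middle():
    calls.append("evaluated")
    return 5

chained = 1 < middle() <= 9   # equivalent to 1 < y and y <= 9, y evaluated once
legal = 2 < 3 > 1             # no comparison between 2 and 1 is implied
```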
Value Comparisons
The operators <, >, ==, >=, <=, and != compare the values of two objects. The objects do not need to be of the same type.
The chapter Objects, Values, and Types states that objects have a value (in addition to type and identity). The value of an object is a rather abstract notion in Python: For example, there is no canonical method to access the value of an object. Furthermore, there is no requirement that the value of an object must be constructed in a particular way, e.g., composed of all its data attributes. Comparison operators implement a particular notion of what an object's value is.
The default behavior for equality comparison (== and !=) is based on object identity. Therefore, comparison of instances with the same identity results in equality, and comparison of equality of instances with different identities results in inequality.
No default comparison order (<, >, <=, >=) is provided; an attempt generates a TypeError.
The following list describes the comparison behavior of the most important built-in types:
Numbers: Built-in numeric types (int, float, complex) and types from the standard library (fractions.Fraction and decimal.Decimal) can be compared with themselves and among their types, with the restriction that complex numbers do not support order comparisons. Within the limits of the involved types, they are compared mathematically correctly without loss of precision.
None and NotImplemented: They are singletons. PEP 8 advises that comparisons for singletons should be done with is or is not, never with equality operators.
Binary Sequences: Instances of bytes or bytearray compare lexicographically using the numeric values of their elements.
Character Strings: Instances of str compare lexicographically using the numeric Unicode code points (the result of the built-in ord() function) of their characters.
Sequences: Instances of tuple, list, or range can only be compared within their types, with the restriction that ranges do not support order comparisons. Equality comparisons between these types result in inequality, and order comparisons between these types generate TypeError. They compare lexicographically using comparison of their corresponding elements.
Mappings: Instances of dict compare equal if and only if they have the same (key, value) pairs.
Sets: Instances of set or frozenset can be compared with each other and among their types. They define order comparison operators with the intention of checking subsets and supersets.
Other Built-in Types: Most other built-in types do not have comparison methods implemented, so they inherit the default comparison behavior.
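A few of the rules above, checked in Python:

```python
exact = 1 == 1.0                # cross-type numeric comparison is exact
lexi = (1, 2, 3) < (1, 2, 4)    # sequences compare element by element
mixed = [1, 2] == (1, 2)        # different sequence types are never equal
subset = {1, 2} < {1, 2, 3}     # set ordering tests for a proper subset
```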
User-defined classes that customize their comparison behavior should follow some consistency rules, if possible:
Equality comparison should be reflexive.
Comparison should be symmetric.
Comparison should be transitive.
If any of these conditions are not met, the resulting behavior is undefined.
Simple Statements
In AVAP, a simple statement consists of a single logical line. Multiple simple statements can be placed on a single line, separated by semicolons. The syntax for simple statements is:
simple_stmt ::= expression_stmt | assert_stmt | assignment_stmt | augmented_assignment_stmt | annotated_assignment_stmt | pass_stmt | del_stmt | return_stmt | yield_stmt | raise_stmt | break_stmt | continue_stmt | import_stmt | future_stmt | global_stmt | nonlocal_stmt | type_stmt
Here's a brief overview of each type of simple statement:
Expression Statement (expression_stmt): Executes an expression, which can be used for operations or calling functions.
Assert Statement (assert_stmt): Used for debugging purposes to test conditions.
Assignment Statement (assignment_stmt): Assigns values to variables or data structures.
Augmented Assignment Statement (augmented_assignment_stmt): Performs an operation on a variable and assigns the result back to the variable (e.g., x += 1).
Annotated Assignment Statement (annotated_assignment_stmt): Used for assigning values with annotations (e.g., type hints).
Pass Statement (pass_stmt): A placeholder that does nothing; used for syntactic requirements.
Del Statement (del_stmt): Deletes variables, items, or attributes.
Return Statement (return_stmt): Exits a function and optionally returns a value.
Yield Statement (yield_stmt): Produces a value from a generator function.
Raise Statement (raise_stmt): Raises exceptions for error handling.
Break Statement (break_stmt): Exits the closest enclosing loop.
Continue Statement (continue_stmt): Skips the current iteration of the closest enclosing loop.
Import Statement (import_stmt): Imports modules or specific components from modules.
Future Statement (future_stmt): Enables features from future versions of Python.
Global Statement (global_stmt): Declares variables as global within a function.
Nonlocal Statement (nonlocal_stmt): Declares variables as non-local, affecting scope in nested functions.
Type Statement (type_stmt): Declares or checks types (e.g., type hints).
Each simple statement performs a specific task and contributes to the overall functionality of the AVAP program.
Expression Statements
Expression statements are used (mostly interactively) to compute and write a value, or (usually) to call a method (a function that does not return a meaningful result; in Python, methods return the value None). Other uses of expression statements are allowed and occasionally useful. The syntax for an expression statement is:
expression_stmt ::= starred_expression
An expression statement evaluates the list of expressions (which can be a single expression).
In interactive mode, if the value is not None, it is converted to a string using the built-in function repr(), and the resulting string is written to the standard output on a line by itself (except if the result is None, in which case the called procedure produces no output).
Assignment Statements
Assignment statements in AVAP are used to (re)assign names to values and to modify attributes or elements of mutable objects. Here is the syntax:
assignment_stmt ::= (target_list "=")+ (starred_expression | yield_expression)
target_list     ::= target ("," target)* [","]
target          ::= identifier
                  | "(" [target_list] ")"
                  | "[" [target_list] "]"
                  | attributeref
                  | subscription
                  | slicing
                  | "*" target
Here's a breakdown of how assignment statements work:
Assignment Operation: An assignment statement evaluates the list of expressions and assigns the single resulting object to each of the target lists, from left to right.
Recursive Definition: The assignment operation is defined recursively depending on the form of the target list.
Target List: If the target list is a single target with no trailing comma, the object is assigned to that target. If the list contains a target prefixed with an asterisk, the object must be an iterable with at least as many elements as there are targets, minus one. Elements before the starred target are assigned to the respective targets, and the remaining elements are assigned to the starred target.
Single Target: If the target is an identifier (name), it is bound to the object in the current local namespace, unless the name is declared global or nonlocal, in which case it is bound in the global or enclosing namespace instead.
Attribute Reference: If the target is an attribute reference, the primary expression is evaluated. It must produce an object with assignable attributes.
Subscription: If the target is a subscription, the primary expression is evaluated to produce a mutable sequence or mapping object, which is then used to assign the value.
Slice: If the target is a slice, the primary expression is evaluated, and the sequence object is requested to replace the slice with the assigned sequence elements.
In summary, assignment statements in AVAP are crucial for assigning values to variables and modifying data structures effectively.
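The starred-target rule described above can be sketched in Python: targets before and after the star are filled first, and the starred name collects whatever remains, always as a list.

```python
first, *middle, last = [1, 2, 3, 4, 5]

# The iterable only needs as many elements as the non-starred targets;
# the starred target may end up with a single element or none.
a, *rest = "xy"
```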
Return Statement
The return statement in AVAP is used to return the value of a desired variable from a function. Here is the syntax:
return(variable_to_return)
Here is an overview of how the return statement works:
Function Context: The return statement can only occur within a function definition, not inside a nested class definition.
Variable Evaluation: If a variable is provided, it is evaluated. If no variable is specified, None is used by default.
Function Exit: The return statement exits the current function call and returns the specified value.
Interaction with try-finally: When the return statement is executed within a try statement that has a finally clause, the finally clause is executed before the function exits.
Generator Functions: In generator functions, the return statement indicates the end of the generator. It causes a StopIteration exception to be raised, with the returned value (if any) used to construct the StopIteration exception and set as the StopIteration.value attribute.
The return statement is a fundamental part of functions and generators, allowing for the output of values and proper function termination.
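In Python terms, the generator behavior described above looks like this (a minimal sketch): the returned value is carried on the StopIteration exception's value attribute.

```python
def gen():
    yield 1
    return "done"          # becomes StopIteration.value

g = gen()
first = next(g)
try:
    next(g)
    final = None
except StopIteration as exc:
    final = exc.value      # the value passed to return
```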
Raise Statement
In AVAP, the raise statement is used to throw an exception. The syntax for the raise statement is as follows:
raise [expression ["from" expression]]
If no expressions are present, raise re-raises the currently handled exception, also known as the active exception. If there is no active exception, a RuntimeError is raised indicating that there is no exception to re-raise.
Otherwise, raise evaluates the first expression as the exception object. It must be a subclass or an instance of BaseException. If it is a class, the exception instance is obtained when needed by creating an instance of the class without arguments.
The type of the exception is the instance of the exception class, and the value is the instance itself.
The from clause is used for exception chaining: if provided, the second expression must be another class or instance of exception. If the second expression is an exception instance, it will be attached to the raised exception as the __cause__ attribute (which is modifiable). If the expression is an exception class, the class will be instantiated and the resulting exception instance will be attached to the raised exception as the __cause__ attribute. If the raised exception is not handled, both exceptions will be printed.
startLoop()
try:
    print(1 / 0)
except Exception as exc:
    raise RuntimeError("Something went wrong") from exc
endLoop()
A similar mechanism works implicitly if a new exception is raised while an exception is already being handled. An exception may be handled by an except or finally clause, or a with statement. The previous exception is then attached as the new exception's __context__ attribute:
startLoop()
try:
    print(1 / 0)
except:
    raise RuntimeError("Something went wrong")
endLoop()
Exception chaining can be explicitly suppressed by specifying None in the from clause:
startLoop()
try:
    print(1 / 0)
except:
    raise RuntimeError("Something went wrong") from None
endLoop()
Break Statement
In AVAP, the break statement is used to terminate the closest enclosing loop. The syntax for the break statement is as follows:
break()
When a break statement is encountered, it causes the loop to exit immediately, regardless of the loop's condition or any remaining iterations. This effectively transfers control to the statement following the loop.
The break statement is typically used within for or while loops to provide a way to exit the loop prematurely based on a certain condition.
addVar(_status, "OK")
startLoop(idx, 0, 9)
    if(idx, 4, "==")
        idx = -1
        break()
    end()
endLoop()
addResult(idx)
addStatus("OK")
In this example, the loop runs idx from 0 to 9 but terminates as soon as idx equals 4: idx is set to -1 and break() exits the loop immediately, so addResult(idx) reports -1.
Here is an overview of the break statement's behavior:
Usage Context: The break statement can only occur within a for or while loop. It cannot be nested within a function or class definition inside that loop.
Loop Termination: It terminates the closest enclosing loop and skips the optional else clause if the loop has one.
Loop Control Target: If a for loop is terminated by break, the loop control target retains its current value.
Interaction with try-finally: When break is executed within a try statement with a finally clause, the finally clause is executed before actually exiting the loop.
The break statement is essential for controlling loop execution, allowing for early exit from loops and proper handling of loop cleanup.
Continue Statement
In AVAP, the continue statement is used to proceed with the next iteration of the closest enclosing loop. The syntax for the continue statement is as follows:
continue
The continue statement can only syntactically occur nested within a for or while loop, but not within a function or class definition inside that loop.
When continue is used within a loop that is also handling exceptions with a try statement containing a finally clause, the finally clause is executed before the next iteration of the loop begins.
for i in range(10):
    try:
        if i % 2 == 0:
            continue
        print(i)
    finally:
        print("In finally clause")
print("Loop ended")
In this example, the continue statement will skip the current iteration when i is even, but before moving to the next iteration, the finally clause will print "In finally clause." For odd numbers, the loop will print the number and then "In finally clause." After the loop finishes, "Loop ended" will be printed.
Include Statement
In AVAP, the include statement is used to include an entire code file and define names in the local namespace. The syntax for the include statement is as follows:
include file.avap
The include statement in AVAP includes an entire code file and makes it available in the local namespace. No alias is assigned to the included file; the file is simply referred to by its name.
For example:
// In the 'module.avap' file
example_variable = 10

// In the main file
include module.avap
addResult(example_variable) // Will print 10
In this example, the main file includes the module.avap file and can access the example_variable defined in that file directly, since the included names are placed in the local namespace.
Compound Statements
In AVAP, compound statements contain (groups of) other statements; these affect or control the execution of those other statements in some way. In general, compound statements span multiple lines, though in simpler representations a complete compound statement might be contained within a single line.
if statements implement traditional flow control constructs. match specifies matching patterns for variable values. Function and class definitions are also syntactically compound statements.
A compound statement consists of one or more "clauses." A clause consists of a header and a "suite." The clause headers of a particular compound statement are all at the same level of indentation. Each clause header begins with a uniquely identifying keyword and ends with a colon. A suite is a group of statements controlled by a clause. A suite can be one or more simple statements separated by semicolons on the same line as the header, following the colon of the header, or it can be one or more statements indented on subsequent lines. Only the latter form of a suite can contain nested compound statements.
Control Flow Structures in AVAP
In AVAP, control flow structures include conditional statements and loops, which allow you to control the flow of execution based on conditions and iterate over a range of values.
If Statements
The syntax for an if statement in AVAP is:
if (variable, variableValue, comparator, expression)
    code to execute
else()
    code to execute
end()
This structure checks if the condition (variable compared to variableValue with the given comparator) is true, and if so, executes the block of code.
Loops
The syntax for a loop in AVAP is:
startLoop(variable, from, to)
    code to execute
endLoop()
This structure initiates a loop where the variable iterates from the 'from' value to the 'to' value, executing the code block for each iteration.
The if Statement
The if statement in AVAP is used for conditional execution. The syntax is as follows:
if (variable, variableValue, comparator, expression)
    code to execute
else()
    code to execute
end()
This statement evaluates the condition specified by the variable, variableValue, comparator, and expression. It selects exactly one of the suites (blocks of code) by evaluating the expressions one by one until a true condition is found. The corresponding suite is then executed. If all conditions are false, no suites are executed.
The try Statement
The try statement in AVAP specifies exception handlers and/or cleanup code for a block of statements. The syntax is as follows:
try()
    code to execute
exception()
    code to execute
end()
The try block contains code that might raise an exception. The exception block contains code to handle exceptions raised by the try block. If an exception occurs, control is transferred to the exception block. If no exception occurs, the exception block is skipped.
Additional information about exceptions can be found in the section Exceptions, and information about using the raise statement to throw exceptions can be found in the section The raise Statement.
Patterns in AVAP
In AVAP, patterns provide a powerful way to match and destructure values. Patterns can be used in match statements to perform complex value comparisons and deconstructions. Here is a description of the available patterns and how they are used:
Literal Patterns: Match specific literal values such as numbers, strings, or booleans. For example:
match value:
    case 10:
        # Code to execute if value is 10
    case "hello":
        # Code to execute if value is "hello"
Variable Patterns: Capture the value of a variable. This allows you to use the matched value in the corresponding case block:
match value:
    case x:
        # Code to execute, x will be assigned the value
Sequence Patterns: Match sequences like lists or tuples. You can also use the * operator to capture remaining elements:
match value:
    case [1, 2, *rest]:
        # Code to execute, rest will capture any additional elements
Mapping Patterns: Match dictionaries or similar mappings by specifying keys and their corresponding patterns:
match value:
    case {"key": 42}:
        # Code to execute if the dictionary has "key" with value 42
Class Patterns: Match instances of classes. You can also match specific attributes within the instance:
match value:
    case MyClass(attr1=42):
        # Code to execute if value is an instance of MyClass with attr1 equal to 42
Patterns in AVAP offer a flexible approach for handling different kinds of data structures and values, making it easier to write expressive and maintainable code.
OR Patterns
An OR pattern in AVAP allows you to specify multiple patterns separated by vertical bars (|). The OR pattern attempts to match each of its subpatterns with the subject value in order. If any of the subpatterns match, the OR pattern is considered successful. If none of the subpatterns match, the OR pattern fails.
or_pattern ::= "|".closed_pattern+
Here's how you can use OR patterns in practice:
match value:
    case 1 | 2 | 3:
        # Code to execute if value is 1, 2, or 3
    case "hello" | "world":
        # Code to execute if value is "hello" or "world"
    case _:
        # Code to execute if value does not match any of the above
In this example:
The first case will match if value is either 1, 2, or 3.
The second case will match if value is either "hello" or "world".
The last case is a catch-all pattern that will execute if none of the previous patterns match.
OR patterns provide a concise way to handle multiple possible values or types, simplifying pattern matching and making your code more readable.
AS Patterns
An AS pattern in AVAP is used to bind an OR pattern to a name. This allows you to match a value with an OR pattern and simultaneously capture it under a specified name for further use. The syntax for an AS pattern is:
as_pattern ::= or_pattern "as" capture_pattern
When an AS pattern is used, if the OR pattern succeeds, the subject is bound to the name specified by the capture pattern, and the AS pattern itself succeeds.
Here's an example of how to use AS patterns:
match value:
    case 1 | 2 | 3 as x:
        print(f"Matched a number: {x}")
    case "hello" | "world" as greeting:
        print(f"Matched a greeting: {greeting}")
    case _:
        print("No match")
In this example:
The first case matches if value is 1, 2, or 3. The matched value is bound to the name x, which is then used in the print statement.
The second case matches if value is "hello" or "world". The matched value is bound to the name greeting, which is then used in the print statement.
The last case is a catch-all pattern that executes if none of the previous patterns match.
AS patterns are useful for capturing matched values under a name while using OR patterns, allowing for more flexible and readable pattern matching in your code.
Literal Patterns
In AVAP, literal patterns are used to match specific literal values, such as numbers, strings, or boolean values. The syntax for a literal pattern is:
literal_pattern ::= signed_number | strings | "None" | "True" | "False"
A literal pattern only succeeds if the value of the subject is equal to the specified literal value.
Here are examples of literal patterns and their usage:
match value:
    case 42:
        print("Matched the number 42")
    case "hello":
        print("Matched the string 'hello'")
    case None:
        print("Matched None")
    case True:
        print("Matched True")
    case False:
        print("Matched False")
    case _:
        print("No match")
In this example:
case 42: matches if value is exactly 42.
case "hello": matches if value is the string "hello".
case None: matches if value is None.
case True: matches if value is True.
case False: matches if value is False.
case _: is a catch-all pattern that executes if none of the previous patterns match.
Literal patterns are useful for matching specific, known values and are a fundamental part of pattern matching in AVAP.
Capture Patterns
In AVAP, capture patterns are used to bind the subject's value to a name. The syntax for a capture pattern is:
capture_pattern ::= NAME
Capture patterns always succeed and bind the value of the subject to the specified name.
Here's how you might use capture patterns in AVAP:
match value:
    case x:
        print(f"Captured value: {x}")
In this example:
case x: captures whatever value is in value and binds it to the name x. The pattern always succeeds.
Capture patterns are useful when you want to extract and use the value of the subject within your code, regardless of what that value is.
Wildcard Patterns
In AVAP, wildcard patterns are used to match any value without binding it to a name. The syntax for a wildcard pattern is:
wildcard_pattern ::= '_'
Wildcard patterns always succeed and do not create any bindings. They are useful when you want to ignore the value of the subject and only care about whether it matches a certain pattern.
Here's how you might use wildcard patterns in AVAP:
match value:
    case _:
        print("Matched any value")
In this example:
case _: matches any value and does not bind it to a name. The pattern always succeeds, and the code within this case will be executed regardless of the value.
Wildcard patterns are particularly useful when you need to handle a broad range of possibilities and are only interested in whether a value fits a general condition, not in the value itself.
Value Patterns
In AVAP, value patterns are used to match specific values. The syntax for a value pattern is:
value_pattern ::= attr
Value patterns only succeed if the subject's value matches the specified value. They are useful when you want to perform actions based on an exact value.
Here's how you might use value patterns in AVAP:
match value:
    case 42:
        print("Matched the value 42")
    case "hello":
        print("Matched the string 'hello'")
    case _:
        print("Matched something else")
In this example:
case 42: matches the value 42 specifically.
case "hello": matches the string "hello" specifically.
case _: matches any other value not covered by the previous cases.
Value patterns are ideal for scenarios where you need to check for specific values and respond accordingly. They provide precise control over the matching process.
Group Patterns
In AVAP, group patterns are used to group multiple patterns together. The syntax for a group pattern is:
group_pattern ::= "(" pattern ")"
Group patterns are useful when you want to combine patterns or when patterns need to be evaluated together. They have the same effect as the pattern they contain but allow for more complex pattern structures.
Here's an example of how to use group patterns in AVAP:
match value:
    case (42 | 43):
        print("Matched either 42 or 43")
    case (name, age) if age > 18:
        print(f"{name} is an adult")
    case _:
        print("Matched something else")
In this example:
case (42 | 43): uses a group pattern to match either the value 42 or 43.
case (name, age) if age > 18: uses a group pattern to match a tuple and includes an additional condition on the age.
case _: matches any other value not covered by the previous cases.
Group patterns are ideal for creating more complex matching scenarios where patterns need to be combined or grouped together.
Sequence Patterns
In AVAP, sequence patterns are used to match elements within sequences like lists or tuples. The syntax for sequence patterns is:
sequence_pattern ::= "[" [maybe_sequence_pattern] "]" | "(" [open_sequence_pattern] ")"
Sequence patterns can match elements of sequences based on specific rules. Here's how they work:
List Patterns: Use square brackets [ ] to match lists. You can include patterns for the elements within the list.
case [a, b, c]:
    print("Matched a list with three elements")
Tuple Patterns: Use parentheses ( ) to match tuples. Similarly, you can specify patterns for the tuple elements.
case (x, y):
    print("Matched a tuple with two elements")
Sequence patterns allow for flexible and powerful matching of sequence types. They can match sequences of various lengths and structures by defining the pattern for each element.
Here's an example of using sequence patterns in a match statement:
match value:
    case [1, 2, 3]:
        print("Matched a list with elements 1, 2, 3")
    case (a, b, c) if a + b == c:
        print("Matched a tuple where a + b equals c")
    case _:
        print("Matched something else")
In this example:
case [1, 2, 3]: matches a list with exactly the elements 1, 2, and 3.
case (a, b, c) if a + b == c: matches a tuple and includes a condition to check if a + b equals c.
case _: matches any other value not covered by the previous cases.
Mapping Patterns
In AVAP, mapping patterns are used to match mapping elements, such as dictionaries. Here is the syntax and behavior of mapping patterns:
mapping_pattern ::= "{" [items_pattern] "}"
Mapping Patterns are designed to match elements within mappings, such as dictionaries. They use specific rules to determine if a pattern matches the given mapping.
Syntax: Mapping patterns are enclosed in curly braces { ... }. The items_pattern specifies the pattern for the mapping items.
Matching Rules: The rules for matching mapping patterns include checking for key-value pairs in the mapping and ensuring they align with the specified pattern.
Usage: Mapping patterns are useful for destructuring dictionaries and other mapping types in a concise manner.
Mapping patterns enhance pattern matching capabilities by allowing for specific and flexible matching of dictionary elements.
Class Patterns
In AVAP, class patterns are used to match instances of specific classes. Here is a detailed overview:
class_pattern ::= name "(" [pattern_arguments ","?] ")"
Pattern Syntax: A class pattern specifies the class name followed by a parenthesized list of pattern_arguments. The pattern matches instances of the specified class.
Matching Instances: The pattern will match if the subject is an instance of the specified class and the pattern_arguments (if any) match according to the rules defined for the pattern.
Usage: Class patterns are useful for deconstructing objects based on their class and extracting values from them, enabling more precise pattern matching.
These patterns provide a way to work with objects based on their class type and structure, facilitating more sophisticated pattern matching and value extraction.

Execution Model in AVAP
4.1. Structure of a Program
A program in AVAP is built from code blocks that execute linearly. A block is a section of the AVAP program text that executes as a unit. Code blocks in AVAP include:
A script file.
The body of a function.
An include statement for additional files.
Each line of code in AVAP is considered a block and executes sequentially. There is no interactive execution, deferred execution, or object classes.
4.2. Names and Bindings
4.2.1. Name Binding
Names in AVAP refer to values and are introduced through name binding operations. The following constructs bind names:
Formal parameters of functions.
Function definitions.
Assignment expressions.
Name binding is performed using the addVar(value, variable) function, which assigns the value to the specified variable. There are no class declarations or complex targets in AVAP. Only functions and direct assignments to variables are valid code blocks.
4.2.2. Name Resolution
A scope defines the visibility of a name in a code block. In AVAP, if a variable is defined in a code block, its scope includes that block. The scope of a variable within a function extends to the entire function block.
When a name is used in a code block, it is resolved using the nearest enclosing scope. If the name is not found in the current scope, a NameError exception is raised.
If a name binding operation occurs anywhere within a code block, all uses of the name within that block are treated as references to the binding in the current block. This means that variables must be defined before their use within the same block.
In AVAP, there are no global or nonlocal declarations. All names are resolved within the scope in which they are defined. There is no dynamic code execution with eval or exec, so all bindings must be static and known at code writing time.
4.3. Importing Files
In AVAP, it is possible to import the contents of other code files. The include file.avap statement inserts the contents of the specified file at the exact point where the include statement appears. This process is linear and sequential, meaning that the imported content is executed as if it were part of the original file.
It is crucial that the necessary functions are defined before they are called. If a function is not defined before its call, a NameError exception will be raised.
Example of import usage:
// Content of the file main.avap
addVar(x, 10)
include functions.avap
myFunction(x)
// Content of the file functions.avap
function myFunction(y){
addVar(result, y + 5)
addResult(result)
}
4.4. Exceptions
Exceptions in AVAP allow for the handling of errors or exceptional conditions. An exception is raised when an error is detected; it can be handled by the surrounding code block or by any code block that directly or indirectly invoked the block where the error occurred.
The AVAP interpreter raises an exception when it detects a runtime error. An AVAP program can also explicitly raise an exception using the raise statement. Exception handlers are specified with the try ... except statement.
Example of exception handling:
try()
addVar(10 / 0, result)
except()
addResult("Cannot divide by zero.")
end()
In this example, if a division by zero occurs, a ZeroDivisionError exception is raised and handled by the except block.
This structure ensures that AVAP programs execute in a sequential and predictable manner, without advanced dynamic or deferred execution features, maintaining simplicity and clarity in name binding and import handling.
5. The Import System in AVAP
AVAP code in one file gains access to code in another file through the import process. The include statement is the only way to invoke the import machinery in AVAP: it inserts the contents of the specified file at the exact point where the statement appears in the original file.
When an include statement is executed, the contents of the imported file are processed as if they were part of the original file, ensuring that all functions and variables from the imported file are available in the context of the original file. If the specified file is not found, a FileNotFoundError is raised.
Example of using the include statement in AVAP:
Content of file main.avap
addVar(x,10)
include functions.avap
myFunction(x)
Content of file functions.avap
function myFunction(y){
addVar(result, y + 5)
addResult(result)
}
In this example, the content of functions.avap is inserted into main.avap at the point of the include statement, ensuring that myFunction is defined before being called.
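As a rough model of this splice behavior, the following Python sketch treats files as an in-memory dictionary and replaces each include line with the named file's contents (the expand_includes helper and the dictionary are hypothetical; AVAP's actual interpreter is not shown here):

```python
def expand_includes(name, files):
    """Recursively splice `include <file>` lines with that file's contents.

    `files` maps file names to source text; a missing file raises
    FileNotFoundError, mirroring the behavior described above.
    """
    if name not in files:
        raise FileNotFoundError(name)
    out = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("include "):
            target = stripped[len("include "):].strip()
            out.append(expand_includes(target, files))
        else:
            out.append(line)
    return "\n".join(out)

files = {
    "main.avap": "addVar(x, 10)\ninclude functions.avap\nmyFunction(x)",
    "functions.avap": "function myFunction(y){\naddVar(result, y + 5)\naddResult(result)\n}",
}
print(expand_includes("main.avap", files))
```

The expanded output is exactly what the interpreter executes: the body of functions.avap appears in place of the include line, before the call to myFunction.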
5.1. Import Rules
Position of Import: The include statement must be placed at the exact location where the content of the imported file is to be included. The content of the imported file is executed linearly along with the original file.
Import Error: If the file specified in the include statement is not found, a FileNotFoundError is raised.
Scope of Imports: The functions and variables from the imported file are added to the local scope of the original file at the point of import. This means they can be accessed as if they were defined in the same file.
5.2. Limitations and Considerations
No Packages: Unlike other languages, AVAP does not have a hierarchical package system. Each file is imported independently and treated as an autonomous unit.
Sequential Execution: Execution in AVAP is sequential and does not allow lazy or deferred execution. Therefore, all functions and variables must be defined before use, and the content of imported files must be in the correct order.
No Conditional Import: The import statement in AVAP does not support conditions. The specified file will always be imported at the point of the statement, regardless of any conditions.
5.3. Advanced Example
Consider the following example where multiple files are imported:
Content of the file main.avap
addVar(5, a)
include utilities.avap
include operations.avap
addVar(b, increment(a))
addVar( c, multiply(b, 2))
addResult(c)
Content of the file utilities.avap
function increment(x){
return(x + 1)
}
Content of the file operations.avap
function multiply(x, y){
return(x * y)
}
In this example, utilities.avap and operations.avap are imported into main.avap at the specified points, allowing the increment and multiply functions to be used in main.avap.

IF-THEN-ELSE Statement
The IF-THEN-ELSE statement in AVAP™ allows for decision-making based on specific conditions and executes different blocks of code depending on the outcome of those conditions. Below is a detailed explanation of its syntax and functionality.
6.1 Syntax of the IF-THEN-ELSE Statement
The basic syntax of the IF-THEN-ELSE statement in AVAP™ is as follows:
if(condition, true_value, operator)
    // Block of code if the condition is true
else()
    // Block of code if the condition is false
end()
condition: The value or variable to be evaluated.
true_value: The value the condition is compared against.
operator: The comparison operator used to compare the condition with the true value (for example, '=').
6.2 Functioning of the IF-THEN-ELSE Statement
The IF-THEN-ELSE statement evaluates the given condition and, if it is true, executes the block of code within the IF(). If the condition is false, it executes the block of code within the ELSE().
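As a sketch of how the three-argument condition could be evaluated in Python (only the '=' operator appears in this documentation; the other operators below are illustrative assumptions):

```python
# Hypothetical model of if(condition, true_value, operator); only '=' is
# documented, the remaining operators are assumptions for illustration.
def evaluate_if(condition, true_value, operator):
    ops = {
        "=": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    return ops[operator](condition, true_value)

# Mirrors the selector example in this chapter:
result = 1 if evaluate_if("yes", "yes", "=") else 0
print(result)  # 1
```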
Below is the description of each part of the IF-THEN-ELSE statement using the provided example:
// IF, ELSE and END Sample Use
addVar(selector,'yes')
if(selector,'yes','=')
addVar(result,1)
else()
addVar(result,0)
end()
addResult(result)
The variable selector is initialized with the value 'yes'.
The statement IF(selector,'yes','=') evaluates whether the value of selector is equal to 'yes'. In this case, the condition is true.
Inside the IF() block, addVar(result,1) is executed, which assigns the value 1 to the result variable.
Since the condition of the IF() is true, the code block inside the ELSE() is not executed.
The statement addResult(result) adds the value of the result variable to the API result.
6.3 Result
The result returned by the API after executing the above code is as follows:
{
  status: true,
  elapsed: 0.008270740509033203,
  result: { result: 1 }
}
This result indicates that the execution was successful (status:true) and that the value of result is 1.
6.4 Conclusions
The IF-THEN-ELSE statement in AVAP™ provides an efficient way to make decisions based on specific conditions. Similar to other programming languages, it allows for executing different blocks of code based on the outcome of evaluating a condition.

StartLoop() Statement
The loop statement in AVAP™ allows you to execute a block of code repeatedly until a specific condition is met. Below is a detailed explanation of its syntax and functionality.
7.1 Syntax of the Loop Statement
The full syntax of the loop statement in AVAP™ is as follows:
startLoop(control, start, end)
    // Code block to repeat
endLoop()
This syntax consists of three main parts:
control: This is the loop control variable used to track the progress of the loop. It is initialized with the starting value of the loop and is incremented with each iteration until it reaches the end value.
start: This is the starting value of the loop. The loop begins at this value.
end: This is the ending value of the loop. The loop terminates when the control variable reaches this value.
7.2 Functioning of the Loop Statement
The loop statement in AVAP™ follows this execution process:
The control variable control is initialized with the starting value specified in start.
The loop condition is evaluated: while the value of control is less than or equal to the end value end, the code block within startLoop() is executed. If the value of control exceeds the end value, the loop terminates, and execution continues after endLoop().
In each iteration of the loop, the code block within startLoop() is executed, and the control variable control is automatically incremented by one.
Once the control variable reaches or exceeds the end value end, the loop terminates, and execution continues after endLoop().
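In Python terms, this inclusive control-variable behavior can be sketched as follows (variable names mirror the loop example in this chapter):

```python
# startLoop(control, 1, 5) ... endLoop() behaves like an inclusive counting loop.
iterations = []
counter = None
for control in range(1, 5 + 1):  # control runs 1 through 5, inclusive
    counter = control            # addVar(counter, $control)
    iterations.append(control)
print(counter)  # final value after the loop: 5
```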
7.3 Example of Use
Below is an example of using the loop statement in AVAP™, along with a detailed explanation of each part of the code:
// Loop Sample Use
// Initialize the variable 'variable' with the value 5.
addVar(variable,5)
// Start the loop with the control variable 'control', ranging from 1 to 5.
startLoop(control,1,5)
// In each iteration of the loop, assign the current value of 'control' to the variable 'counter'.
addVar(counter,$control)
endLoop()
// Add the final value of 'counter' to the API result.
addResult(counter)
7.4 Result and Conclusions
After executing the above code, the result returned by the API is as follows:
{
  status: true,
  elapsed: 0.01605510711669922,
  result: { counter: 5 }
}
This result confirms that the execution was successful (status:true) and that the final value of counter is 5.
In summary, the loop statement in AVAP™ provides an efficient way to execute a block of code repeatedly within a specified range. By automating tasks that require repetition, such as processing a list of items or generating sequential numbers, this statement becomes a fundamental tool for programming in AVAP™.

Chapter 12: addParam() Function
Introduction
The addParam() function in AVAP™ is a powerful tool used to add parameters to an API call in the query string. This parameter is assigned to a variable and acts as a bridge between the API call and the API itself, allowing smooth and efficient communication between both.
Usage of addParam
The addParam() function is used to add parameters to an API call in the query string. The basic syntax of this function is as follows:
addParam(variable, value)
Where variable is the name of the variable to be used as a parameter in the API call, and value is the value assigned to this variable.
Example Usage
Below is a practical example illustrating how to use the addParam() function in an API call:
// API call with addParam()
addParam(user, user_var)
addParam(password, password_var)
In this example, two parameters, user and password, are being added to an API call. The value of user is set to user_var and the value of password is set to password_var.
Internal Operation
Internally, the addParam() function constructs the querystring for the API call by adding the specified parameters along with their corresponding values. This querystring is passed to the API, which uses it to process the request and return the appropriate response.
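The querystring assembly can be approximated with Python's standard library (the collecting dictionary and the add_param helper are assumptions about equivalent behavior, not AVAP internals):

```python
from urllib.parse import urlencode

params = {}

def add_param(variable, value):
    """Collect one parameter for the query string, as addParam() does."""
    params[variable] = value

# Mirrors the user/password example above:
add_param("user", "user_var")
add_param("password", "password_var")
query = urlencode(params)
print(query)  # user=user_var&password=password_var
```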
Important Considerations
It is important to ensure that the parameters added with addParam() are valid and correctly formatted according to the requirements of the API being called. Additionally, it is the developer's responsibility to ensure that the values assigned to the parameters are secure and do not contain malicious data that could compromise system security.
Conclusions
The addParam() function in AVAP™ is an essential tool for constructing and managing API calls, facilitating communication between the client and the server. By understanding how this function works and how it is used in the context of an API call, developers can create more robust and secure applications that make the most of web services' potential.

Function Libraries
Introduction
Includes are a fundamental feature in AVAP™ that allow for the efficient organization and reuse of code in software development projects. Just like in other programming languages, includes in AVAP™ enable the incorporation of functionalities from other files or libraries into the current file. This capability provides a number of significant advantages that make the development and maintenance of projects more efficient and effective.
Purpose of Includes
The primary purpose of includes in AVAP™ is to promote modularity and code reuse. By dividing code into separate modules or files and then including them in main files as needed, developers can write and maintain code in a more organized and structured manner. This facilitates the management of large and complex projects, as well as collaboration between development teams.
Advantages of Using Includes
Code Reuse: Includes allow for the reuse of functions, variables, and other code definitions in multiple parts of a project, reducing code duplication and promoting consistency and coherence in development.
Facilitates Maintainability: By dividing code into smaller, more specific modules, it is easier to identify, understand, and modify parts of the code without affecting other parts of the project. This eases software maintenance over time.
Promotes Modularity: The ability to include files selectively as needed encourages code modularity, which simplifies understanding and managing complex projects by breaking them down into smaller, manageable components.
Improves Readability and Organization: The use of includes helps organize code in a logical and structured manner, improving readability and facilitating navigation through different parts of the project.
Syntax of Includes
In AVAP™, the syntax for including a file is similar to that of other languages like C. The keyword include is used followed by the name of the file to be included. There are two main ways to include files in AVAP™:
Local Include: Used to include project-specific files located in the same directory or in subdirectories relative to the current file. The file name is specified within quotes. Example:
include "file_name.avap"
System Include: Used to include standard or system library files located in predefined or configured paths on the system. The file or library name is specified between angle brackets (< and >). Example:
include <library_name.avap>
Operation
When an include is found in an AVAP™ file, the interpreter searches for the specified file and incorporates it into the current file at compile time. This means that all the code contained in the included file will be available for use in the current file.
Common Uses
Including Standard Libraries: Standard libraries that provide common functions and utilities can be included to simplify application development.
Including Definition Files: Files containing definitions of variables, constants, or data structures used in multiple parts of the project can be included.
Including Specific Functionality Modules: Modules providing additional features for the project, such as file handling, text processing, or data manipulation, can be included.
Practical Example
Suppose we have a file named utils.avap that contains utility functions we want to use in our main project. We can include this file in our main project as follows:
include "utils.avap" // We can now use the functions defined in utils.avap
With this understanding of the value and advantages of using includes in AVAP™, we will explore in detail their operation and practical application in project development.
Function Products
In AVAP™, there are a series of function libraries grouped by categories called Function Products that complement the base AVAP™ language and leverage the power of AVS servers for distribution. Through Function Products, developers can extend the functionality of AVAP™ by incorporating specialized libraries tailored to different needs and applications.
Function Products provide a way to access advanced features and capabilities not available in the core language, offering a robust framework for building complex and scalable solutions. These libraries are designed to integrate seamlessly with AVAP™, enhancing the development process and enabling more efficient and effective project execution.

Function Declaration
Introduction
Functions in AVAP™ are reusable blocks of code that perform a specific task. Just like in Python, functions in AVAP™ allow for code modularization, improved readability, easier maintenance, and code reuse.
Function Construction
In AVAP™, similar to Python, functions are defined using the keyword function, followed by the function name and its parameters in parentheses. The function body is opened with {, contains the block of code that forms the function, and is closed with }.
Defining a function in AVAP™
function greet(name){
return("Hello, " + name + "!")
}
Calling the function
message = greet("World")
addResult(message)
Output
Hello, World!
Technical Features
Parameters: Functions can accept zero or more parameters that are used as inputs to the function.
Return Values: Functions can return a value using the return keyword.
Scope: Functions in AVAP™ have their own scope, meaning that variables defined within a function are only visible within that function.
Code Reusability: Functions allow for encapsulating and reusing blocks of code that perform specific tasks.
Practical Example
Below is a practical example illustrating the definition and invocation of a function in AVAP™:
Definition of a Function to Calculate the Area of a Circle
function calculate_circle_area(radius){
return(3.14 * radius ** 2)
}
Calling the Function
circle_radius = 5
area = calculate_circle_area(circle_radius)
result = "The area of the circle is: %s" % area
addResult(result)
Output:
The area of the circle is: 78.5
Conclusions
Functions are a fundamental part of programming in AVAP™, allowing for effective organization and modularization of code. By understanding how to define, construct, and call functions in AVAP™, developers can write clearer, more concise, and maintainable code, facilitating the development and management of applications.

Function Glossary
randomString()
The randomString() command generates a random string based on a specified pattern and stores it in a target variable. It is especially useful when random strings are needed to conform to a specific format, such as passwords or identifiers.
Parameters
Pattern
Type: var
Description: A regular expression (regex) pattern that defines the characters and structure of the string to be generated. It can be a direct value or a variable containing the pattern. For example, [a-zA-Z0-9] will generate a string that includes uppercase letters, lowercase letters, and numbers.
Length
Type: var
Description: An integer value specifying the length of the random string to be generated. It can be a direct value or a variable containing the desired length. This value determines how many characters the resulting string will have.
TargetVariable
Type: var
Description: The variable where the generated string will be stored. This variable should be used later in the program. Unlike the other parameters, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
randomString('[a-zA-Z0-9]', 8, generatedPassword)
// Call using variables:
pattern = '[a-zA-Z0-9]'
length = 8
randomString(pattern, length, generatedPassword)
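For the simple character class used above, the intended behavior can be approximated in Python (this sketch handles only the [a-zA-Z0-9] class; general regex patterns would need a fuller parser, and the helper name is hypothetical):

```python
import random
import string

def random_string(pattern, length):
    """Approximate randomString() for the single class '[a-zA-Z0-9]'."""
    if pattern != "[a-zA-Z0-9]":
        raise ValueError("only '[a-zA-Z0-9]' is supported in this sketch")
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))

generated_password = random_string("[a-zA-Z0-9]", 8)
print(generated_password)
```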
stampToDatetime()
The stampToDatetime() command converts a timestamp value to a date and time according to a specified format, applying a possible time difference, and stores the result in a target variable. It is useful for manipulating and formatting time values into different representations.
Parameters
timestamp
Type: var
Description: A value representing a timestamp, which can be provided directly or through a variable. This value is the starting point for conversion to a date and time format.
Format
Type: var
Description: A format string that defines how the resulting date and time should be presented. This string follows the same conventions used in Python for formatting dates and times. Common symbols include:
%Y: Year with four digits (e.g., 2024)
%m: Month with two digits (01 to 12)
%d: Day of the month with two digits (01 to 31)
%H: Hour in 24-hour format (00 to 23)
%M: Minutes (00 to 59)
%S: Seconds (00 to 59)
For example, the format %Y-%m-%d %H:%M:%S converts a timestamp into a string like 2024-08-25 14:30:00. It can be a direct value or a variable containing the desired format.
TimeDelta
Type: var
Description: An optional value representing a time adjustment (positive or negative) applied to the timestamp before conversion. This value can be provided directly or through a variable and is expressed in seconds.
TargetVariable
Type: var
Description: The variable where the resulting date and time from the conversion will be stored. Unlike the other parameters, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
stampToDatetime(1692966600, '%Y-%m-%d %H:%M:%S', 3600, convertedDatetime)
// Call using variables:
timestamp = 1692966600
format = '%Y-%m-%d %H:%M:%S'
adjustment = 3600
stampToDatetime(timestamp, format, adjustment, convertedDatetime)
In the first example, a timestamp is converted to a date and time in the format "%Y-%m-%d %H:%M:%S", applying a 3600-second (1-hour) adjustment, and the result is stored in the variable convertedDatetime. In the second example, variables are used to define the timestamp, format, and adjustment.
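The conversion the command performs can be sketched with Python's datetime module (UTC is assumed here for determinism; the command's actual time-zone handling is covered separately under getDateTime(), and the helper name is hypothetical):

```python
from datetime import datetime, timezone

def stamp_to_datetime(timestamp, fmt, time_delta=0):
    """Sketch of stampToDatetime(): apply the delta in seconds, then format (UTC assumed)."""
    adjusted = datetime.fromtimestamp(timestamp + time_delta, tz=timezone.utc)
    return adjusted.strftime(fmt)

converted_datetime = stamp_to_datetime(1692966600, "%Y-%m-%d %H:%M:%S", 3600)
print(converted_datetime)
```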
getTimeStamp()
The getTimeStamp() command converts a date and time string, given in a specific format, to a timestamp value. Additionally, it allows for an optional time adjustment before storing the result in a target variable. This command is useful for converting human-readable date and time representations to a numeric timestamp format, which can be used in calculations or time comparisons.
Parameters
DateString
Type: var
Description: A string representing a date and time. This string must follow the format specified in the Format parameter. It can be a direct value or a variable containing the date string.
Format
Type: var
Description: A format string that defines how to interpret the date and time string (DateString). This string follows Python's conventions for formatting and parsing dates and times. Some common symbols include:
%Y: Year with four digits (e.g., 2024)
%m: Month with two digits (01 to 12)
%d: Day of the month with two digits (01 to 31)
%H: Hour in 24-hour format (00 to 23)
%M: Minutes (00 to 59)
%S: Seconds (00 to 59)
For example, to interpret the string "2024-08-25 14:30:00", the format %Y-%m-%d %H:%M:%S would be used. It can be a direct value or a variable containing the format.
TimeDelta
Type: var
Description: An optional value representing a time adjustment (positive or negative) applied to the timestamp after conversion. This value can be provided directly or through a variable and is expressed in seconds.
TargetVariable
Type: var
Description: The variable where the resulting timestamp from the conversion will be stored. Unlike the other parameters, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
getTimeStamp('2024-08-25 14:30:00', '%Y-%m-%d %H:%M:%S', 3600, generatedTimestamp)
// Call using variables:
date = '2024-08-25 14:30:00'
format = '%Y-%m-%d %H:%M:%S'
adjustment = 3600
getTimeStamp(date, format, adjustment, generatedTimestamp)
In the first example, the date and time string "2024-08-25 14:30:00" is converted to a timestamp, applying a 3600-second (1-hour) adjustment, and the result is stored in the variable generatedTimestamp. In the second example, variables are used to define the date, format, and adjustment.
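The reverse conversion can be sketched the same way (UTC is again an assumption, as is the helper name):

```python
from datetime import datetime, timezone

def get_timestamp(date_string, fmt, time_delta=0):
    """Sketch of getTimeStamp(): parse the string, then apply the delta in seconds (UTC assumed)."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) + time_delta

generated_timestamp = get_timestamp("2024-08-25 14:30:00", "%Y-%m-%d %H:%M:%S", 3600)
print(generated_timestamp)
```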
getRegex()
The getRegex() command searches for matches in a source string using a regular expression (regex) pattern and stores the result in a target variable. This command is useful for extracting specific parts of a string that match a defined pattern, such as email addresses, phone numbers, or any other structure defined by a regex.
Parameters
SourceVariable
Type: variable
Description: The variable containing the source string in which to search for regex pattern matches. This string is the text on which the regex search will be applied.
rePattern
Type: variable
Description: The variable containing the regular expression (regex) pattern that defines what to search for in the source string. This pattern should follow standard regex rules, allowing the specification of sequences of characters to identify in the source string.
TargetVariable
Type: variable
Description: The variable where the search result will be stored. Depending on the context and the pattern used, the result could be the first match found, all matches, or even specific groups within the match.
Usage Example
// Direct call with values:
sourceText = "Email: user@example.com and phone: 123-456-7890"
pattern = r"\b\d{3}-\d{3}-\d{4}\b"
getRegex(sourceText, pattern, phoneNumber)
// Example 2: extract a URL
sourceText = "Visit our website at https://www.example.com for more information."
regexPattern = r"https?://\S+"
getRegex(sourceText, regexPattern, foundURL)
In the first example, a phone number in the format 123-456-7890 is searched in the sourceText string and the result is stored in the phoneNumber variable. In the second example, a URL is extracted from the sourceText string using a regex that identifies URL patterns, and the result is stored in the foundURL variable.
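Under the hood this behaves like a standard regex search. A rough Python equivalent of the two examples above (the helper name is illustrative):

```python
import re

def get_regex(source, pattern):
    # Return the first match of the pattern in the source string,
    # or None if nothing matches.
    match = re.search(pattern, source)
    return match.group(0) if match else None

phone_number = get_regex("Email: user@example.com and phone: 123-456-7890",
                         r"\b\d{3}-\d{3}-\d{4}\b")
# phone_number -> "123-456-7890"
found_url = get_regex("Visit our website at https://www.example.com for more information.",
                      r"https?://\S+")
# found_url -> "https://www.example.com"
```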
getDateTime()
The getDateTime() command retrieves the current date and time, formats it according to a specified format, applies an optional time adjustment, and converts it to a specific time zone before storing the result in a target variable. It is useful for obtaining and manipulating the current date and time in different formats and time zones.
Parameters
Format
Type: var
Description: A format string that defines how the resulting date and time should be presented. This string follows the date and time formatting conventions used in Python. Some of the most common symbols include:
%Y: Year with four digits (e.g., 2024)
%m: Month with two digits (01 to 12)
%d: Day of the month with two digits (01 to 31)
%H: Hour in 24-hour format (00 to 23)
%M: Minutes (00 to 59)
%S: Seconds (00 to 59)
For example, the format "%Y-%m-%d %H:%M:%S" will present the date and time as 2024-08-25 14:30:00. It can be a direct value or a variable containing the desired format.
TimeDelta
Type: var
Description: An optional value representing a time adjustment (positive or negative) applied to the current date and time before conversion. This value can be provided directly or through a variable and is expressed in seconds.
TimeZone
Type: var
Description: The time zone to which the date and time should be converted. This value can be a time zone identifier provided directly or through a variable. Some common time zones include:
"UTC": Coordinated Universal Time
"America/New_York": U.S. Eastern Time (EST/EDT)
"America/Los_Angeles": U.S. Pacific Time (PST/PDT)
"Europe/London": London Time (GMT/BST)
"Europe/Madrid": Madrid Time (CET/CEST)
"Asia/Tokyo": Tokyo Time (JST)
"Australia/Sydney": Sydney Time (AEST/AEDT)
You can use any time zone recognized by the pytz library in Python, which includes most time zones worldwide.
TargetVariable
Type: var
Description: The variable in which the resulting date and time from the operation will be stored. Unlike the other parameters, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
getDateTime('%Y-%m-%d %H:%M:%S', 3600, 'UTC', currentTime)
// Call using variables:
format = '%Y-%m-%d %H:%M:%S'
adjustment = 3600
timeZone = 'America/New_York'
getDateTime(format, adjustment, timeZone, currentDateTime)
In the first example, the current date and time are retrieved, adjusted by 3600 seconds (1 hour), converted to UTC, and stored in the variable currentTime. In the second example, variables are used to define the format, time adjustment, and time zone, with the result stored in the currentDateTime variable.
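Since the format codes and time zone names are documented as following Python conventions (the docs reference pytz), the behavior can be sketched with the standard library's zoneinfo, which resolves the same IANA time zone names. A rough equivalent, not the actual implementation:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; pytz works similarly

def get_date_time(fmt, delta_seconds, tz_name):
    # Take the current time, apply the adjustment in seconds,
    # convert to the requested time zone, then format the result.
    now = datetime.now(timezone.utc) + timedelta(seconds=delta_seconds)
    return now.astimezone(ZoneInfo(tz_name)).strftime(fmt)

current_time = get_date_time('%Y-%m-%d %H:%M:%S', 3600, 'UTC')
```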
encodeMD5()
The encodeMD5() command generates an MD5 hash of the provided string and stores the result in a target variable. MD5 is a cryptographic hash function that produces a 128-bit value (32 hexadecimal characters), commonly used to verify data integrity.
Parameters
SourceVariable
Type: var
Description: The variable containing the text string to be encoded in MD5. It can be a direct value or a variable storing the input string.
TargetVariable
Type: var
Description: The variable in which the resulting MD5 hash will be stored. Unlike the SourceVariable parameter, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
encodeMD5('example_string', md5Hash)
// Call using variables:
text = 'example_string'
encodeMD5(text, md5Hash)
In the first example, an MD5 hash is generated from the string 'example_string' and stored in the md5Hash variable. In the second example, the input string is provided through the variable text, and the resulting hash is again stored in md5Hash.
encodeSHA256()
The encodeSHA256() command generates a SHA-256 hash of the provided string and stores the result in a target variable. SHA-256 is a cryptographic hash function that produces a 256-bit value (64 hexadecimal characters), offering greater security compared to MD5.
Parameters
SourceVariable
Type: var
Description: The variable containing the text string to be encoded in SHA-256. It can be a direct value or a variable storing the input string.
TargetVariable
Type: var
Description: The variable in which the resulting SHA-256 hash will be stored. Unlike the SourceVariable parameter, this must be a variable and not a direct value.
Usage Example
// Direct call with values:
encodeSHA256('example_string', sha256Hash)
// Call using variables:
text = 'example_string'
encodeSHA256(text, sha256Hash)
In the first example, a SHA-256 hash is generated from the string 'example_string' and stored in the sha256Hash variable. In the second example, the input string is provided through the variable text, and the resulting hash is again stored in sha256Hash.
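Both hashing commands map directly onto Python's hashlib module. A minimal sketch, assuming the commands hash the UTF-8 encoding of the input string (the function names here are illustrative):

```python
import hashlib

def encode_md5(text):
    # 128-bit digest -> 32 hexadecimal characters.
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def encode_sha256(text):
    # 256-bit digest -> 64 hexadecimal characters.
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

md5_hash = encode_md5('example_string')        # 32 hex characters
sha256_hash = encode_sha256('example_string')  # 64 hex characters
```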
getQueryParamList()
The getQueryParamList() command extracts the query parameters from the current HTTP request and stores a list of these parameters in a target variable. This is useful for handling and processing query parameters in web applications.
Parameters
TargetVariable
Type: var
Description: The variable in which the extracted query parameter list will be stored. This should be a variable where the command's result will be saved.
Command Flow
Parameter Extraction: Accesses the query parameters from the current HTTP request.
List Construction: Creates a list containing dictionaries, where each dictionary represents a query parameter and its associated value.
Result Storage: Saves the list of parameters in the variable specified by TargetVariable.
Usage Example
Suppose the query string of the current HTTP request is ?user=alice&age=30.
// Define the variable to store the result
queryParamsList = []
// Call the command to extract query parameters
getQueryParamList(queryParamsList)
// Return the list of query parameters via addResult
addResult(queryParamsList)
Given the query string ?user=alice&age=30, the getQueryParamList() command will generate the following list of parameters:
[
{"user": "alice"},
{"age": "30"}
]
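The list-of-dictionaries shape matches what Python's urllib.parse produces from a raw query string. A sketch of the equivalent logic (the function name is illustrative; the real command reads the query string from the current request):

```python
from urllib.parse import parse_qsl

def get_query_param_list(query_string):
    # Build one single-entry dictionary per query parameter.
    return [{key: value} for key, value in parse_qsl(query_string)]

query_params_list = get_query_param_list("user=alice&age=30")
# query_params_list -> [{"user": "alice"}, {"age": "30"}]
```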
getListLen()
The getListLen() command calculates the length of a list and stores the result in a target variable. This command is useful for determining the number of elements in a list.
Parameters
SourceVariable
Type: var
Description: The variable containing the list whose length you want to calculate. It can be a variable that stores the list or a direct value representing the list.
TargetVariable
Type: var
Description: The variable where the result of the list length will be stored. This should be a variable that will receive the integer value representing the number of elements in the list.
Command Flow
Retrieve the List: Access the list stored in the SourceVariable.
Calculate the Length: Calculate the number of elements in the list.
Store the Result: Save the calculated length in the variable specified by TargetVariable.
Usage Example
Suppose the list in myList is ['apple', 'banana', 'cherry'].
// Variable definitions
myList = ['apple', 'banana', 'cherry']
listLength = 0
// Call the command to calculate the length of the list
getListLen(myList, listLength)
// Return the list length through addResult
addResult(listLength)
Since the list myList has 3 elements, the getListLen() command will calculate that the length is 3. This value will be stored in the listLength variable and returned through addResult(listLength), resulting in the following output:
3
itemFromList()
The itemFromList() command extracts a specific element from a list based on a given index and stores the result in a target variable. This is useful for accessing individual elements within a list.
Parameters
SourceVariable
Type: var
Description: The variable containing the list from which an element is to be extracted. It can be a variable that stores the list or a direct value representing the list.
index
Type: value
Description: The index of the element to be extracted from the list. It must be an integer value that indicates the position of the element within the list.
TargetVariable
Type: var
Description: The variable where the extracted element will be stored. It must be a variable that will receive the value of the element at the specified index position.
Command Flow
Access the List: Access the list stored in the SourceVariable.
Extract the Element: Retrieve the element at the position specified by the index.
Store the Result: Save the extracted element in the variable specified by TargetVariable.
Usage Example
Suppose the list in myList is ['apple', 'banana', 'cherry'] and you want to extract the element at index 1.
// Variable definitions
myList = ['apple', 'banana', 'cherry']
element = ''
// Call the command to extract the element at index 1
itemFromList(myList, 1, element)
// Return the extracted element through addResult
addResult(element)
Since index 1 corresponds to the element 'banana' in myList, the itemFromList() command will extract 'banana' and store it in the variable element. The element variable will be returned through addResult(element), resulting in the following output:
"banana"
variableFromJSON()
The variableFromJSON() command extracts the value associated with a specific key from a JSON object and stores the result in a target variable. This command is useful for accessing values within a JSON object.
Parameters
SourceVariable
Type: var
Description: The variable containing the JSON object from which a value is to be extracted. It can be a variable that stores the JSON object or a direct value representing the JSON object.
key
Type: value
Description: The key whose value is to be extracted from the JSON object. It must be a value that represents the key within the JSON object.
TargetVariable
Type: var
Description: The variable where the extracted value will be stored. It must be a variable that will receive the value associated with the specified key in the JSON object.
Command Flow
Access the JSON Object: Access the JSON object stored in the SourceVariable.
Extract the Value: Retrieve the value associated with the key within the JSON object.
Store the Result: Save the extracted value in the variable specified by TargetVariable.
Usage Example
Suppose the JSON object in jsonData is {"name": "Alice", "age": 30} and you want to extract the value associated with the key "name".
// Variable definitions
jsonData = {"name": "Alice", "age": 30}
nameValue = ''
// Call the command to extract the value associated with the key "name"
variableFromJSON(jsonData, "name", nameValue)
// Return the extracted value through addResult
addResult(nameValue)
Since the value associated with the key "name" in the JSON object jsonData is "Alice", the variableFromJSON() command will extract "Alice" and store it in the variable nameValue. The nameValue variable will be returned through addResult(nameValue), resulting in the following output:
"Alice"
AddVariableToJSON()
The AddVariableToJSON() command adds a new key and its corresponding value to a JSON object and stores the result in a target variable. This command is useful for updating a JSON object with new key-value pairs.
Parameters
Key
Type: variable
Description: The key to be added to the JSON object. It must be a variable that stores the key to be added.
Value
Type: variable
Description: The value associated with the key to be added to the JSON object. It must be a variable that stores the corresponding value.
TargetVariable
Type: variable
Description: The variable where the updated JSON object will be stored. It must be a variable that will receive the JSON object with the new key and its added value.
Command Flow
Access the JSON Object: Access the JSON object stored in the TargetVariable.
Add the Key and Value: Add the new key and its associated value to the JSON object.
Store the Result: Save the updated JSON object in the variable specified by TargetVariable.
Usage Example
Suppose the initial JSON object in jsonData is {"name": "Alice", "age": 30}, and you want to add a new key "email" with the value "alice@example.com".
// Variable definitions
jsonData = {"name": "Alice", "age": 30}
newKey = "email"
newValue = "alice@example.com"
// Call the command to add the new key and value to the JSON object
AddVariableToJSON(newKey, newValue, jsonData)
// Return the updated JSON object through addResult
addResult(jsonData)
In this example, the AddVariableToJSON() command adds the key "email" with the value "alice@example.com" to the JSON object. The updated object will be stored in the variable jsonData and returned through addResult(jsonData), resulting in the following output:
{
"name": "Alice",
"age": 30,
"email": "alice@example.com"
}
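Both JSON commands correspond to plain key access and key assignment on a parsed object. A Python sketch of the two operations above:

```python
import json

# Parse the JSON object (a dict in Python terms).
json_data = json.loads('{"name": "Alice", "age": 30}')

# variableFromJSON(jsonData, "name", nameValue): plain key access.
name_value = json_data["name"]
# name_value -> "Alice"

# AddVariableToJSON(newKey, newValue, jsonData): key assignment.
json_data["email"] = "alice@example.com"
```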
variableToList()
The variableToList() command converts an element into a list that contains only that element and stores the resulting list in a target variable. This command is useful to ensure that a single value is handled as a list in subsequent processing.
Parameters
element
Type: variable
Description: The variable that contains the element to be converted into a list. It can be any type of value that you want to include as the only item in the list.
TargetVariable
Type: variable
Description: The variable in which the resulting list will be stored. It must be a variable that will receive the list with the included element.
Command Flow
Access the Element: Access the element stored in the element variable.
Create the List: Create a list that contains only the provided element.
Store the Result: Save the resulting list in the variable specified by TargetVariable.
Usage Example
Suppose the element in myElement is "apple" and you want to convert it into a list.
// Variable definitions
myElement = "apple"
myList = []
// Call the command to convert the element into a list
variableToList(myElement, myList)
// Return the resulting list through addResult
addResult(myList)
Since myElement is "apple", the variableToList() command will convert this element into a list with a single item: ["apple"]. This list will be stored in the variable myList, and myList will be returned through addResult(myList), resulting in the following output:
["apple"]
addParam()
The addParam() command retrieves the value associated with a specific key from the query string of the current request and assigns this value to a target variable. This command is useful for extracting values from query parameters in an HTTP request and storing them in variables for processing.
Parameters
param
Type: value
Description: The key of the query string whose value you want to retrieve. It should be a value that represents the key in the query string.
variable
Type: var
Description: The variable in which the retrieved value from the query string will be stored. It must be a variable that will receive the value associated with the specified key.
Command Flow
Retrieve the Value: Access the value associated with the param key from the query string of the current request.
Assign the Value: Assign the retrieved value to the variable specified by variable.
Usage Example
Suppose the query string of the current request is ?user=alice&age=30, and you want to retrieve the value associated with the key "user".
// Variable definitions
userName = ''
// Call the command to retrieve the value for the "user" key and assign it to the variable
addParam("user", userName)
// Return the retrieved value through addResult
addResult(userName)
Given the query string ?user=alice&age=30, the addParam() command will retrieve the value "alice" associated with the key "user" and store it in the userName variable. The userName variable will be returned through addResult(userName), resulting in the following output:
"alice"
addResult()
The addResult() command is used to return the content of a variable as part of the command or function response. It is the way to present results or processed data from commands and operations performed in the language.
Parameters
variable
Type: var
Description: The variable whose content is to be returned as the result. It should be a variable that contains the value or data you want to include in the response.
Command Flow
Access the Content: Access the content of the variable provided as a parameter.
Return the Result: Include the content of the variable in the final response.
Usage Example
Suppose we have performed an operation and want to return the result stored in the result variable.
// Define the variable with the result of an operation
result = "Operation completed successfully."
// Call the command to return the content of the variable
addResult(result)
In this example, the addResult(result) command returns the content of the result variable, "Operation completed successfully.", and this content is presented as part of the response.
Note
The addResult() command is the primary mechanism for returning information and results in the language. Make sure that the variable passed to the command contains the desired data or result before calling addResult().
RequestPost()
The RequestPost() command performs an HTTP POST request to a specified URL, sending a query string, headers, and a request body, and stores the result of the request in a destination variable. This command is useful for sending data to a server and handling the responses from the request.
Parameters
url
Type: variable
Description: The URL to which the POST request will be sent. It should be a variable containing the address of the resource to which the request is to be made.
querystring
Type: variable
Description: The query string that will be appended to the URL. It should be a variable containing the query parameters in string format.
headers
Type: variable
Description: The HTTP headers that will be included in the POST request. It should be a variable containing a dictionary of headers and their values.
body
Type: variable
Description: The body of the POST request that will be sent to the server. It should be a variable containing the data to be sent in the request.
o_result
Type: variable
Description: The variable in which the result of the POST request will be stored. It should be a variable that will receive the server's response.
Command Flow
Build the Request: Uses the provided URL, query string, headers, and body to construct the POST request.
Send the Request: Sends the POST request to the specified server.
Store the Result: Saves the server's response in the variable specified by o_result.
Usage Example
Suppose you want to send a POST request to https://api.example.com/data, with a query string userId=123, headers including Content-Type: application/json, and a body with JSON data.
// Define variables
url = "https://api.example.com/data"
querystring = "userId=123"
headers = {"Content-Type": "application/json"}
body = '{"name": "Alice", "age": 30}'
response = ''
// Call the command to perform the POST request
RequestPost(url, querystring, headers, body, response)
// Return the request result via addResult
addResult(response)
In this example, the RequestPost() command will send a POST request to https://api.example.com/data with the provided query string, headers, and body. The server's response will be stored in the response variable, and this variable will be returned via addResult(response). The result of the request will be included in the final response.
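The transport details are internal to the command, but the request it assembles can be sketched with Python's urllib.request. The helper below (a hypothetical name) only builds the request object; actually sending it, e.g. with urlopen(request), is omitted here:

```python
from urllib.request import Request

def build_post_request(url, querystring, headers, body):
    # The query string is appended to the URL and the body travels
    # as the request payload.
    full_url = f"{url}?{querystring}" if querystring else url
    return Request(full_url, data=body.encode('utf-8'),
                   headers=headers, method='POST')

request = build_post_request("https://api.example.com/data",
                             "userId=123",
                             {"Content-Type": "application/json"},
                             '{"name": "Alice", "age": 30}')
```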
ormCreateTable()
The ormCreateTable() command creates a new table in a database using the specified ORM (Object-Relational Mapping). This command defines the columns of the table and their data types, and stores a reference to the created table in a destination variable.
Parameters
fields
Type: value
Description: A string containing the names of the table columns, separated by commas. Each column name should correspond to a field in the table.
fieldsType
Type: value
Description: A string containing the data types for each column, separated by commas. The data types should be in the same order as the column names in fields.
dbaseName
Type: value
Description: The name of the database where the table will be created. It should be a string indicating the target database.
varTarget
Type: variable
Description: The variable in which the reference to the created table will be stored. It should be a variable that will receive the reference to the new table.
Command Flow
Define the Table: Uses the column names (fields) and their data types (fieldsType) to define the structure of the new table.
Create the Table: Creates the table in the database specified by dbaseName using the provided definition.
Store the Result: Saves the reference to the created table in the variable specified by varTarget.
Usage Example
Suppose you want to create a table called users in a database called myDatabase, with two columns: username of type VARCHAR and age of type INTEGER.
// Define variables
fields = "username,age"
fieldsType = "VARCHAR,INTEGER"
dbaseName = "myDatabase"
tableReference = ''
// Call the command to create the table
ormCreateTable(fields, fieldsType, dbaseName, tableReference)
// Return the reference to the created table via addResult
addResult(tableReference)
In this example, the ormCreateTable() command will create a table in the myDatabase database with the specified columns and data types. The reference to the new table will be stored in the tableReference variable, and this variable will be returned via addResult(tableReference). The output will include the reference to the created table.
ormCheckTable()
The ormCheckTable() command checks for the existence of a table in a specific database and stores the result in a destination variable. This command is useful for verifying if a table already exists before attempting further operations on it.
Parameters
dbaseName
Type: value
Description: The name of the database in which the table's existence should be checked. It should be a string indicating the database to check.
varTarget
Type: variable
Description: The variable in which the result of the check will be stored. It should be a variable that will receive a value indicating whether the table exists or not.
Command Flow
Check Existence: Accesses the database specified by dbaseName to verify if the requested table exists.
Store the Result: Saves the result of the check in the variable specified by varTarget. The stored value will indicate whether the table exists (True or False).
Usage Example
Suppose you want to check if a table called users exists in a database called myDatabase.
// Define variables
dbaseName = "myDatabase"
tableExists = ''
// Call the command to check the existence of the table
ormCheckTable(dbaseName, tableExists)
// Return the result of the check via addResult
addResult(tableExists)
In this example, the ormCheckTable() command will check for the existence of the users table in the myDatabase database. The result of the check (whether the table exists or not) will be stored in the tableExists variable, and this variable will be returned via addResult(tableExists). The output will reflect whether the table exists (True) or not (False).
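In SQL terms, ormCreateTable() issues a CREATE TABLE statement built from the fields and fieldsType strings, and ormCheckTable() can be answered from the schema catalog. A sqlite3 sketch of both operations (the docs do not specify the actual database backend, so this is illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ormCreateTable(): build a CREATE TABLE from the two
# comma-separated strings.
fields = "username,age"
fields_type = "VARCHAR,INTEGER"
columns = ", ".join(f"{name} {ctype}" for name, ctype
                    in zip(fields.split(","), fields_type.split(",")))
conn.execute(f"CREATE TABLE users ({columns})")

# ormCheckTable(): query the schema catalog for the table.
row = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
    ("users",)).fetchone()
table_exists = row is not None  # True
```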
ormAccessUpdate()
The ormAccessUpdate() command updates records in a database table based on the provided selection criteria. This command modifies the values of specified fields in a database using the corresponding values from variables.
Parameters
fields
Type: variable
Description: A string containing the names of the fields to be updated. The field names should be separated by commas.
fieldsValuesVariables
Type: variable
Description: A string containing the names of the variables holding the new values for the specified fields. The variable names should be separated by commas, in the same order as the fields in fields.
dbase
Type: variable
Description: The name of the database where the table to be updated is located. It should be a variable containing the name of the database.
selector
Type: variable
Description: A condition to select the records to be updated. It should be a string specifying the selection criteria in SQL format, such as id = 1.
varTarget
Type: variable
Description: The variable in which the result of the update operation will be stored. It should be a variable that will receive a value indicating whether the update was successful or not.
Command Flow
Define Fields and Values: Uses the field names (fields) and the variables with the values to be updated (fieldsValuesVariables) to define which records should be modified and with what data.
Select Records: Uses the condition provided in selector to identify the records to be updated.
Update the Database: Performs the update in the database specified by dbase, applying the changes to the records that meet the selector condition.
Store the Result: Saves the result of the update operation in the variable specified by varTarget. The stored value will indicate whether the update was successful (True) or failed (False).
Usage Example
Suppose you want to update the age field to 31 for the user with id equal to 1 in a database called myDatabase.
// Define variables
fields = "age"
fieldsValuesVariables = "newAge"
dbase = "myDatabase"
selector = "id = 1"
updateSuccess = ''
// Define the variable holding the new value
newAge = 31
// Call the command to update the record
ormAccessUpdate(fields, fieldsValuesVariables, dbase, selector, updateSuccess)
// Return the result of the update via addResult
addResult(updateSuccess)
In this example, the ormAccessUpdate() command will update the age field in the myDatabase database for the record where id = 1. The new value for age is 31, stored in the newAge variable. The updateSuccess variable will store the result of the operation (whether it was successful or not), and this variable will be returned via addResult(updateSuccess).
ormAccessSelect()
The ormAccessSelect() command retrieves records from a table in a database based on the provided selection criteria. This command selects the desired fields and stores the results in a target variable.
Parameters
fields
Type: variable
Description: A string containing the names of the fields to be retrieved. The field names should be separated by commas.
dbase
Type: variable
Description: The name of the database from which records should be retrieved. It must be a variable containing the name of the database.
selector
Type: variable
Description: A condition to select the records to be retrieved. It must be a string specifying the selection criteria in SQL format, such as id = 1.
varTarget
Type: variable
Description: The variable in which the query results will be stored. It must be a variable that will receive a list of dictionaries, each representing a retrieved record.
Command Flow
Defining the Fields: Use the field names (fields) to specify which data should be retrieved.
Selecting Records: Use the condition provided in selector to identify which records should be selected from the database.
Retrieving Data: Access the database specified by dbase and retrieve the records that meet the selector condition, including only the specified fields.
Storing the Result: Save the query results in the variable specified by varTarget. The stored value will be a list of dictionaries, where each dictionary represents a retrieved record with the requested fields.
Usage Example
Suppose you want to retrieve the username field for all users where age is greater than 25 from a database called myDatabase.
// Define variables
fields = "username"
dbase = "myDatabase"
selector = "age > 25"
usersList = ''
// Call the command to retrieve the records
ormAccessSelect(fields, dbase, selector, usersList)
// Return the query results via addResult
addResult(usersList)
In this example, the ormAccessSelect() command will retrieve the username field for all users in the myDatabase database where age is greater than 25. The results will be stored in the usersList variable, and this variable will be returned via addResult(usersList). The output will be a list of dictionaries, each representing a user whose username has been retrieved.
ormAccessInsert()
The ormAccessInsert() command inserts a new record into a database table using the provided values for the fields. This command defines the fields and their corresponding values, and stores the result of the operation in a target variable.
Parameters
fields
Type: variable
Description: A string containing the names of the fields into which the values will be inserted. The field names should be separated by commas.
fieldsValuesVariables
Type: variable
Description: A string containing the names of the variables that hold the values to be inserted into the specified fields. The variable names should be separated by commas, in the same order as the fields in fields.
dbase
Type: variable
Description: The name of the database where the table into which the new record should be inserted is located. It must be a variable containing the name of the database.
varTarget
Type: variable
Description: The variable in which the result of the insertion operation will be stored. It must be a variable that will receive a value indicating whether the insertion was successful or not.
Command Flow
Defining the Fields and Values: Use the field names (fields) and the variables with the values to be inserted (fieldsValuesVariables) to define what data should be inserted.
Inserting into the Database: Perform the insertion of the new record into the database specified by dbase, using the provided values.
Storing the Result: Save the result of the insertion operation in the variable specified by varTarget. The stored value will indicate whether the insertion was successful (True) or failed (False).
Usage Example
Suppose you want to insert a new record into a table called users in a database called myDatabase, with values for username and age coming from the variables newUsername and newAge.
// Define variables
fields = "username,age"
fieldsValuesVariables = "newUsername,newAge"
dbase = "myDatabase"
insertSuccess = ''
// Define the variables with the new values
newUsername = "Alice"
newAge = 31
// Call the command to insert the new record
ormAccessInsert(fields, fieldsValuesVariables, dbase, insertSuccess)
// Return the result of the insertion via addResult
addResult(insertSuccess)
In this example, the ormAccessInsert() command will insert a new record into the myDatabase database in the users table. The values for username and age are provided by the newUsername and newAge variables. The insertSuccess variable will store the result of the operation (whether it was successful or not), and this variable will be returned via addResult(insertSuccess). The output will reflect whether the insertion was successful (True) or failed (False).
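The three data-access commands above map onto INSERT, SELECT, and UPDATE statements. A sqlite3 sketch of the equivalent SQL (the actual ORM backend is not specified by the docs, so this is illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username VARCHAR, age INTEGER)")

# ormAccessInsert(): INSERT using the variable values.
conn.execute("INSERT INTO users (id, username, age) VALUES (?, ?, ?)",
             (1, "Alice", 31))

# ormAccessSelect(): SELECT the requested fields where the selector
# condition holds; results come back as a list of dictionaries.
users_list = [{"username": name} for (name,)
              in conn.execute("SELECT username FROM users WHERE age > 25")]

# ormAccessUpdate(): UPDATE the records matching the selector.
cursor = conn.execute("UPDATE users SET age = ? WHERE id = 1", (32,))
update_success = cursor.rowcount == 1  # True if one row was updated
```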
ormAI()
The ormAI() command uses an artificial intelligence model to convert a natural language query into an SQL statement, which is then executed against a database. This command processes a natural language query to generate an SQL statement that is executed on the table specified in the source parameter, and stores the result in a target variable.
Parameters
prompt
Type: variable
Description: A string in natural language that describes the query to be made. For example, "get the value of the row with id 5".
source
Type: variable
Description: The name of the table on which the generated query should be executed. It must be a variable containing the name of the table in the database.
TargetVariable
Type: variable
Description: The variable in which the result of the query will be stored. It must be a variable that will receive the result of the generated and executed SQL query.
Command Flow
Generating SQL Query: Use the artificial intelligence model to convert the prompt into an SQL statement. For example, if the prompt is "get the value of the row with id 5", the AI will generate the SQL query SELECT * FROM source WHERE id = 5;.
Executing the Query: Execute the generated SQL statement on the table specified in source.
Storing the Result: Save the result of the query execution in the variable specified by TargetVariable. The result will be the dataset retrieved by the executed SQL statement.
Example Usage
Suppose you want to retrieve all the data from the row with id equal to 5 from a table called users.
// Define variables
prompt = "get the value of the row with id 5"
source = "users"
queryResult = ''
// Call the command to process the query
ormAI(prompt, source, queryResult)
// Return the query result via addResult
addResult(queryResult)
In this example, the ormAI() command will convert the prompt into an SQL query: SELECT * FROM users WHERE id = 5;. This query will be executed on the users table, and the results will be stored in the queryResult variable. The queryResult variable will be returned via addResult(queryResult). The output will be the dataset retrieved by the executed SQL statement.
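The three-step flow above (generate SQL, execute it, store the result) can be sketched in Python. This is a conceptual illustration only, not the engine's implementation: `orm_ai`, `fake_model`, and the injected `generate_sql` callable are hypothetical stand-ins for the AI model, and SQLite stands in for the target database.

```python
import sqlite3

def orm_ai(prompt, source, generate_sql, conn):
    """Conceptual sketch of the ormAI() flow (names are illustrative)."""
    sql = generate_sql(prompt, source)   # 1. convert the prompt into SQL
    rows = conn.execute(sql).fetchall()  # 2. execute it on the source table
    return rows                          # 3. result goes to TargetVariable

# Stub standing in for the AI model: maps the example prompt to the
# example query from the text.
def fake_model(prompt, source):
    return f"SELECT * FROM {source} WHERE id = 5"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (5, 'Alice')")

query_result = orm_ai("get the value of the row with id 5", "users",
                      fake_model, conn)
print(query_result)  # [(5, 'Alice')]
```

The stub makes the generation step deterministic; in the real command, the model output would vary with the prompt.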
functionAI()
The functionAI() command uses an artificial intelligence model to convert a natural language description of a function or process into a code implementation, which is then executed and returns the result. This command converts a description provided in prompt into a function that operates on the data of the table specified in source, and stores the result in a target variable.
Parameters
prompt
Type: variable
Description: A string in natural language that describes the process or function to be executed. For example, "calculate the average of the salary column".
source
Type: variable
Description: The name of the table on which the generated function should be executed. It must be a variable containing the name of the table in the database.
TargetVariable
Type: variable
Description: The variable in which the result of the executed function or process will be stored. It must be a variable that will receive the result of the generated and executed code.
Command Flow
Generating Code: Use the artificial intelligence model to convert the prompt into a code implementation. For example, if the prompt is "calculate the average of the salary column", the AI will generate the code necessary to calculate the average of that column.
Executing the Code: Execute the generated code on the table specified in source.
Storing the Result: Save the result of the code execution in the variable specified by TargetVariable. The result will be the calculated value or the dataset produced by the executed code.
Example Usage
Suppose you want to calculate the average of the salary column in a table called employees.
// Define variables
prompt = "calculate the average of the salary column"
source = "employees"
averageSalary = ''
// Call the command to process the function
functionAI(prompt, source, averageSalary)
// Return the result of the function via addResult
addResult(averageSalary)
In this example, the functionAI() command will convert the prompt into a code implementation to calculate the average of the salary column in the employees table. The result of the calculation will be stored in the averageSalary variable, and this variable will be returned via addResult(averageSalary). The output will be the calculated average of the salary column.

SECTION I: Architecture, Memory, and Foundations
This section establishes the foundations of how AVAP manages service logic and in-memory data manipulation. Unlike conventional interpreted languages, AVAP uses a hybrid evaluation engine that enables the combination of declarative commands with dynamic expressions.
1.1 Endpoint Registration (registerEndpoint)
The registerEndpoint command is the atomic configuration unit. It acts as the bridge between the network layer (HTTP) and the application code.
Interface
registerEndpoint(path, method, middleware, description, handler, output)
Parameter Specification
path (String):
Defines the URL route. Supports static routes and is designed for future implementations of route parameters (variable segments).
method (String):
Specifies the allowed HTTP verb (GET, POST, PUT, DELETE). The server will automatically reject any request that does not match this method (Error 405).
middleware (List):
A list of functions executed sequentially before the handler. Ideal for JWT token validation or maintenance checks. If any middleware function fails, execution stops before reaching the main business logic.
description (String):
Metadata for automatic documentation generation (Swagger/OpenAPI). It does not affect execution but is critical in the development lifecycle.
handler (Function):
The logical entry point. This is the name of the main function where the business logic resides.
output (Variable):
Defines the “master” variable that the engine will automatically return at the end of execution, unless additional results are specified via addResult.
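The parameter semantics above can be pictured as entries in a route table: a key of (path, method), middleware run in order before the handler, and a 405 for method mismatches. The sketch below is illustrative only; `register_endpoint`, `dispatch`, and the return conventions are hypothetical, not the engine's internals.

```python
# Hypothetical route table mirroring registerEndpoint's parameters.
routes = {}

def register_endpoint(path, method, middleware, description, handler, output):
    routes[(path, method.upper())] = {
        "middleware": middleware,    # run sequentially before the handler
        "description": description,  # metadata only (Swagger/OpenAPI)
        "handler": handler,          # logical entry point
        "output": output,            # "master" variable to return
    }

def dispatch(path, method, request):
    entry = routes.get((path, method.upper()))
    if entry is None:
        return 405, None  # wrong verb or route -> Error 405
    for mw in entry["middleware"]:
        if not mw(request):  # a failing middleware stops execution
            return 401, None
    return 200, entry["handler"](request)

register_endpoint("/users", "GET", [lambda req: True], "List users",
                  lambda req: ["alice", "bob"], "result")
print(dispatch("/users", "GET", {}))   # (200, ['alice', 'bob'])
print(dispatch("/users", "POST", {}))  # (405, None)
```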
1.2 The Variable Assignment Engine (Dynamic Assignment)
AVAP allows direct assignment syntax using the = symbol, providing flexibility similar to languages such as Python, but under strict contextual control.
Internal Mechanics: The eval Process
When the interpreter encounters an instruction of the form variable = expression, it triggers a three-step process:
Cleanup and Tokenization:
The engine determines whether the expression contains references to existing variables (using $), method calls, or literals.
Expression Evaluation:
Operations are resolved in real time. This enables:
Boolean Logic:
is_valid = (age > 18 and has_permission == True)
Arithmetic:
tax = subtotal * 0.21
String Formatting:
query = "SELECT * FROM users WHERE id = %s" % retrieved_id
Object and Property Resolution:
Allows deep access to complex structures returned by database connectors or APIs:
customer_email = user_list[0].profile.email
Memory Impact
Unlike addVar, dynamic assignment can transform the variable's type at runtime (Mutable Type System). If a variable originally contained a number and is later assigned a string after evaluation, the engine automatically updates the variable's metadata.
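The eval process and the mutable type system can be sketched with a symbol table and Python's own expression evaluator. This is a toy illustration of the mechanics described above, not the AVAP engine (`assign` and `symbols` are hypothetical names, and bare `eval` is used only for demonstration).

```python
# Toy symbol table; the engine's real state management is more involved.
symbols = {}

def assign(name, expression):
    # Steps 1-2: tokenize (elided) and evaluate against current state.
    # eval() is for illustration only -- never use it on untrusted input.
    value = eval(expression, {"__builtins__": {}}, symbols)
    symbols[name] = value  # Step 3: store; type metadata updates implicitly

assign("age", "25")
assign("has_permission", "True")
assign("is_valid", "age > 18 and has_permission == True")  # boolean logic
assign("age", "'twenty-five'")  # the type mutates from int to str

print(symbols["is_valid"], type(symbols["age"]).__name__)  # True str
```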
1.3 State Initialization and References (addVar)
addVar is the fundamental command for defining the global script state.
Interface
addVar(targetVarName, varValue)
Advanced Behavior
Intelligent Automatic Typing:
The engine inspects varValue. If it detects a numeric format (even if provided as a string from configuration), it internally converts it to int or float. It supports both commas and periods interchangeably, normalizing the value for mathematical operations.
The $ Reference Prefix:
This is the dereferencing operator.
addVar(copy, $original)
Instructs the engine not to assign the literal string "$original", but instead to look up the current value of the variable original in the symbol table and copy it.
Scope:
Variables created with addVar in the main body of the script are considered Request Session Variables. This means they persist throughout the lifecycle of that specific API call execution, but remain isolated from other concurrent requests to ensure data safety (thread-safety).
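The two behaviors described above — numeric normalization (comma or period decimal separators) and the `$` dereferencing operator — can be sketched as follows. The function and state names are hypothetical; this only illustrates the documented behavior.

```python
def add_var(state, name, value):
    """Illustrative sketch of addVar's typing and $-reference rules."""
    if isinstance(value, str):
        if value.startswith("$"):          # dereference: copy current value
            state[name] = state[value[1:]]
            return
        candidate = value.replace(",", ".")  # normalize comma decimals
        try:
            num = float(candidate)
            state[name] = int(num) if num.is_integer() else num
            return
        except ValueError:
            pass  # not numeric: keep as string
    state[name] = value

state = {}
add_var(state, "price", "19,95")     # comma normalized -> 19.95
add_var(state, "status", "200")      # numeric string -> int 200
add_var(state, "original", 1000)
add_var(state, "copy", "$original")  # copies 1000, not the text "$original"
print(state["price"], state["status"], state["copy"])  # 19.95 200 1000
```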
Syntax Summary
Syntax | Usage | Description
name = "John" | Direct Assignment | Creates a simple string.
total = $price * 1.10 | Dynamic Evaluation | Uses the value of price in a calculation and stores the result.
addVar(status, 200) | Initialization | Explicit method to ensure creation in the global context.
data = res[0].info | Object Access | Extracts a specific property from a JSON object or DB result.
Examples
1. Hello World
(State Initialization Section 1.3)
addVar(message, "Hello world from AVAP")
addResult(message)
2. Mathematical Assignment
(Dynamic Assignment & Arithmetic Evaluation Section 1.2)
subtotal = 150.50
tax = subtotal * 0.21
total = subtotal + tax
addResult(total)
3. Dynamic String Concatenation
(Expression Evaluation String Formatting Section 1.2)
name = "System"
log = "Event registered by: %s" % name
addResult(log)
4. Value Reference ($)
(State Initialization & Dereferencing Section 1.3)
addVar(base, 1000)
addVar(copy, $base) // copy takes the value 1000, not the string "$base"
addResult(copy)
5. Boolean Assignment
(Boolean Expression Evaluation Section 1.2)
level = 5
is_admin = level >= 10
addResult(is_admin) // Returns False
6. Multiple Response (JSON Construction)
(Global State Initialization Section 1.3)
addVar(code, 200)
addVar(status, "Success")
addResult(code)
addResult(status)
// Result: {"code": 200, "status": "Success"}

SECTION II: Input and Output (I/O) Management
This section describes the mechanisms AVAP uses for external data ingestion, parameter integrity validation, and construction of the response payload delivered to the final client.
2.1 Intelligent Parameter Capture (addParam)
The addParam command is responsible for extracting information from the incoming HTTP request. Its design is source-agnostic, simplifying development by not requiring the programmer to specify where the data originates.
Interface
addParam(param_name, target_variable)
Priority Mechanism (Cascading Search)
When addParam is invoked, the AVAP engine inspects the request in the following hierarchical order:
Query Arguments:
Parameters present in the URL (e.g., ?id=123).
JSON Body:
If the request includes Content-Type: application/json, the engine searches for the key inside the JSON object.
Form Data / Body Arguments:
Data submitted via standard forms (x-www-form-urlencoded).
Technical Behavior
Automatic Decoding:
The engine attempts to decode values into ASCII/UTF-8 format, eliminating encoding inconsistencies.
Null Handling:
If the requested parameter does not exist in any source, the target variable is initialized as None. This enables subsequent security checks using if blocks.
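The cascading search can be sketched as an ordered lookup over the three request sources, falling back to None when the key is absent everywhere. The `request` shape and function name are hypothetical, chosen only to mirror the priority order above.

```python
def add_param(request, name):
    """Illustrative cascading lookup: query args, JSON body, form data."""
    for source in ("query", "json", "form"):
        data = request.get(source) or {}
        if name in data:
            return data[name]
    return None  # absent from every source -> None, enabling if-checks

request = {
    "query": {"id": "123"},
    "json": {"email": "a@example.com"},
    "form": {},
}
print(add_param(request, "id"))       # '123' (query arguments win)
print(add_param(request, "email"))    # 'a@example.com' (from the JSON body)
print(add_param(request, "missing"))  # None
```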
2.2 Collection Validation and Counting (getListLen)
To ensure API robustness, it is necessary to validate the volume of received information. getListLen acts as AVAP's volume inspector.
Interface
getListLen(source_variable, target_variable)
I/O Applications
Parameter Validation:
Counts how many elements are contained in a variable populated by addParam or getQueryParamList.
Loop Safety:
Before initiating a startLoop, it is recommended to use getListLen to define the upper bound of the iteration, preventing overflow errors.
Database Results:
After a query, determines whether records were retrieved (length > 0) or if the result set is empty.
2.3 Multiple List Capture (getQueryParamList)
There are scenarios where the same parameter is sent multiple times (e.g., search filters such as ?color=red&color=blue). AVAP manages this through specialized list-based capture.
Interface
getQueryParamList(param_name, target_list_variable)
Effect
Transforms all occurrences of param_name into a structured list within target_list_variable. If only one value is present, a single-element list is created. This ensures downstream logic can always treat the data as a collection, preventing type errors.
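The always-a-list guarantee is the same one Python's standard library gives for repeated query parameters, which makes for a convenient sketch. The wrapper name is hypothetical.

```python
from urllib.parse import parse_qs

def get_query_param_list(query_string, name):
    """Illustrative sketch: every occurrence of `name` as a list."""
    return parse_qs(query_string).get(name, [])  # always a list

print(get_query_param_list("color=red&color=blue", "color"))  # ['red', 'blue']
print(get_query_param_list("color=red", "color"))             # ['red']
```

Even a single occurrence yields a one-element list, so downstream logic can iterate unconditionally.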
2.4 Response Construction (addResult)
The addResult command registers which variables will be included in the response body. AVAP dynamically constructs an output JSON object based on calls to this command.
Interface
addResult(source_variable)
Advanced Features
Promise Handling:
If the variable passed to addResult is the result of an operation initiated with go_async, the engine automatically marks that field as "promised" in the response or returns the thread ID if synchronization has not completed.
String Cleanup:
The engine detects redundant quotation marks (resulting from prior evaluations) and normalizes them to ensure the resulting JSON is valid and clean.
Multi-Registration:
Multiple calls to addResult are allowed. Each call adds a new key to the output JSON object. By default, the JSON key matches the variable name, unless a custom key is defined in the engine.
2.5 HTTP Status Control (_status)
AVAP uses a reserved system variable to communicate with the underlying web server and define the success or error code of the transaction.
Using _status
By assigning a numeric value to the _status variable (using addVar or direct assignment), the programmer defines the HTTP response code.
Code | Common Usage in AVAP
200 | Successful operation (default value).
201 | Resource successfully created.
400 | Parameter validation error (Bad Request).
401 | Authentication failure.
500 | Internal error caught inside an exception block.
Integrated Example of Section II
// Capture input
addParam("user_id", id)
// Validate presence
if(id, None, '==')
addVar(_status, 400)
addVar(error, "User ID is required")
addResult(error)
return()
end()
// If execution reaches here, respond with success
addVar(_status, 200)
addResult(id)
Examples
1. ID Capture
(Input Ingestion Section 2.1)
addParam("client_id", internal_id)
// If the client sends ?client_id=500, internal_id will be 500
addResult(internal_id)
2. Parameter Counter
(Collection Validation Section 2.2)
addParam("data_list", my_list)
getListLen(my_list, count)
addResult(count)
3. Null Validation
(Input Validation & HTTP Status Control Sections 2.1 & 2.5)
addParam("api_key", key)
if(key, None, "==")
addVar(_status, 403)
addVar(error, "Access denied: missing API KEY")
addResult(error)
end()
4. Multiple List Capture
(Multi-Value Query Capture Section 2.3)
getQueryParamList("emails", email_list)
addResult(email_list)
5. Multiple Response (JSON Construction)
(Response Construction Section 2.4)
addVar(code, 200)
addVar(status, "Success")
addResult(code)
addResult(status)
// Result: {"code": 200, "status": "Success"}

SECTION III: Control Logic and Decision Structures
This section details how AVAP manages execution flow. The language uses explicitly closed block structures that enable clear sequential reading, simplifying the debugging of complex APIs.
3.1 The Conditional Block (if / else / end)
The if structure in AVAP is a versatile tool that allows atomic comparisons or the evaluation of complex logical expressions processed by the dynamic evaluation engine.
Standard Interface
if(variable_A, value_B, operator)
Available Operators
Operator | Description | Example
= | Strict equality (or numeric equivalence). | if(role, "admin", "=")
!= | Inequality. | if(status, 200, "!=")
> / < | Numeric magnitude comparison. | if(age, 18, ">")
in | Checks whether an element belongs to a list or string. | if(user, blacklist, "in")
Complex Expression Evaluation
AVAP allows omission of comparison parameters to evaluate a complete logical expression directly in the third parameter.
Example:
if(None, None, "age >= 18 and balance > 100")
Block Closure Structure
An if block may include an optional else() block and must always terminate with the end() command.
3.2 Iterations and Loops (startLoop / endLoop)
For collection processing (such as database rows or parameter lists), AVAP implements an index-controlled loop structure.
Interface
startLoop(counter, start, end)
Execution Mechanics
Initialization:
The engine creates the counter variable with the start value.
Increment:
On each iteration, the counter automatically increases by 1.
Exit Condition:
The loop terminates when the counter exceeds the end value.
Practical Example: List Processing
// Retrieve the length of a list captured in Section II
getListLen(received_items, total)
startLoop(i, 0, total)
current_item = received_items[i]
// Processing logic for each item...
endLoop()
3.3 Error Handling and Robustness (try / exception)
AVAP is designed for production environments where external failures (database timeouts, third-party API outages) are expected realities. The try block allows capturing such events without stopping the server.
Interface
try ... exception(error_variable) ... end()
Technical Operation
try Block:
The engine attempts to execute the contained instructions. If a critical failure occurs, execution of that block stops immediately.
exception Block:
If an error is detected, control passes to this block. The error_variable is automatically populated with a string describing the failure (simplified stack trace).
end() Block:
Closes the structure and allows the script to continue normal execution after error handling.
Connector Safety Example
try
// Attempt a query to an external connector (Section V)
result = db.query("SELECT * FROM payments")
exception(failure_detail)
// If it fails, log the error and notify
addVar(_status, 500)
addVar(message, "Persistence error: %s" % failure_detail)
addResult(message)
end()
3.4 Early Exit Control (return)
The return() command is a control instruction that immediately terminates execution of the current context (either a function or the main script).
If used inside a function, it returns control (and optionally a value) to the caller.
If used in the main flow, it terminates API execution and triggers automatic delivery of the JSON response constructed up to that point.
Examples
1. Simple Comparison
Code snippet
addParam("lang", l)
if(l, "es", "=")
addVar(msg, "Hello")
end()
addResult(msg)
2. Standard Else
Code snippet
if(balance, 0, ">")
allow = True
else()
allow = False
end()
addResult(allow)
3. Complex Expression (Dynamic Evaluation)
Code snippet
if(None, None, "user_type == 'VIP' or purchases > 100")
addVar(discount, 0.20)
end()
addResult(discount)
4. Loop from 1 to 10 (ID Generation)
Code snippet
startLoop(i, 1, 10)
AddvariableToJSON("item_${i}", "generated_value", my_json)
endLoop()
addResult(my_json)
5. HTTP Request Try-Catch
Code snippet
try
RequestGet("https://api.test.com/data", 0, 0, response)
exception(e)
addVar(error_trace, "Connection failure: %s" % e)
addResult(error_trace)
end()
6. Proper Loop Exit (Using Control Variable)
Code snippet
found = False
startLoop(i, 1, 10)
if(i, 5, "==")
found = True
// In AVAP, to exit you can force the index beyond the limit
i = 11
end()
endLoop()
addResult(found)
7. 'in' Validation (Membership Check)
Code snippet
addParam("role", r)
if(r, ["admin", "editor", "root"], "in")
access = True
end()
addResult(access)
8. Loop Over Data Length
Code snippet
getListLen(records, total)
startLoop(idx, 0, total)
current = records[idx]
// processing logic
endLoop()
9. Inequality If
Code snippet
if(new_password, old_password, "!=")
addVar(change, "Password updated")
end()
addResult(change)
10. Critical SQL Error Handling
Code snippet
try
ormDirect("UPDATE nonexistent_table SET a=1", res)
exception(e)
addVar(_status, 500)
addResult("Database error")
end()

Introduction
Discovering a New Programming Language
Welcome to the AVAP book, where you will delve into the fascinating world of an innovative and powerful programming language: AVAP™. In these pages, we will explore together the fundamental concepts, syntax, and unique features of AVAP™, and prepare you to master this new language and harness its full potential in your software development projects.
Discovering AVAP
AVAP™ is much more than just a programming language; it is a versatile tool designed to enhance creativity and efficiency in software development. With its clear and expressive syntax, AVAP™ allows developers to write code more quickly and concisely, without sacrificing the power and flexibility needed to create robust and scalable applications.
What Makes AVAP Special?
AVAP™ stands out due to several distinctive features that make it unique in the programming world:
Integrated Virtualization: AVAP™ is designed from the ground up with the concept of virtualization in mind. Every aspect of the language is optimized to work in virtual environments, allowing developers to create immersive and scalable experiences.
Powerful APIs: AVAP™ provides a comprehensive set of tools for interacting with external APIs and web services, making it easier to integrate advanced functionalities into your applications.
Enhanced Productivity: With an intuitive syntax and advanced abstraction features, AVAP™ allows you to write less code to achieve more, thereby increasing your productivity and accelerating development time.
What Will You Find in This Book?
In this book, we will guide you through the basic and advanced concepts of AVAP™, providing practical examples, useful tips, and challenging exercises to help you master the language and become an expert AVAP™ developer. From installing and configuring the development environment to creating complete applications, this book will accompany you every step of the way towards mastering AVAP™.
Are You Ready to Get Started?
Then let's not wait any longer! Dive into the pages of this book and get ready to embark on an exciting journey towards mastering AVAP™. Whether you are an experienced programmer looking for new tools or a curious beginner in the world of programming, this book has something for you. Let's explore the fascinating world of AVAP™ together!
The Virtuality Attribute in AVAP™
AVAP™ (Advance Virtual API Programming) is a dynamic programming language distinguished by its virtuality attribute, which enables the development of virtual APIs in a dynamic and flexible manner. This attribute is based on the fact that the language specifications do not reside in the language interpreter, allowing the final code to be constructed in real-time by the language server.
1.1 Virtuality Principle in AVAP
The principle of virtuality in AVAP™ is based on several key aspects:
1.1.1 Language Specifications Decoupled from the Interpreter
In AVAP™, language specifications are not compiled into the core of the language nor do they reside in the interpreter. This means that the interpreter is not tied to a specific implementation of the language, providing great flexibility and adaptability in code interpretation.
1.1.2 Dynamic Code Construction in Real-Time
Thanks to the virtuality attribute, AVAP™ allows for dynamic code construction in real-time. This means that the final code to be interpreted by the language server can vary and mutate according to current needs, without the need for recompilation or redistribution.
1.1.3 Development of Dynamic Virtual APIs
The virtuality attribute in AVAP™ enables the development of virtual APIs in a dynamic manner. This allows APIs to evolve, improve, and adapt to new security or functional needs in real-time, without affecting the clients utilizing the API endpoint.
1.2 Benefits of the Virtuality Attribute
Flexibility: The ability to construct code in real-time provides significant flexibility in API development and management.
Agility: The capacity to adapt and evolve without the need for precompilation or distributed updates allows for greater agility in software development.
Simplified Maintenance: The development of dynamic virtual APIs simplifies the maintenance process, as changes do not need to be made to clients consuming those APIs.
1.3 Interaction with Artificial Intelligence
One of the most innovative features of this language is its integration with artificial intelligence through OpenAI. This integration allows the language to automatically generate the necessary results through an interface with OpenAI once the programmer has a clear solution to a problem. This functionality not only speeds up development but also reduces the margin of error and improves efficiency.
1.4 Access to Databases
The language also includes the capability to interact with databases using natural language, supported by artificial intelligence, currently version XXXXX through OpenAI. This feature allows for complex queries and data manipulation without deep knowledge of SQL, simplifying development and improving accessibility for programmers of all levels.
With this guide, we hope to provide you with all the necessary information to make the most of this dynamic language's capabilities. From variable management to automated result generation and simplified database access, this language is designed to transform the way you develop APIs.
1.5 Conclusions
The virtuality attribute in AVAP™ represents an innovative approach to virtual API development, allowing for greater flexibility, agility, and simplification in the software development and maintenance process. By decoupling language specifications from the interpreter and enabling dynamic code construction in real-time, AVAP™ offers a new paradigm in API design and management.

SECTION IV: Concurrency and Asynchrony
AVAP implements a thread-based concurrency model that enables fire-and-forget execution or parallel execution with later synchronization. This is essential for tasks such as email dispatching, log processing, or querying multiple external APIs simultaneously.
4.1 Launching Background Processes (go_async)
The go_async command extracts a block of code from the main sequential flow and places it into a parallel execution queue.
Interface
go_async(thread_id)
Execution Mechanics
Identification:
The programmer assigns a thread_id (a string or variable) to reference the process later.
Forking:
When invoked, the AVAP engine creates a new native thread. The main flow immediately continues to the next instruction after the go_async block.
Context Isolation:
The asynchronous thread inherits a snapshot copy of the variable state at the moment of invocation, allowing it to operate safely without interfering with the main thread.
Example: Immediate Response with Long-Running Process
addParam("email", destination)
go_async("email_dispatch")
// This block takes 5 seconds, but the API does not wait
mail_service.send(destination, "Welcome to AVAP")
end()
addVar(msg, "Your email is being processed in the background")
addResult(msg)
// The client receives the response in milliseconds
4.2 Result Synchronization (gather)
When the main flow requires data generated by an asynchronous thread to proceed, the gather synchronization mechanism is used.
Interface
gather(thread_id, timeout)
Specifications
thread_id:
The identifier used in the go_async command.
timeout (Seconds):
Maximum time the main thread will wait. If the asynchronous thread does not finish within this period, AVAP raises an exception that can be caught (see Section III).
Technical Behavior
Controlled Blocking:
The main thread is suspended until the specified thread_id completes.
State Recovery:
Once synchronized, any variables modified within the asynchronous thread are merged back into the main thread's context.
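The fork/snapshot/merge mechanics of go_async and gather can be sketched with native threads. This is a conceptual model only: `go_async`, `gather`, and the `state` dictionary are hypothetical stand-ins, not the engine's real concurrency layer.

```python
import copy
import threading

threads, results = {}, {}

def go_async(thread_id, block, state):
    """Fork: run `block` on a snapshot copy of the current state."""
    snapshot = copy.deepcopy(state)  # context isolation
    t = threading.Thread(
        target=lambda: results.update({thread_id: block(snapshot)}))
    threads[thread_id] = t
    t.start()  # main flow continues immediately

def gather(thread_id, timeout, state):
    """Controlled blocking, then merge the thread's results back."""
    t = threads[thread_id]
    t.join(timeout)
    if t.is_alive():
        raise TimeoutError(thread_id)  # catchable, as in Section III
    state.update(results[thread_id])   # state recovery

state = {"total": 100}
go_async("half_calc", lambda s: {"half": s["total"] // 2}, state)
gather("half_calc", 10, state)
print(state["half"])  # 50
```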
4.3 Optimized Parallel Execution (Fan-Out Pattern)
AVAP allows launching multiple threads and then waiting for all of them, reducing total execution time to the duration of the slowest thread instead of the sum of all execution times.
Example: Querying Multiple Databases
go_async("db_north")
data_north = north_connector.query("SELECT...")
end()
go_async("db_south")
data_south = south_connector.query("SELECT...")
end()
// Wait for both (Maximum 10 seconds)
gather("db_north", 10)
gather("db_south", 10)
// Combine results using Section I mechanisms
total_result = data_north + data_south
addResult(total_result)
4.4 Promise State in Output
As mentioned in Section II, if a variable that is still being processed in an asynchronous thread is passed to addResult, AVAP manages the response intelligently:
If the thread is still running:
The output JSON will display "variable": "promised" or the thread ID.
If the thread failed:
The error is logged internally and the variable is set to None.
If gather was used before addResult:
The fully processed real value is returned in the response.
Examples
1. Background Process Fire-and-Forget
Code snippet
go_async("notification_thread")
// This block runs on its own
RequestPost("https://hooks.slack.com/...", {}, body, r)
end()
2. Synchronization (Gather)
Code snippet
go_async("heavy_calc")
result = (500 * 250) / 3
end()
gather("heavy_calc", 10) // Waits until it finishes
addResult(result)
3. Connector Parallelism
Code snippet
go_async("db_1")
res1 = db1.query("SELECT...")
end()
go_async("db_2")
res2 = db2.query("SELECT...")
end()
4. Gather with Safety Timeout
Code snippet
gather("external_process", 5)
// If it doesn't finish in 5s, the script continues or fails depending on config
5. Using "Promised" State
Code snippet
go_async("long_task")
final = "Finished"
end()
addResult(final) // You will see "promised" in the JSON
6. Validation After Gather
Code snippet
gather("data_thread", 2)
if(thread_data, None, "!=")
addResult(thread_data)
end()
7. Multi-Threading for Auditing
Code snippet
go_async("audit_thread")
ormDirect("INSERT INTO audit...", r)
end()
8. Dynamic Gather in a Loop
Code snippet
startLoop(h, 1, 3)
gather("thread_${h}", 1)
endLoop()
9. Asynchronous Tax Calculation
Code snippet
go_async("tax_engine")
total_tax = total * 0.21
end()
10. Try-Except Inside an Async Thread
Code snippet
go_async("safe_thread")
try
// Task that may fail
exception(err)
// Log the error locally
end()
end()

SECTION V: Persistence, Connectors, and Native ORM
AVAP is designed to be database-agnostic. It enables data manipulation through three layers: the universal connector, simplified ORM commands, and direct SQL execution.
5.1 The Universal Connector (avapConnector)
The avapConnector command is the entry point for any external integration. It uses a Connection Token system (Base64) that encapsulates configuration details (host, port, credentials, driver) to keep code clean and secure.
Interface
connector_variable = avapConnector("BASE64_TOKEN")
Connector Object Capabilities
Once instantiated, the variable behaves as an object with dynamic methods:
Database Connectors:
Expose the .query(sql_string) method, which returns objects or lists depending on the result set.
API Connectors (Twilio, Slack, etc.):
Expose native service methods (e.g., .send_sms()).
Example: Dynamic Assignment with Connectors
// Instantiate the connection
db = avapConnector("REJfQ09OTkVDVE9SM...")
// Execute query and use Section I dynamic evaluation
users = db.query("SELECT * FROM users")
first_admin = users[0].name if users[0].role == 'admin' else 'N/A'
addResult(first_admin)
5.2 Native ORM Layer (ormCheckTable / ormDirect)
For quick operations on the local or default database cluster, AVAP provides system-level commands that do not require prior instantiation.
5.2.1 ormCheckTable
Verifies the existence of a database structure. It is critical for installation scripts or automated migrations.
Interface:
ormCheckTable(table_name, target_var)
Response:
target_var receives the string values "True" or "False".
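The existence check — including the string "True"/"False" result the spec mandates — can be sketched against SQLite's system catalog. The function name is hypothetical and SQLite stands in for the default cluster.

```python
import sqlite3

def orm_check_table(conn, table_name):
    """Illustrative ormCheckTable: returns the string 'True' or 'False'."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    return "True" if row else "False"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER)")
print(orm_check_table(conn, "inventory"))  # True
print(orm_check_table(conn, "orders"))     # False
```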
5.2.2 ormDirect
Executes SQL statements directly. Unlike .query(), it is optimized for statements that do not necessarily return rows (such as INSERT, UPDATE, or CREATE TABLE).
Interface:
ormDirect(statement, target_var)
Interpolation Usage Example:
ormDirect("UPDATE users SET login = '%s' WHERE id = %s" % (now, id), result)
5.3 Data Access Abstraction (Implicit Commands)
AVAP includes specialized commands for common CRUD operations, reducing the need to write manual SQL and mitigating injection risks.
ormAccessSelect
Performs filtered queries returning a list-of-objects structure.
Syntax:
ormAccessSelect(table, filters, target)
ormAccessInsert / ormAccessUpdate
Manages data persistence.
If used on an object that already has an ID, Update synchronizes changes; otherwise, Insert creates the record.
5.4 Dynamic Query Formatting (Injection Prevention)
As detailed in Section I, the AVAP engine processes SQL strings before sending them to the database engine. The official recommendation is to build statements with the % interpolation operator so that data types (Strings vs Integers) are formatted correctly for the driver. Note that interpolation formats values into the SQL text rather than neutralizing them; for values originating from untrusted input, prefer the ormAccess* commands of Section 5.3, which the language provides precisely to mitigate injection risks.
Recommended Secure Pattern
sql = "SELECT * FROM %s WHERE status = '%s'" % (table_name, recovered_status)
res = db.query(sql)
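For comparison, most database drivers also accept bound parameters, which keep the data out of the SQL text entirely. The sketch below uses Python's sqlite3 as a stand-in driver; whether a given AVAP connector exposes placeholders is an assumption to verify against that connector's documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, status TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'active')")

# A hostile-looking value is inert when bound as a parameter: the driver
# compares the whole string literally instead of splicing it into the SQL.
recovered_status = "active' OR '1'='1"
rows = conn.execute(
    "SELECT * FROM users WHERE status = ?", (recovered_status,)
).fetchall()
print(rows)  # [] -- no row has that literal status
```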
5.5 Cryptographic Security Integration (encodeSHA256)
Within the persistence flow, AVAP provides native tools to secure sensitive data before it is written to disk.
Interface
encodeSHA256(source_text, target_variable)
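A minimal Python equivalent of the hashing step, using the standard library; it is an assumption that the AVAP command produces the conventional hex digest shown here.

```python
import hashlib

def encode_sha256(source_text):
    """Illustrative equivalent of encodeSHA256: hex digest of the input."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

digest = encode_sha256("hello")
print(digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

Note that SHA-256 is a fast hash; for password storage specifically, dedicated password-hashing schemes are the usual recommendation.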
Complete Registration Flow (Final Example)
This example integrates Sections I, II, III, and V:
// II: Input capture
addParam("pass", p)
addParam("user", u)
// I & V: Processing and security
encodeSHA256(p, secure_pass)
// V: Insertion
sql = "INSERT INTO users (username, password) VALUES ('%s', '%s')" % (u, secure_pass)
ormDirect(sql, db_result)
// III & II: Response
if(db_result, "Success", "==")
addVar(msg, "User created")
addResult(msg)
end()
Examples
1. Connector Instantiation
my_db = avapConnector("VE9LRU5fREVCX0RFU0FSUk9MTE8=")
2. Record Retrieval
rows = my_db.query("SELECT id, name FROM users")
addResult(rows)
3. Direct Command Execution
ormDirect("TRUNCATE TABLE temp_cache", status)
4. Structure Verification
ormCheckTable("inventory", exists)
if(exists, "False", "==")
ormDirect("CREATE TABLE inventory...", r)
end()
5. Secure Update (Interpolation)
sql = "UPDATE users SET login_count = %s WHERE email = '%s'" % (count, email)
ormDirect(sql, res)
6. JSON/DB Object Navigation
found_id = query_result[0].id
addResult(found_id)
7. ORM Select with Filter
ormAccessSelect("orders", {"status": "pending"}, list_result)
addResult(list_result)
8. Processing Database Results
records = db.query("SELECT...")
startLoop(i, 0, len(records))
name = records[i].name
endLoop()
9. Cryptographic Persistence
encodeSHA256(password_raw, hashed)
ormDirect("INSERT INTO logins (hash) VALUES ('%s')" % hashed, r)
10. Third-Party Connector (e.g., Slack)
slack_api = avapConnector("U0xBQ0tfQVBJX1RPS0VO")

SECTION VI: System Utilities and Transformation
This section documents the native commands for advanced string manipulation, precise time handling, and dynamic data generation.
6.1 Time and Date Management (getDateTime / stampToDatetime)
AVAP handles time in two formats:
Epoch/Timestamp (numeric): Ideal for calculations.
Formatted Datetime (string): Ideal for human readability and database storage.
6.1.1 getDateTime
Generates the current time with high precision.
Interface:
getDateTime(format, timeDelta, timeZone, targetVar)
Parameters
format:
Example: "%Y-%m-%d %H:%M:%S".
If left empty, returns the current Epoch timestamp.
timeDelta:
Seconds to add (positive) or subtract (negative).
Particularly useful for calculating token expiration times.
timeZone:
Time zone region (e.g., "Europe/Madrid").
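As a rough Python sketch of the behavior described (the function name and the UTC-only zone handling are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

def get_date_time(fmt, time_delta, tz_name="UTC"):
    """Empty format -> epoch timestamp; otherwise a formatted string.
    time_delta shifts the result by the given number of seconds."""
    # Only UTC is modeled here; real zone handling would use zoneinfo.
    now = datetime.now(timezone.utc) + timedelta(seconds=time_delta)
    if not fmt:
        return int(now.timestamp())
    return now.strftime(fmt)
```

For example, a token expiring in one hour would use `get_date_time("%Y-%m-%d %H:%M:%S", 3600)`.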
6.1.2 stampToDatetime
Converts a numeric value (Unix Timestamp) into a human-readable string.
Interface:
stampToDatetime(timestamp, format, offset, targetVar)
Common Use Case:
Formatting dates retrieved from the database (Section V) before sending them to the client (Section II).
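A hedged Python equivalent of this conversion, using the standard datetime module (function name and the seconds-based offset are assumptions):

```python
from datetime import datetime, timezone

def stamp_to_datetime(timestamp, fmt, offset_seconds=0):
    """Convert a Unix timestamp into a human-readable string."""
    dt = datetime.fromtimestamp(timestamp + offset_seconds, tz=timezone.utc)
    return dt.strftime(fmt)
```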
6.2 Advanced String Manipulation (replace / randomString)
6.2.1 replace
Allows text cleaning and transformation. Essential when receiving client data that requires sanitization.
Interface:
replace(sourceText, oldText, newText, targetVar)
Example Use Case:
Removing spaces or unwanted characters from a username before executing a SQL query.
6.2.2 randomString
Generates secure random alphanumeric strings.
Interface:
randomString(length, targetVar)
Applications:
Temporary password generation
Session ID creation
Unique file name generation
6.3 Security and Hash Operations (encodeSHA256)
Although previously mentioned in the persistence section, this is fundamentally a data transformation utility.
Mechanics
Deterministic one-way function.
AVAP uses an optimized implementation ensuring that the same input always produces the same hash.
This enables secure login comparisons without storing or exposing the actual password.
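The mechanics can be illustrated with Python's hashlib as a stand-in (the actual encodeSHA256 implementation is not shown here):

```python
import hashlib

def encode_sha256(source_text):
    """Deterministic one-way hash, hex-encoded (64 characters)."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

# Same input, same hash: a login check compares stored and recomputed
# hashes, never the plaintext password itself.
stored_hash = encode_sha256("s3cret")
login_ok = encode_sha256("s3cret") == stored_hash
```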
6.4 The Return Command (return)
Within functions and execution flows, return not only stops execution but can also inject the result of a subroutine back into the main flow.
Complete Utility Flow Example
// 1. Generate a temporary token
randomString(16, token_raw)
// 2. Calculate expiration (within 1 hour = 3600 seconds)
getDateTime("%Y-%m-%d %H:%M:%S", 3600, "UTC", expiration_date)
// 3. Format a system message using Section I
message = "Your token %s expires on %s" % (token_raw, expiration_date)
// 4. Send to client (Section II)
addResult(message)
6.5 Common Format Tokens (Cheat Sheet)
Token   Description       Example
%Y      Full year         2026
%m      Month (01-12)     02
%d      Day (01-31)       23
%H      Hour (00-23)      21
%M      Minute (00-59)    45
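These tokens follow the common strftime convention, which can be checked directly in Python:

```python
from datetime import datetime

# Each cheat-sheet token maps to the same strftime directive name.
d = datetime(2026, 2, 23, 21, 45)
formatted = d.strftime("%Y-%m-%d %H:%M")
```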
Examples
1. Unix Timestamp Retrieval
getDateTime("", 0, "UTC", now)
addResult(now)
2. Database-Formatted Date
getDateTime("%Y-%m-%d %H:%M:%S", 0, "Europe/Madrid", sql_date)
addResult(sql_date)
3. Expiration Calculation (1 Day)
getDateTime("", 86400, "UTC", expires_at)
addResult(expires_at)
4. Timestamp to Readable Conversion
stampToDatetime(1708726162, "%d/%m/%Y", 0, human_date)
addResult(human_date)
5. String Cleaning (Replace)
replace("REF_1234_OLD", "OLD", "NEW", updated_ref)
addResult(updated_ref)
6. Random Token Generator
randomString(32, security_token)
addResult(security_token)
7. SHA256 Hash for Integrity
encodeSHA256("payload_data", checksum)
addResult(checksum)

SECTION VII: Function Architecture and Scopes
This section explains how to encapsulate reusable logic and how AVAP manages isolated memory to prevent side effects across different parts of the program.
7.1 Definition and Declaration (function)
A function in AVAP is an independent block of code registered in the engine so it can be invoked at any time.
Interface
function function_name(argument1, argument2, ...){ ... }
Technical Characteristics
Local Scope (function_local_vars):
When entering a function, AVAP creates a new local variable dictionary. Variables created inside the function (e.g., temp = 10) do not exist outside of it, protecting the global state.
Context Inheritance:
Functions can read global variables using the $ prefix, but any new assignment (=) remains in the local scope unless an explicit global persistence command is used.
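A rough Python model of this scope rule (the dictionary names and the driver function are illustrative, not the engine's internals):

```python
GLOBAL_VARS = {"pvp": 150}  # global context ($-prefixed reads)

def run_function(body, args):
    """Each call gets a fresh local dictionary, as the engine does."""
    local_vars = dict(args)        # new local variable dictionary
    body(local_vars, GLOBAL_VARS)  # may read globals; writes stay local
    return local_vars              # in the engine this is destroyed on return

def demo_body(local_vars, global_vars):
    local_vars["temp"] = global_vars["pvp"] * 2  # 'temp' exists only here

result = run_function(demo_body, {})
```

After the call, `temp` is visible only in the returned local dictionary, never in `GLOBAL_VARS`.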
7.2 The Return Command (return)
This is the mechanism used to terminate function execution and optionally send a value back to the caller.
Interface
return(variable_or_value)
Behavior
Termination:
Immediately stops function processing.
Data Transfer:
The value passed to return is injected into the variable that invoked the function in the main flow.
Cleanup:
Once return executes, the function's local variable dictionary is destroyed to free memory.
7.3 Invocation and Parameter Passing
Functions are called by name followed by the required values or variables.
Professional Implementation Example
// Function definition (Local Scope)
function calculate_discount(base_price, percentage){
factor = percentage / 100
discount = base_price * factor
total = base_price - discount
return(total)
}
// Main Flow (Global Scope)
addVar(pvp, 150)
// Function call passing a reference $ and a literal value
final_price = calculate_discount($pvp, 20)
addResult(final_price) // Result: 120
7.4 Functions as Middlewares
In the registerEndpoint command (Section I), the middleware parameter accepts a list of functions. These functions have special behavior:
If a middleware executes a return() without a value or with an error value, AVAP can be configured to abort the request before reaching the main handler.
Ideal for guard tasks such as:
API key verification
Data schema validation
Initial audit logging
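A minimal Python sketch of such a guard chain, assuming a middleware signals an abort by returning a non-None response (the request shape and all names here are hypothetical):

```python
def check_api_key(request):
    """Guard middleware: abort the request on a bad key."""
    if request.get("api_key") != "expected-key":
        return {"status": 401, "error": "invalid api key"}  # abort early
    return None  # pass control to the next middleware or the handler

def handle(request, middlewares, handler):
    for middleware in middlewares:
        early_response = middleware(request)
        if early_response is not None:
            return early_response  # request aborted before the handler
    return handler(request)
```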
7.5 Recursion and Limits
AVAP supports recursion (a function calling itself), but caution is recommended regarding stack depth, especially in asynchronous processes (Section IV).
For processing large volumes of data, it is always preferable to use startLoop (Section III) instead of deep recursive calls.
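The trade-off can be seen in a Python sketch: the loop version keeps constant stack depth, while the recursive one grows with n and eventually hits the stack limit:

```python
def sum_recursive(n):
    """Deep recursion: one stack frame per step, fails for large n."""
    return 0 if n == 0 else n + sum_recursive(n - 1)

def sum_loop(n):
    """Loop version: constant stack depth, safe for large volumes."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
```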
Examples
1. Modular Sum Function
function sum(a, b){
total = a + b
return(total)
}
result = sum(10, 20)
2. Access Validation Function
function is_valid(token){
if(token, "SECRET", "==")
return(True)
end()
return(False)
}
authorized = is_valid($input_token)

Master Example (Combining Sections)
This example shows how a real flow uses almost all sections:
// SECTION I & II: Registration and Input
registerEndpoint("/v1/user", "POST", [], "Create User", main, final_res)
function main(){
addParam("user", u)
addParam("pass", p)
// SECTION III & VI: Validation and Security
if(u, None, "==")
addVar(_status, 400)
return("User is required")
end()
encodeSHA256(p, pass_hash)
// SECTION IV: Asynchrony (Audit log)
go_async("audit")
ormDirect("INSERT INTO audit (event) VALUES ('User creation attempt')", r)
end()
// SECTION V: Persistence
db = avapConnector("TOKEN_DB")
res_db = db.query("INSERT INTO users (name, pass) VALUES ('%s', '%s')" % (u, pass_hash))
// SECTION II: Response
addVar(_status, 201)
addResult(res_db)
return(res_db)
}

Chapter 1: Dynamic Programming Language
In this chapter, we will introduce AVAP™ as a dynamic programming language. A dynamic language is one whose behavior can be modified during the runtime of the program. AVAP™ shares many characteristics with other dynamic languages, making it a powerful and versatile tool for application development.
1.1 Features of AVAP™ as a Dynamic Language
AVAP™ is characterized by its dynamic nature, which means it offers various features that allow flexibility and adaptability in program development. Below, we will detail some of these features:
1.1.1 Dynamic Typing
In AVAP™, variable typing is dynamic, which means it is not necessary to explicitly declare the type of a variable before assigning it a value. This allows greater flexibility in data handling and simplifies code writing.
# Example of dynamic typing
x = 10 # x is an integer
x = "Hello" # x is now a string
1.1.2 Automatic Memory Management
AVAP™ uses an automatic garbage collector to manage memory dynamically. This means that developers do not have to worry about manually allocating and freeing memory, which simplifies the development process and reduces the likelihood of memory management-related errors.
# Example of automatic memory management:
list = [1, 2, 3, 4, 5]
# There is no need to free the memory of the list after use
1.1.3 Runtime Interpreter: Dynamic Code Construction
AVAP™ uses a runtime interpreter that goes beyond simply executing code line by line. Instead, the AVAP™ runtime interpreter is characterized by its ability to dynamically construct code during runtime, adding an element of virtuality to the execution process.
Dynamic code construction means that the AVAP™ runtime interpreter can generate and modify code as the program executes. This allows for greater flexibility and adaptability in data manipulation and operation execution.
A fundamental aspect of virtuality in dynamic code construction is that the language specifications are completely isolated from the runtime interpreter. This means that the interpreter is not tied to a specific language implementation, facilitating code portability and allowing for the transparent integration of new features and functionalities.
In summary, the AVAP™ runtime interpreter not only executes code line by line but also dynamically constructs code during runtime, adding an additional level of virtuality and flexibility to the program execution process.
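Python's exec can serve as an analogy for runtime code construction (this is an illustration of the idea, not the AVAP™ interpreter's actual mechanism):

```python
# Code is generated as a string while the program runs, then executed.
source = "def double(x):\n    return x * 2\n"
namespace = {}
exec(source, namespace)            # construct the function at runtime
result = namespace["double"](21)   # and invoke it like any other code
```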
1.1.4 Flexibility in Programming
AVAP™ offers a wide range of features that promote flexibility in programming. This includes support for higher-order functions, dynamic exception handling, and the ability to manipulate objects at runtime, among others.
# Example of a higher-order function
function operation(func, a, b){
return(func(a, b))
}
function add(x, y){
return(x + y)
}
result = operation(add, 3, 5)
# The add function is passed as an argument
1.2 Advantages of AVAP™ as a Dynamic Language
As a dynamic programming language, AVAP™ offers several advantages, including:
Greater flexibility and adaptability in program development.
Faster writing and execution of code.
Facilitates experimentation and exploration of solutions.
Allows for rapid feedback during development.
1.3 Summary
AVAP™ is a dynamic programming language that offers a wide range of features promoting flexibility, adaptability, and speed in application development. With its dynamic typing, automatic memory management, runtime interpreter, and programming flexibility, AVAP™ becomes a powerful and versatile tool for developers.

Chapter 2: Notation in AVAP™
Introduction
Notation in AVAP™ refers to the conventions and rules used to write and format code in the AVAP™ programming language. Notation is essential to ensure code readability and comprehension, as well as to establish a coherent and consistent syntax across all projects.
General Conventions
In AVAP™, several general notation conventions are followed, similar to those used in other programming languages like Python. Some of these conventions include:
Indentation: Code is structured through indentation, using white spaces or tabs to indicate the hierarchy and structure of the code. It is recommended to use four spaces for each level of indentation.
Case Sensitivity: AVAP™ is case-sensitive, meaning that identifiers, variable names, and keywords must be consistently written using the same capitalization format throughout the code.
Comments: Comments are used to document the code and explain its functionality. Single-line comments begin with the // symbol, while multi-line comments start with /* and end with */.
Specific Notation Rules
In addition to general conventions, AVAP™ follows specific notation rules for different elements of the language, including:
Variables: Variable names should be descriptive and meaningful, using lowercase letters and underscores to separate words if necessary for readability (e.g., variable_name).
Functions: Function names should follow the same conventions as variables, with the addition of parentheses to indicate function parameters (e.g., function_name(parameter1, parameter2)).
Constants: Constants are typically written in uppercase letters with underscores separating words (e.g., EXAMPLE_CONSTANT).
The descriptions of lexical analysis and syntax use a modified Backus-Naur form (BNF) grammar notation. This uses the following style of definition:
<program> ::= <statement_list>
<statement_list> ::= <statement> | <statement> <statement_list>
<statement> ::= <global_assignment> | <local_assignment> | <command>
<global_assignment> ::= "addVar(" <string_value> "," <variable_name> ")"
<local_assignment> ::= <variable_name> "=" <value>
<string_value> ::= '"' <string_content> '"'
<string_content> ::= <string_part> | <string_part> <string_content>
<string_part> ::= <text> | <variable_reference>
<text> ::= <character> | <character> <text>
<variable_reference> ::= "$" <variable_name>
<variable_name> ::= <letter> | <letter> <variable_name>
<value> ::= <string_value> | <number> | <expression>
<number> ::= <digit> | <digit> <number>
<expression> ::= <value> | <value> <operator> <value>
<operator> ::= "+" | "-" | "*" | "/"
<command> ::= <any_valid_command_syntax>
<character> ::= any character except '"' and '\'
<letter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" | "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "_"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Explanation:
<program>: A program is a list of statements.
<statement_list>: A list of statements can be a single statement or a statement followed by another list of statements.
<statement>: A statement can be a global assignment, a local assignment, or a command.
<global_assignment>: A global assignment follows the format addVar('value', variable_name).
<local_assignment>: A local assignment follows the Python syntax variable_name = value.
<string_value>: A string value is enclosed in double quotes and contains string content.
<string_content>: The content of a string can be a string part or a string part followed by more string content.
<string_part>: A string part can be literal text or a variable reference.
<text>: Text is a series of characters.
<variable_reference>: A variable reference follows the format $variable_name.
<variable_name>: A variable name can be a letter or a combination of letters.
<value>: A value can be a string value, a number, or an expression.
<number>: A number can be a digit or a series of digits.
<expression>: An expression can be a value or a combination of two values with an operator.
<operator>: An operator can be +, -, *, or /.
<command>: A command can be any valid command syntax.
<character>: A character can be any character except double quotes and the backslash.
<letter>: A letter can be an alphabetical character, a digit, or an underscore.
<digit>: A digit is a number from 0 to 9.
This BNF notation covers the assignment of global and local variables, as well as variable substitution in strings.
Practical Example
// Definition of a variable
example_variable = 10
// Definition of a function
function example_function(parameter){
// Function body
result = parameter * 2
return(result)
}
// Function call
result = example_function(example_variable)
In this example, notation conventions are used to define a variable, a function, and to call the function with a parameter.
Conclusions
Notation in AVAP™ is a fundamental part of software development in the language. By following clear and consistent notation conventions, developers can write and maintain code more effectively, contributing to the readability, understanding, and maintainability of the code in projects of any size and complexity.
With this understanding of notation in AVAP™, developers can write clean and structured code that is easy to understand and maintain over time.

Introduction
Lexical analysis is the first step in the process of compiling or interpreting a program in AVAP™. It involves breaking down the source code into lexical components or "tokens," which are the smallest units of meaning in the language. These tokens include keywords, identifiers, operators, punctuation symbols, and literals.
Lexical Components in AVAP™
The lexical components in AVAP™ are similar to those in other programming languages like Python. Some of the most common lexical components in AVAP™ include:
Keywords: These are reserved words that have a special meaning in the language and cannot be used as variable or function names. Examples of keywords in AVAP™ include if, else, for, while, return, among others.
Identifiers: These are names given to variables, functions, and other elements of the program by the programmer. Identifiers must follow certain formatting rules and cannot match keywords. For example, variable, example_function, result are examples of identifiers in AVAP™.
Operators: These are symbols used to perform operations in the program. Examples of operators in AVAP™ include +, -, *, /, =, ==, !=, among others.
Literals: These represent constant values in the program, such as integers, floating-point numbers, text strings, and boolean values. Examples of literals in AVAP™ include 10, 3.14, "text", True, False, among others.
Punctuation Symbols: These are special characters used to separate elements of the code and define the structure of the program. Examples of punctuation symbols in AVAP™ include ( ), { }, [ ], the comma, the colon, and the semicolon.
Lexical Analysis Process
The lexical analysis process in AVAP™ consists of several steps:
Scanning: The source code is read sequentially, and the lexical components are identified. Regular expressions are used to recognize patterns corresponding to keywords, identifiers, operators, etc.
Tokenization: The identified lexical components are converted into tokens, which are objects representing each component with its associated type and value.
Token Generation: The generated tokens are passed to the next step of the compilation or interpretation process for syntactic and semantic analysis.
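The scanning and tokenization steps can be sketched with a small regex-based scanner in Python (the token classes are a simplified subset of what a real lexer handles):

```python
import re

# Each pattern recognizes one token class; whitespace is skipped.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("PUNCT",  r"[(){},]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Scan the source left to right and emit (type, value) tokens."""
    tokens = []
    for match in MASTER.finditer(code):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens
```

Running it over `result = parameter * 2` yields the identifier, operator, and literal tokens described below.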
Keywords
Keywords in AVAP are reserved words that have specific meanings and cannot be used as identifiers. The keywords in AVAP are:
randomString
ormAI
functionAI
stampToDatetime
getTimeStamp
getRegex
getDateTime
encodeMD5
encodeSHA256
getQueryParamList
getListLen
ormCheckTable
ormCreateTable
end
else
if
endLoop
startLoop
ormAccessInsert
ormAccessSelect
variableToList
RequestPost
RequestGet
addResult
AddvariableToJSON
addParam
variableFromJSON
itemFromList
addVar
function
return
Practical Example
Below is a practical example that illustrates lexical analysis in AVAP™:
// Function definition
function function_example(parameter){
result = parameter * 2
return(result)
}
// Function call
value = function_example(10)
In this example, the lexical analysis would identify the following tokens:
function_example: Function identifier.
(, ), {, }: Punctuation symbols.
parameter, result, value: Variable identifiers.
=, *: Operators.
2, 10: Integer literals.
Conclusions
Lexical analysis is a crucial step in the compilation or interpretation of a program in AVAP™. By breaking down the source code into tokens, it lays the foundation for subsequent syntactic and semantic analysis, allowing the program to be correctly understood and executed by the interpreter or compiler.
With a clear understanding of lexical analysis in AVAP™, developers can write clean and structured code, facilitating the software development process in the language.

Introduction
The data model in AVAP™ defines how data is organized and manipulated within the language. Similar to Python, AVAP™ uses a flexible and dynamic data model that allows for working with a wide variety of data types and data structures.
Data Types
In AVAP™, just like in Python, data types are categories that represent different kinds of values that can be stored and manipulated in a program. Some of the most common data types in AVAP™ include:
Integers (int): Represent whole numbers, positive or negative, without a fractional part.
Floating-point numbers (float): Represent numbers with both integer and fractional parts.
Strings (str): Represent sequences of Unicode characters.
Booleans (bool): Represent truth values, either True or False.
Lists (list): Ordered and mutable collections of elements.
Tuples (tuple): Ordered and immutable collections of elements.
Dictionaries (dict): Unordered collections of key-value pairs.
Sets (set): Unordered collections of unique elements.
Data Structures
In addition to individual data types, AVAP™ provides various data structures that allow for more complex organization and manipulation of data:
Lists: Created using square brackets [ ] and can contain any data type, including other lists.
Tuples: Created using parentheses ( ) and are immutable, meaning they cannot be modified once created.
Dictionaries: Created using curly braces { } and store key-value pairs, where each key is unique within the dictionary.
Sets: Created using curly braces { } and contain unique elements, meaning there are no duplicates in a set.
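The four structures can be written side by side as Python-style literals, matching the descriptions above:

```python
example_list = [1, 2, 3]            # ordered, mutable
example_tuple = (1, 2, 3)           # ordered, immutable
example_dict = {"a": 1, "b": 2}     # unique keys mapping to values
example_set = {1, 2, 2, 3}          # duplicates collapse on creation

example_list.append(4)              # lists can grow; tuples cannot
```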
Practical Example
Below is a practical example that illustrates the use of the data model in AVAP™:
# Definition of a list
example_list = [1, 2, 3, 4, 5]
# Accessing individual elements
addResult(example_list[0]) # Output: 1
# Slicing to get a sublist
sublist = example_list[2:4]
addResult(sublist) # Output: [3, 4]
# List methods
example_list.append(6)
addResult(example_list) # Output: [1, 2, 3, 4, 5, 6]
Conclusions
The data model in AVAP™ provides a flexible and dynamic structure for working with data in the language. By understanding the available data types, data structures, operations, and methods, developers can write efficient and effective code that manipulates and processes data effectively.

Chapter 5: Data Types
In this chapter, we will explore the data types available in AVAP™. Data types are fundamental in programming as they determine what kind of values can be stored in a variable and what operations can be performed with those values. Throughout this chapter, we will discuss the basic data types in AVAP™ and how they are used in program development.
1.1 Basic Data Types
In AVAP™, like in Python, there are several basic data types:
1.1.1 Integers (int)
Integers represent whole numbers without decimals. They can be positive, negative, or zero. In AVAP™, integers are defined using the int data type.
integer_number = 10
1.1.2 Floating-Point Numbers (float)
Floating-point numbers represent real numbers with decimals. In AVAP™, they are defined using the float data type.
floating_number = 3.14
1.1.3 Strings (str)
Strings represent text. In AVAP™, they are defined using the str data type.
text_string = "Hello, world!"
1.1.4 Booleans (bool)
Booleans represent truth or falsehood values. In AVAP™, they are defined using the bool data type.
true_value = True
false_value = False
1.2 Conversion Between Data Types
In AVAP™, just like in Python, it is possible to convert between different data types using specific functions. Some common examples include:
1.2.1 Conversion to Integer
To convert a value to an integer, the int() function is used.
text = "10"
number = int(text)
1.2.2 Conversion to Floating-Point
To convert a value to a floating-point number, the float() function is used.
text = "3.14"
number = float(text)
1.2.3 Conversion to String
To convert a value to a string, the str() function is used.
number = 10
text = str(number)
1.3 Operations with Data Types
In AVAP™, just like in Python, it is possible to perform operations with different data types. For example:
# Operations with integers
a = 10
b = 5
sum = a + b
difference = a - b
# Operations with floating-point numbers
c = 3.5
d = 2.0
product = c * d
division = c / d
# Operations with strings
text1 = "Hello"
text2 = "world"
concatenation = text1 + " " + text2
1.4 Summary
Data types in AVAP™ are fundamental for program development. They allow for the storage and manipulation of different types of values, such as numbers, text, and truth values. With a solid understanding of data types and how they are used in program development, developers can create robust and functional applications in AVAP™.

Working with Variables
In this chapter, we will explore in detail working with variables in AVAP™. Variables are fundamental elements in programming as they allow us to store and manipulate data within a program. Throughout this chapter, we will examine the importance of variables, the types of local and global variables, as well as the different ways to declare them in AVAP™.
2.1 Importance of Variables
Variables play a crucial role in programming, as they allow us to store and manipulate data during the execution of a program. They enable the storage of temporary or permanent values, perform calculations, and facilitate communication between different parts of the program.
2.2 Types of Variables in AVAP™
In AVAP™, there are two main types of variables: local and global.
2.2.1 Local Variables
Local variables are those that are declared within a function or block of code and are only available within that scope. They have a limited scope, and their lifespan is restricted to the execution time of the block in which they are declared. Local variables are used to store temporary or intermediate data needed to perform calculations or execute operations within a function.
2.2.2 Global Variables
Global variables are those that are declared outside of any function or block of code and are available throughout the entire program. They have a global scope, and their lifespan lasts for the full duration of the program's execution. Global variables are used to store data that needs to be accessible from multiple parts of the program or that needs to retain its value over time.
2.3 Declaration of Variables in AVAP™
In AVAP™, variables can be declared in several ways:
2.3.1 addVar() Function
The addVar() function is used to declare global variables within the scope of an API. Its syntax is as follows:
addVar(variable_name, value)
Where:
variable_name is the name of the variable to be declared.
value is the initial value to be assigned to the variable (optional).
2.3.2 Direct Declaration
Local and global variables can also be declared directly without using the global statement, simply by assigning a value:
variable_name = value
Where:
variable_name is the name of the variable to be declared.
value is the initial value to be assigned to the variable.
2.3.3 Direct Initialization
It is also possible to declare and initialize a global variable at the same time using the following syntax:
addVar(variable_name,value)
Where:
variable_name is the name of the variable to be declared.
value is the initial value to be assigned to the variable, which automatically defines the variable's type.
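A Python sketch of this registration behavior, modeling the API's global scope as a dictionary (the dictionary and function names are illustrative):

```python
API_GLOBALS = {}

def add_var(name, value=None):
    """Register a global variable in the API scope; the assigned value
    also determines the variable's dynamic type."""
    API_GLOBALS[name] = value

add_var("pvp", 150)   # declaration and initialization in one step
add_var("pending")    # declaration only; value assigned later
```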
2.4 Summary
Working with variables in AVAP™ is essential for developing efficient and scalable applications. Variables allow for storing and manipulating data during program execution, which facilitates calculations and communication between different parts of the program. With a solid understanding of variable types and the different ways to declare them in AVAP™, developers can create robust and functional applications.

How to Work with Comments
Comments are a fundamental tool in any programming language, as they allow you to document code, make it easier to understand, and help keep it organized. In the AVAP™ programming language, comments are an integral part of the syntax and are used to add additional information to the source code without affecting its execution.
Comments serve several purposes:
Documentation: Comments can be used to explain what specific parts of the code do, which can be helpful for anyone reading or maintaining the code in the future.
Clarification: They can clarify complex sections of code, making it easier for others (or yourself) to understand the logic and flow of the program.
Organization: Comments can help organize code by separating different sections or explaining the purpose of various code blocks.
Debugging: Comments can temporarily disable parts of code during debugging without deleting it, allowing you to test different scenarios.
In AVAP™, you can use different types of comments to suit your needs. They can be single-line comments or multi-line comments, depending on the level of detail and context required.
By incorporating comments into your code, you make it more maintainable and easier for others to follow, which is essential for collaborative projects and long-term code management.
3.1 Line Comments
Line comments in AVAP™ are used to add brief annotations or explanations to a specific line of code. These comments begin with the // symbol and continue until the end of the line. Everything following // is considered a comment and is ignored by the compiler.
// This is a line comment in AVAP™
x = 5 // You can also add comments at the end of a line of code
Line comments are useful for providing quick clarifications about the code and improving its readability.
3.2 Block Comments
Block comments in AVAP™ are used to add comments that span multiple lines of code. These comments begin with /* and end with */. Everything between /* and */ is considered a comment and is ignored by the compiler.
/*
This is a block comment in AVAP™
that spans multiple lines of code.
It is used to explain extensive sections of code
or to temporarily disable entire blocks of code.
*/
Block comments are ideal for providing detailed explanations about complex sections of code or for temporarily disabling entire blocks of code during debugging.
## 3.3 Documentation Comments
AVAP™ also supports documentation comments, which are used to automatically generate documentation from the source code. These comments begin with `///` and are used to describe the functionality of classes, methods, variables, and other elements of the source code.
```
/// This function adds two integers and returns the result.
/// \param a The first integer.
/// \param b The second integer.
/// \return The sum of the two numbers.
int sum(int a, int b)
    return a + b;
```
Documentation comments are essential for maintaining up-to-date and detailed documentation of the code, which facilitates its understanding and use by other developers.
## 3.4 Best Practices
When using comments in AVAP™, it is important to follow some best practices:
- Use comments sparingly, and only where they genuinely clarify the code.
- Keep comments up to date as the code evolves.
- Write clear, concise comments that other developers can understand quickly.
- Avoid redundant or unnecessary comments that add no information for the reader.
## 3.5 Summary
Comments in AVAP™ are an essential tool for improving the readability and maintainability of source code. With line comments, block comments, and documentation comments, developers can add explanations, clarifications, and useful documentation to the code, making it easier to understand and collaborate within development teams.

@ -1,53 +0,0 @@
# Expressions in AVAP™
## Introduction
Expressions in AVAP™ are combinations of values, variables, operators, and function calls that can be evaluated to produce a result. Just like in Python, expressions in AVAP™ can be simple or complex, and they can contain a variety of elements that manipulate and process data.
## Types of Expressions
In AVAP™, as in Python, there are several types of expressions that can be used to perform different operations and calculations. Some of the most common types of expressions include:
- **Arithmetic**: perform mathematical operations such as addition, subtraction, multiplication, and division.
- **Logical**: evaluate logical conditions and return boolean values such as `True` or `False`.
- **Comparison**: compare two values and return a result based on their relationship: equality, inequality, greater than, less than, and so on.
- **Assignment**: assign a value to a variable.
- **Function calls**: invoke functions and methods to perform specific tasks.
## Operators
In AVAP™, as in Python, expressions can include a variety of operators that perform specific operations on data. Some of the most common operators include:
- Arithmetic: `+`, `-`, `*`, `/`, `%`, etc.
- Logical: `and`, `or`, `not`.
- Comparison: `==`, `!=`, `>`, `<`, `>=`, `<=`, etc.
- Assignment: `=`, `+=`, `-=`, `*=`, `/=`, etc.
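Since AVAP™'s expression syntax mirrors Python here, the four operator families above can be sketched with a few lines (Python is used for illustration):

```python
# Assignment and augmented assignment
x = 7
x += 3                        # x is now 10

# Arithmetic operators
quotient = x / 4              # 2.5
remainder = x % 4             # 2

# Comparison operators combined with logical operators
in_range = x > 5 and x < 20   # True
negated = not in_range        # False
```

Each line is itself an expression (or an assignment of one), and the results can be fed into further expressions or function calls.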
## Working with Lists
Lists are a very versatile data structure in AVAP™ that allows you to store collections of elements of different types. Expressions in AVAP™ can involve operations and manipulations of lists, such as accessing individual elements, concatenation, searching, deletion, and more.
```
// Definition of a list
my_list = [1, 2, 3, 4, 5]

// Accessing individual elements
first_element = my_list[0] // Output: 1

// Concatenation of lists
another_list = [6, 7, 8]
combined_list = my_list + another_list // Output: [1, 2, 3, 4, 5, 6, 7, 8]

// Length of a list
length = len(my_list) // Output: 5

// Searching in a list
is_present = 5 in my_list // Output: True

// Removing elements
my_list.remove(3) // Removes the element 3 from the list
```
## Practical Example
Below is a practical example that illustrates the use of expressions in AVAP™ with lists:
```
// Definition of a list of numbers
numbers = [1, 2, 3, 4, 5]

// Calculation of the sum of the elements
total = sum(numbers) // Output: 15

// Checking whether a number is present in the list
is_present = 6 in numbers // Output: False
```
## Conclusions
Expressions in AVAP™ are a fundamental part of programming, allowing for a wide variety of data operations and manipulations. By understanding the different types of expressions and operators, as well as working with data structures such as lists, developers can write clear and effective code that meets the program's requirements.

@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "93b6fd90",
"metadata": {},
"outputs": [],
"source": [
"from pprint import pprint\n",
"import json\n",
"\n",
"from src.config import settings"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b7799967",
"metadata": {},
"outputs": [],
"source": [
"with open(settings.external_path / \"total_data_cleaned_python.json\", 'r') as f:\n",
" api_pack_dataset = json.load(f)\n",
"\n",
"with open(settings.external_path / \"huggingface_train.json\", \"r\", encoding=\"utf-8\") as f:\n",
" api_bench_dataset = [json.loads(line) for line in f if line.strip()]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c625700",
"metadata": {},
"outputs": [],
"source": [
"api_bench_dataset"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "842a09dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('import http.client\\n'\n",
" '\\n'\n",
" 'conn = http.client.HTTPConnection(\"undefinedhttps\")\\n'\n",
" '\\n'\n",
" 'headers = {\\n'\n",
" ' \\'X-RapidAPI-Key\\': \"SOME_STRING_VALUE\",\\n'\n",
" ' \\'X-RapidAPI-Host\\': \"SOME_STRING_VALUE\"\\n'\n",
" ' }\\n'\n",
" '\\n'\n",
" 'conn.request(\"GET\", '\n",
" '\"//climatechange1.p.rapidapi.com/news/%7Bnewspapercode%7D?newspapercode=SOME_STRING_VALUE\", '\n",
" 'headers=headers)\\n'\n",
" '\\n'\n",
" 'res = conn.getresponse()\\n'\n",
" 'data = res.read()\\n'\n",
" '\\n'\n",
" 'print(data.decode(\"utf-8\"))')\n"
]
}
],
"source": [
"pprint(api_pack_dataset[3111][\"api_call_data\"][\"api_call\"])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1638cb7c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(\"**domain**:['Shipment']\\n\"\n",
" '**api_call**:import http.client\\n'\n",
" '\\n'\n",
" 'conn = http.client.HTTPSConnection(\"null\")\\n'\n",
" '\\n'\n",
" 'headers = { \\'Authorization\\': \"REPLACE_KEY_VALUE\" }\\n'\n",
" '\\n'\n",
" 'conn.request(\"GET\", '\n",
" '\"/business.hema.digital/fulfillment/v2/salesorders/975680088608/consignments/19a5095f-4c99-4ed6-bc35-1e358b127ad4/shipments/378020893722472149\", '\n",
" 'headers=headers)\\n'\n",
" '\\n'\n",
" 'res = conn.getresponse()\\n'\n",
" 'data = res.read()\\n'\n",
" '\\n'\n",
" 'print(data.decode(\"utf-8\"))\\n'\n",
" '**api_provider**:\\n'\n",
" '**lang**:Python')\n"
]
}
],
"source": [
"pprint(api_pack_dataset[7][\"output\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1979cadb",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "assistance-engine",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 12,
"id": "9d524159",
"metadata": {},
"outputs": [],
@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 13,
"id": "330f1975",
"metadata": {},
"outputs": [],
@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 14,
"id": "68887173",
"metadata": {},
"outputs": [],
@ -210,7 +210,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 15,
"id": "e693a3fa",
"metadata": {},
"outputs": [
@ -238,18 +238,18 @@
"llama_model_loader: - kv 15: qwen2.embedding_length u32 = 1536\n",
"llama_model_loader: - kv 16: qwen2.feed_forward_length u32 = 8960\n",
"llama_model_loader: - kv 17: qwen2.attention.head_count u32 = 12\n",
"llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 2\n",
"llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000\n",
"llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001\n",
"llama_model_loader: - kv 21: general.file_type u32 = 7\n",
"llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2\n",
"llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2\n"
"llama_model_loader: - kv 18: qwen2.attention.head_count_kv u32 = 2\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"llama_model_loader: - kv 19: qwen2.rope.freq_base f32 = 1000000.000000\n",
"llama_model_loader: - kv 20: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001\n",
"llama_model_loader: - kv 21: general.file_type u32 = 7\n",
"llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2\n",
"llama_model_loader: - kv 23: tokenizer.ggml.pre str = qwen2\n",
"llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,151936] = [\"!\", \"\\\"\", \"#\", \"$\", \"%\", \"&\", \"'\", ...\n",
"llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\n",
"llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,151387] = [\"Ġ Ġ\", \"ĠĠ ĠĠ\", \"i n\", \"Ġ t\",...\n",
@ -501,7 +501,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 16,
"id": "aa66f897",
"metadata": {},
"outputs": [
@ -509,25 +509,24 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Llama.generate: 9 prefix-match hit, remaining 1 prompt tokens to eval\n",
"llama_perf_context_print: load time = 1762.25 ms\n",
"llama_perf_context_print: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)\n",
"llama_perf_context_print: eval time = 64123.14 ms / 502 runs ( 127.74 ms per token, 7.83 tokens per second)\n",
"llama_perf_context_print: total time = 133698.82 ms / 503 tokens\n",
"llama_perf_context_print: graphs reused = 486\n"
"llama_perf_context_print: load time = 869.83 ms\n",
"llama_perf_context_print: prompt eval time = 869.29 ms / 14 tokens ( 62.09 ms per token, 16.11 tokens per second)\n",
"llama_perf_context_print: eval time = 69934.23 ms / 497 runs ( 140.71 ms per token, 7.11 tokens per second)\n",
"llama_perf_context_print: total time = 98408.07 ms / 511 tokens\n",
"llama_perf_context_print: graphs reused = 481\n"
]
}
],
"source": [
"response = llm_model(\n",
" \"Create a simple hello world function in AVAP language\",\n",
" \"Create a simple hello world in AVAP using addParam and addResult\",\n",
" grammar=grammar, max_tokens=-1, temperature=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 17,
"id": "317b96ae",
"metadata": {},
"outputs": [
@ -539,36 +538,17 @@
" 'index': 0,\n",
" 'logprobs': None,\n",
" 'text': '\\n'\n",
" 'Sure.HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" 'Sure.HereisaprogramthatcreatesaSimpleHelloWorldinAVAPusingaddParamandaddResultinActionScript3(AS3)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltintextoutputfunctionalityinAVAPlanguage(assumingyouhaveAVAPinstalledandconfiguredonyoursystem)\\n'\n",
" '\\n'\n",
" 'HereisaprogramthatprintsouthestringhelloworldinAVAPlanguageusingthebuiltint'}],\n",
" 'created': 1773656986,\n",
" 'id': 'cmpl-3c382cd0-7254-4bbc-8e71-84f97f06006a',\n",
" 'packagecom.exampleavaphelloworldappletestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapplettestapp'}],\n",
" 'created': 1773664389,\n",
" 'id': 'cmpl-643bf384-aef4-4d9f-b011-e35d180a6615',\n",
" 'model': '/home/acano/PycharmProjects/assistance-engine/data/models/qwen2.5-coder-1.5b-q8_0.gguf',\n",
" 'object': 'text_completion',\n",
" 'usage': {'completion_tokens': 502, 'prompt_tokens': 10, 'total_tokens': 512}}\n"
" 'usage': {'completion_tokens': 498, 'prompt_tokens': 14, 'total_tokens': 512}}\n"
]
}
],

@ -1,5 +1,4 @@
import typer
import logging
from loguru import logger
@ -58,10 +57,6 @@ def elasticsearch_ingestion(
if __name__ == "__main__":
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
try:
app()
except Exception as exc:

@ -190,8 +190,8 @@ def generate_synthetic_dataset(
provider: Provider = Provider.bedrock,
model: str = "global.anthropic.claude-sonnet-4-6",
temperature: float = 0.0,
num_samples: int = 10,
seed: int = 42,
num_samples: int = 30,
seed: int = 67,
context_docs_path: str = "docs/LRM/avap.md",
synthetic_output_path: str = "synthetic_datasets",
dataset: Optional[Dataset] = None,

@ -0,0 +1,49 @@
import typer
from loguru import logger
from scripts.pipelines.tasks.validate import (
load_tasks,
save_validated_tasks,
validate_all_tasks,
)
from src.config import settings
app = typer.Typer()
@app.command()
def validate_synthetic_dataset(
dataset_path: str = "synthetic_datasets/synthetic_data_generated_bedrock.json",
output_path: str = "synthetic_datasets/validated_synthetic_dataset.json",
api_url: str = settings.parser_url,
timeout: int = 120,
) -> None:
"""Validate a synthetic dataset against the AVAP runtime.
Sends the dataset to the validation API, collects per-task results,
and writes a new JSON file containing only the tasks that passed.
Args:
dataset_path: Path to the input synthetic dataset JSON file.
output_path: Path where the validated dataset JSON file will be saved.
api_url: URL of the validation API endpoint.
timeout: Timeout in seconds for the API request.
Returns:
None
"""
dataset_path = settings.proj_root / dataset_path
output_path = settings.proj_root / output_path
tasks = load_tasks(dataset_path)
validated_tasks = validate_all_tasks(tasks, api_url, timeout)
save_validated_tasks(validated_tasks, output_path)
if __name__ == "__main__":
try:
app()
except Exception as exc:
logger.exception(exc)
raise
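The imported helpers (`load_tasks`, `validate_all_tasks`, `save_validated_tasks`) live in `scripts/pipelines/tasks/validate.py` and are not part of this diff. As a rough, hypothetical sketch of the filtering contract they appear to implement (the real version posts each task to the AVAP parser API and keeps only the passing ones):

```python
def filter_passing_tasks(tasks, validate_fn):
    """Keep only the tasks a validator callable accepts.

    Hypothetical stand-in for validate_all_tasks: the real helper
    sends each task to the parser endpoint instead of a callable.
    """
    validated = []
    for task in tasks:
        result = validate_fn(task)
        if result.get("passed"):
            # Attach the validation verdict alongside the original task.
            validated.append({**task, "validation": result})
    return validated

# Usage with a stub validator in place of the live parser endpoint.
tasks = [{"code": "addResult('ok')"}, {"code": "<<broken>>"}]
passing = filter_passing_tasks(
    tasks, lambda t: {"passed": "<<" not in t["code"]}
)
```

The output file then contains only tasks whose `validation` verdict passed, so downstream training steps never see code the runtime rejects.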

@ -0,0 +1,794 @@
"""
chunker.py v1.0
Usage:
python chunker.py --lang-config avap_config.json --docs-path ./docs/samples
python chunker.py --lang-config avap_config.json --docs-path ./docs/samples --workers 8
python chunker.py --lang-config avap_config.json --docs-path ./docs/samples --redis-url redis://localhost:6379
python chunker.py --lang-config avap_config.json --docs-path ./docs/samples --no-dedup
"""
import re
import os
import json
import hashlib
import argparse
import tempfile
import warnings as py_warnings
from pathlib import Path
from dataclasses import dataclass, asdict, field
from typing import Optional, Generator, IO
from concurrent.futures import ProcessPoolExecutor, as_completed
try:
import tiktoken
_ENC = tiktoken.get_encoding("cl100k_base")
def count_tokens(text: str) -> int:
return len(_ENC.encode(text))
TOKEN_BACKEND = "tiktoken/cl100k_base"
except ImportError:
py_warnings.warn("tiktoken not installed; falling back to word counts. pip install tiktoken",
stacklevel=2)
def count_tokens(text: str) -> int: # type: ignore[misc]
return len(text.split())
TOKEN_BACKEND = "word-count (estimate)"
try:
from datasketch import MinHash, MinHashLSH
MINHASH_AVAILABLE = True
except ImportError:
MINHASH_AVAILABLE = False
py_warnings.warn("datasketch not installed; deduplication disabled. pip install datasketch",
stacklevel=2)
try:
from tqdm import tqdm
except ImportError:
def tqdm(x, **kwargs): return x # type: ignore[misc]
MAX_NARRATIVE_TOKENS = 400
OVERLAP_LINES = 3
DEDUP_THRESHOLD = 0.85
MINHASH_NUM_PERM = 128
MINHASH_SHINGLE_SIZE = 3
DEFAULT_WORKERS = max(1, (os.cpu_count() or 4) - 1)
@dataclass
class BlockDef:
name: str
doc_type: str
opener_re: re.Pattern
closer_re: re.Pattern
extract_signature: bool = False
signature_template: str = ""
def extract_sig(self, clean_line):
if not self.extract_signature:
return None
m = self.opener_re.match(clean_line)
if not m:
return None
tpl = self.signature_template
for i, g in enumerate(m.groups(), start=1):
tpl = tpl.replace(f"{{group{i}}}", (g or "").strip())
return tpl
@dataclass
class StatementDef:
name: str
re: re.Pattern
@dataclass
class SemanticTag:
tag: str
re: re.Pattern
class LanguageConfig:
def __init__(self, config_path: str):
raw = json.loads(Path(config_path).read_text(encoding="utf-8"))
self.language = raw.get("language", "unknown")
self.version = raw.get("version", "1.0")
self.extensions = set(raw.get("file_extensions", []))
lex = raw.get("lexer", {})
self.string_delimiters = lex.get("string_delimiters", ['"', "'"])
self.escape_char = lex.get("escape_char", "\\")
self.comment_line = sorted(lex.get("comment_line", ["#"]), key=len, reverse=True)
cb = lex.get("comment_block", {})
self.comment_block_open = cb.get("open", "")
self.comment_block_close = cb.get("close", "")
self.line_oriented = lex.get("line_oriented", True)
self.blocks: list[BlockDef] = []
for b in raw.get("blocks", []):
self.blocks.append(BlockDef(
name = b["name"],
doc_type = b.get("doc_type", "code"),
opener_re = re.compile(b["opener_pattern"]),
closer_re = re.compile(b["closer_pattern"]),
extract_signature = b.get("extract_signature", False),
signature_template = b.get("signature_template", ""),
))
self.statements: list[StatementDef] = [
StatementDef(name=s["name"], re=re.compile(s["pattern"]))
for s in raw.get("statements", [])
]
self.semantic_tags: list[SemanticTag] = [
SemanticTag(tag=t["tag"], re=re.compile(t["pattern"]))
for t in raw.get("semantic_tags", [])
]
def match_opener(self, clean_line):
for block in self.blocks:
if block.opener_re.match(clean_line):
return block
return None
def match_closer(self, clean_line):
for block in self.blocks:
if block.closer_re.match(clean_line):
return True
return False
def classify_statement(self, clean_line):
for stmt in self.statements:
if stmt.re.match(clean_line):
return stmt.name
return "statement"
def enrich_metadata(self, content):
meta: dict = {}
for tag in self.semantic_tags:
if tag.re.search(content):
meta[tag.tag] = True
meta["complexity"] = sum(1 for v in meta.values() if v is True)
return meta
@dataclass
class Chunk:
chunk_id: str
source_file: str
doc_type: str
block_type: str
section: str
start_line: int
end_line: int
content: str
metadata: dict = field(default_factory=dict)
def token_count(self):
return count_tokens(self.content)
def to_dict(self):
d = asdict(self)
d["token_estimate"] = self.token_count()
return d
def make_chunk_id(filepath, start, end, content):
return hashlib.sha1(
f"{filepath.name}:{start}:{end}:{content[:60]}".encode()
).hexdigest()[:16]
def make_chunk(filepath: Path, doc_type, block_type, section, start, end, content, cfg, extra_meta=None):
content = content.strip()
meta = cfg.enrich_metadata(content)
if extra_meta:
meta.update(extra_meta)
return Chunk(
chunk_id=make_chunk_id(filepath, start, end, content),
source_file=str(filepath),
doc_type=doc_type, block_type=block_type,
section=section, start_line=start, end_line=end,
content=content, metadata=meta,
)
class GenericLexer:
def __init__(self, cfg: LanguageConfig):
self.cfg = cfg
self.in_block_comment = False
def process_line(self, raw):
if self.in_block_comment:
if self.cfg.comment_block_close and \
self.cfg.comment_block_close in raw:
self.in_block_comment = False
return False, ""
cb_open = self.cfg.comment_block_open
cb_close = self.cfg.comment_block_close
if cb_open and cb_open in raw:
idx_open = raw.index(cb_open)
rest = raw[idx_open + len(cb_open):]
if cb_close and cb_close in rest:
idx_close = raw.index(cb_close, idx_open)
code_part = raw[:idx_open] + raw[idx_close + len(cb_close):]
return self._strip_line_comments(code_part)
else:
self.in_block_comment = True
return self._strip_line_comments(raw[:idx_open])
return self._strip_line_comments(raw)
def _strip_line_comments(self, raw):
in_str: Optional[str] = None
result = []
i = 0
while i < len(raw):
ch = raw[i]
if in_str and ch == self.cfg.escape_char:
result.append(ch)
if i + 1 < len(raw):
result.append(raw[i + 1])
i += 2
else:
i += 1
continue
if in_str and ch == in_str:
in_str = None
result.append(ch); i += 1; continue
if not in_str and ch in self.cfg.string_delimiters:
in_str = ch
result.append(ch); i += 1; continue
if not in_str:
matched = False
for prefix in self.cfg.comment_line:
if raw[i:].startswith(prefix):
matched = True
break
if matched:
break
result.append(ch); i += 1
code = "".join(result).strip()
return bool(code), code
class SemanticOverlapBuffer:
def __init__(self, overlap_lines = OVERLAP_LINES):
self.overlap_lines = overlap_lines
self._prev = None
self._current_fn_sig = None
self._current_fn_file = None
def notify_function(self, sig, source_file):
self._current_fn_sig = sig
self._current_fn_file = source_file
def notify_file_change(self, source_file):
if self._current_fn_file != source_file:
self._current_fn_sig = None
self._current_fn_file = source_file
self._prev = None
def apply(self, chunk):
if self.overlap_lines <= 0:
self._prev = chunk
return chunk
if self._prev and self._prev.source_file != chunk.source_file:
self.notify_file_change(chunk.source_file)
context_header = None
if (self._current_fn_sig
and self._current_fn_file == chunk.source_file
and chunk.block_type not in ("function", "function_signature")):
context_header = f"// context: {self._current_fn_sig}"
overlap_type = "function_sig"
elif (self._prev
and self._prev.source_file == chunk.source_file
and self._prev.doc_type == chunk.doc_type):
context_header = "\n".join(
self._prev.content.splitlines()[-self.overlap_lines:])
overlap_type = "line_tail"
else:
overlap_type = "none"
self._prev = chunk
if context_header:
new_content = (context_header + "\n" + chunk.content).strip()
return Chunk(
chunk_id=chunk.chunk_id, source_file=chunk.source_file,
doc_type=chunk.doc_type, block_type=chunk.block_type,
section=chunk.section, start_line=chunk.start_line,
end_line=chunk.end_line, content=new_content,
metadata={**chunk.metadata,
"has_overlap": True,
"overlap_type": overlap_type},
)
return chunk
def _shingles(text, k = MINHASH_SHINGLE_SIZE):
words = text.lower().split()
if len(words) < k:
return [" ".join(words).encode()]
return [" ".join(words[i:i+k]).encode() for i in range(len(words) - k + 1)]
def _build_minhash(text):
m = MinHash(num_perm=MINHASH_NUM_PERM)
for s in _shingles(text):
m.update(s)
return m
class StreamingDeduplicator:
def __init__(self, threshold: float = DEDUP_THRESHOLD):
self.threshold = threshold
self._lsh: dict[str, "MinHashLSH"] = {}
self.removed = 0
def _get_lsh(self, doc_type):
if doc_type not in self._lsh:
self._lsh[doc_type] = MinHashLSH(
threshold=self.threshold, num_perm=MINHASH_NUM_PERM)
return self._lsh[doc_type]
def is_duplicate(self, chunk):
if not MINHASH_AVAILABLE:
return False
lsh = self._get_lsh(chunk.doc_type)
m = _build_minhash(chunk.content)
try:
if lsh.query(m):
self.removed += 1
return True
except Exception:
pass
try:
lsh.insert(chunk.chunk_id, m)
except Exception as exc:
# LSH insert can fail (e.g. duplicate key); report and keep going.
print(f"MinHash LSH insert failed: {exc}")
return False
class JsonlWriter:
def __init__(self, path):
out = Path(path)
if out.suffix.lower() == ".json":
out = out.with_suffix(".jsonl")
out.parent.mkdir(parents=True, exist_ok=True)
self.path = out
self._handle: IO = open(out, "w", encoding="utf-8")
self.written = 0
def write(self, chunk):
self._handle.write(json.dumps(chunk.to_dict(), ensure_ascii=False) + "\n")
self.written += 1
def close(self):
if self._handle:
self._handle.close()
def validate_syntax(lines, filepath, cfg ):
warnings_out = []
stack = []
lexer = GenericLexer(cfg)
for i, raw in enumerate(lines):
line_no = i + 1
is_code, clean = lexer.process_line(raw)
if not is_code or not clean:
continue
block = cfg.match_opener(clean)
if block:
stack.append((block.name, line_no))
continue
if cfg.match_closer(clean):
if stack:
stack.pop()
else:
warnings_out.append(
f"{filepath.name}:{line_no} - block closer without a matching opener")
for bt, ln in stack:
warnings_out.append(
f"{filepath.name}:{ln} - block '{bt}' was never closed")
return warnings_out
def iter_code_chunks(filepath, cfg, overlap_buf):
lines = filepath.read_text(encoding="utf-8").splitlines()
warnings = validate_syntax(lines, filepath, cfg)
overlap_buf.notify_file_change(str(filepath))
lexer = GenericLexer(cfg)
i = 0
pending_raw = []
loose_buffer = []
loose_type = None
def flush_loose():
nonlocal loose_buffer, loose_type
if not loose_buffer:
return
start = loose_buffer[0][0]
end = loose_buffer[-1][0]
content = "\n".join(t for _, t in loose_buffer)
chunk = make_chunk(filepath, "code", loose_type or "statement",
"", start, end, content, cfg)
chunk = overlap_buf.apply(chunk)
loose_buffer.clear(); loose_type = None
yield chunk
while i < len(lines):
raw = lines[i]
line_no = i + 1
is_code, clean = lexer.process_line(raw)
if not is_code or not clean:
pending_raw.append(raw); i += 1; continue
block_def = cfg.match_opener(clean)
if block_def:
yield from flush_loose()
block_start = line_no
block_lines = list(pending_raw) + [raw]
pending_raw.clear()
sig = block_def.extract_sig(clean)
if sig:
overlap_buf.notify_function(sig, str(filepath))
depth = 1; i += 1
while i < len(lines) and depth > 0:
inner_raw = lines[i]
_, inner_clean = lexer.process_line(inner_raw)
block_lines.append(inner_raw)
if inner_clean:
if cfg.match_opener(inner_clean):
depth += 1
elif cfg.match_closer(inner_clean):
depth -= 1
i += 1
chunk = make_chunk(filepath, block_def.doc_type, block_def.name, "", block_start, i, "\n".join(block_lines), cfg)
chunk = overlap_buf.apply(chunk)
yield chunk
if sig:
yield make_chunk(
filepath, "function_signature", "function_signature", "", block_start, block_start, sig, cfg,
extra_meta={"full_block_start": block_start,
"full_block_end": i}
)
continue
stmt_type = cfg.classify_statement(clean)
if loose_type and stmt_type != loose_type:
yield from flush_loose()
if pending_raw and not loose_buffer:
for pc in pending_raw:
loose_buffer.append((line_no, pc))
pending_raw.clear()
loose_type = stmt_type
loose_buffer.append((line_no, raw))
i += 1
yield from flush_loose()
if warnings:
yield (None, warnings)
RE_MD_H1 = re.compile(r"^# (.+)")
RE_MD_H2 = re.compile(r"^## (.+)")
RE_MD_H3 = re.compile(r"^### (.+)")
RE_FENCE_OPEN = re.compile(r"^```(\w*)")
RE_FENCE_CLOSE = re.compile(r"^```\s*$")
RE_TABLE_ROW = re.compile(r"^\|")
def split_narrative_by_tokens(text, max_tokens):
paragraphs = re.split(r"\n\s*\n", text)
result = []; current = []; current_tokens = 0
for para in paragraphs:
pt = count_tokens(para)
if current_tokens + pt > max_tokens and current:
result.append("\n\n".join(current))
current = [para]; current_tokens = pt
else:
current.append(para); current_tokens += pt
if current:
result.append("\n\n".join(current))
return [t for t in result if t.strip()]
def iter_markdown_chunks(filepath, cfg, max_tokens = MAX_NARRATIVE_TOKENS):
lines = filepath.read_text(encoding="utf-8").splitlines()
current_h1 = current_h2 = current_h3 = ""
def section_label() -> str:
return " > ".join(p for p in [current_h1, current_h2, current_h3] if p)
def make_md_chunk(doc_type, block_type, start, end, content) -> Chunk:
return make_chunk(filepath, doc_type, block_type,
section_label(), start, end, content, cfg)
i = 0
narrative_start = 1; narrative_lines: list[str] = []
def flush_narrative() -> Generator:
nonlocal narrative_lines, narrative_start
text = "\n".join(narrative_lines).strip()
if not text:
narrative_lines.clear(); return
for sub in split_narrative_by_tokens(text, max_tokens):
sl = sub.count("\n") + 1
yield make_md_chunk("spec", "narrative",
narrative_start, narrative_start + sl - 1, sub)
narrative_lines.clear()
while i < len(lines):
raw = lines[i]; line_no = i + 1
m1 = RE_MD_H1.match(raw); m2 = RE_MD_H2.match(raw); m3 = RE_MD_H3.match(raw)
if m1:
yield from flush_narrative()
current_h1 = m1.group(1).strip(); current_h2 = current_h3 = ""
narrative_start = line_no + 1; i += 1; continue
if m2:
yield from flush_narrative()
current_h2 = m2.group(1).strip(); current_h3 = ""
narrative_start = line_no + 1; i += 1; continue
if m3:
yield from flush_narrative()
current_h3 = m3.group(1).strip()
narrative_start = line_no + 1; i += 1; continue
fm = RE_FENCE_OPEN.match(raw)
if fm and not RE_FENCE_CLOSE.match(raw):
yield from flush_narrative()
lang = fm.group(1).lower() or "code"
doc_type = "bnf" if lang == "bnf" else "code_example"
fence_start = line_no
fence_lines = [raw]; i += 1
while i < len(lines):
fence_lines.append(lines[i])
if RE_FENCE_CLOSE.match(lines[i]) and len(fence_lines) > 1:
i += 1; break
i += 1
yield make_md_chunk(doc_type, lang,
fence_start, fence_start + len(fence_lines) - 1,
"\n".join(fence_lines))
narrative_start = i + 1
continue
if RE_TABLE_ROW.match(raw):
yield from flush_narrative()
ts = line_no; tl = []
while i < len(lines) and RE_TABLE_ROW.match(lines[i]):
tl.append(lines[i]); i += 1
yield make_md_chunk("spec", "table", ts, ts + len(tl) - 1, "\n".join(tl))
narrative_start = i + 1
continue
if not narrative_lines:
narrative_start = line_no
narrative_lines.append(raw)
i += 1
yield from flush_narrative()
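The heading tracker above (`current_h1/h2/h3` plus `section_label`) composes a breadcrumb-style section path, resetting deeper levels whenever a shallower heading appears. The mechanism in isolation (a minimal sketch with simplified regexes, not the chunker's actual `RE_MD_H*` patterns):

```python
import re

# One pattern per heading level; order matters so "##" is not caught by "#".
H = [re.compile(r"^#\s+(.*)"), re.compile(r"^##\s+(.*)"), re.compile(r"^###\s+(.*)")]
path = ["", "", ""]

def feed(line: str) -> str:
    """Update the heading path from one markdown line, return the breadcrumb."""
    for level, rx in enumerate(H):
        m = rx.match(line)
        if m:
            path[level] = m.group(1).strip()
            for deeper in range(level + 1, 3):
                path[deeper] = ""  # an H2 resets H3, an H1 resets H2 and H3
            break
    return " > ".join(p for p in path if p)

feed("# LRM")
feed("## Section I")
print(feed("### 1.1 Files"))   # LRM > Section I > 1.1 Files
print(feed("## Section II"))   # LRM > Section II
```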
def _worker(args):
paths, config_path, overlap_lines, max_tokens = args
cfg = LanguageConfig(config_path)
overlap_buf = SemanticOverlapBuffer(overlap_lines)
stats = {t: 0 for t in ["code", "function_signature", "spec", "bnf", "code_example", "unknown", "total"]}
all_warnings = []
fd, tmp_path = tempfile.mkstemp(suffix=".jsonl", prefix="worker_")
os.close(fd)
with open(tmp_path, "w", encoding="utf-8") as f:
for path in paths:
ext = path.suffix.lower()
if ext in cfg.extensions:
for item in iter_code_chunks(path, cfg, overlap_buf):
if isinstance(item, tuple) and item[0] is None:
all_warnings.extend(item[1])
continue
chunk = item
f.write(json.dumps(chunk.to_dict(), ensure_ascii=False) + "\n")
stats[chunk.doc_type] = stats.get(chunk.doc_type, 0) + 1
stats["total"] += 1
elif ext == ".md":
for chunk in iter_markdown_chunks(path, cfg, max_tokens):
f.write(json.dumps(chunk.to_dict(), ensure_ascii=False) + "\n")
stats[chunk.doc_type] = stats.get(chunk.doc_type, 0) + 1
stats["total"] += 1
else:
content = path.read_text(encoding="utf-8")
chunk = make_chunk(path, "unknown", "raw", "", 1,
content.count("\n") + 1, content, cfg)
f.write(json.dumps(chunk.to_dict(), ensure_ascii=False) + "\n")
stats["unknown"] += 1; stats["total"] += 1
return tmp_path, stats, all_warnings
def fetch_documents(docs_path, cfg, extra_extensions):
root = Path(docs_path)
if not root.exists():
raise FileNotFoundError(f"Docs path not found: {root}")
all_exts = cfg.extensions | set(extra_extensions)
return sorted(p for p in root.rglob("*")
if p.is_file() and p.suffix.lower() in all_exts)
def _partition(paths, n):
k = max(1, len(paths) // n)
return [paths[i:i+k] for i in range(0, len(paths), k)]
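Note that `_partition` slices by floor division, so when `len(paths)` is not a multiple of `workers` it produces one extra short slice rather than balancing the remainder; `ProcessPoolExecutor` simply queues the extra slice. A standalone sketch of the same behavior (hypothetical name, same logic):

```python
def partition(paths, n):
    # Same logic as _partition above: fixed slice size len(paths)//n,
    # so the final slice may be shorter and the count may exceed n.
    k = max(1, len(paths) // n)
    return [paths[i:i + k] for i in range(0, len(paths), k)]

print([len(p) for p in partition(list(range(10)), 3)])  # [3, 3, 3, 1]
```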
def run_pipeline(paths,
config_path,
writer,
deduplicator,
overlap_lines,
max_tokens,
workers):
total_stats = {t: 0 for t in ["code", "function_signature", "spec", "bnf", "code_example", "unknown", "total", "dedup_removed"]}
all_warnings = []
tmp_files = []
partitions = _partition(paths, workers)
worker_args = [(part, config_path, overlap_lines, max_tokens) for part in partitions]
print(f"{len(paths)} files across {len(partitions)} workers...\n")
with ProcessPoolExecutor(max_workers=workers) as executor:
futures = {executor.submit(_worker, arg): i
for i, arg in enumerate(worker_args)}
for future in tqdm(as_completed(futures), total=len(futures),
desc=" Workers", unit="worker"):
tmp_path, stats, warns = future.result()
tmp_files.append(tmp_path)
all_warnings.extend(warns)
for k, v in stats.items():
total_stats[k] = total_stats.get(k, 0) + v
print(f"\n Merging {len(tmp_files)} partial files...")
for tmp_path in tqdm(tmp_files, desc=" Merge + dedup", unit="file"):
with open(tmp_path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
cd = json.loads(line)
if deduplicator:
c = Chunk(
chunk_id=cd["chunk_id"], source_file=cd["source_file"],
doc_type=cd["doc_type"], block_type=cd["block_type"],
section=cd["section"], start_line=cd["start_line"],
end_line=cd["end_line"], content=cd["content"],
metadata=cd.get("metadata", {}),
)
if deduplicator.is_duplicate(c):
total_stats["dedup_removed"] = \
total_stats.get("dedup_removed", 0) + 1
continue
writer._handle.write(line + "\n")
writer.written += 1
except json.JSONDecodeError as exc:
print(f"Skipping malformed chunk line: {exc}")
Path(tmp_path).unlink(missing_ok=True)
return total_stats, all_warnings
def print_report(stats, warnings, output_path, token_backend, workers, language):
print(f" RESULT — [{language}]")
print(f" Tokenizer : {token_backend}")
dedup_be = "MinHash LSH (RAM)" if MINHASH_AVAILABLE else "disabled"
print(f" Dedup backend : {dedup_be}")
print(f" Workers : {workers}")
print()
for t in ["code", "function_signature", "spec", "bnf", "code_example", "unknown"]:
n = stats.get(t, 0)
if n:
print(f" {t:<25}: {n:>6} chunks")
print(f"\n Total written : {stats.get('total', 0)}")
print(f" Erased (dedup) : {stats.get('dedup_removed', 0)}")
if warnings:
print(f"\n Warnings ({len(warnings)}):")
for w in warnings[:20]:
print(w)
if len(warnings) > 20:
print(f" ... and {len(warnings) - 20} more")
else:
print("\n Ok")
print(f"\n OUTPUT File {output_path}")
def main():
parser = argparse.ArgumentParser(
description="Generic chunker"
)
parser.add_argument("--lang-config", required=True,
help="Language config JSON (e.g. avap_config.json)")
parser.add_argument("--docs-path", default="docs/samples")
parser.add_argument("--output", default="ingestion/chunks.jsonl")
parser.add_argument("--overlap", type=int, default=OVERLAP_LINES)
parser.add_argument("--max-tokens", type=int, default=MAX_NARRATIVE_TOKENS)
parser.add_argument("--dedup-threshold", type=float, default=DEDUP_THRESHOLD)
parser.add_argument("--no-dedup", action="store_true")
parser.add_argument("--no-overlap", action="store_true")
parser.add_argument("--workers", type=int, default=DEFAULT_WORKERS)
args = parser.parse_args()
cfg = LanguageConfig(args.lang_config)
overlap = 0 if args.no_overlap else args.overlap
print(f" Language : {cfg.language} v{cfg.version}")
print(f" Config : {args.lang_config}")
print(f" Extensions : {cfg.extensions | {'.md'}}")
print(f" Docs path : {args.docs_path}")
print(f" Output : {args.output}")
print(f" Workers : {args.workers}")
print(f" Tokenizer : {TOKEN_BACKEND}")
print(f" Overlap : {overlap} lines (semantic)")
print(f" Max tokens : {args.max_tokens}")
dedup_info = "disabled" if args.no_dedup else \
f"MinHash LSH threshold={args.dedup_threshold} (RAM)"
print(f" Dedup : {dedup_info}")
print()
paths = fetch_documents(args.docs_path, cfg, [".md"])
if not paths:
print("No files found.")
return
print(f"{len(paths)} files found\n")
deduplicator = None
if not args.no_dedup and MINHASH_AVAILABLE:
deduplicator = StreamingDeduplicator(
threshold=args.dedup_threshold,
)
writer = JsonlWriter(args.output)
try:
stats, warnings = run_pipeline(
paths, args.lang_config, writer, deduplicator,
overlap, args.max_tokens, args.workers
)
finally:
writer.close()
print_report(stats, warnings, str(writer.path),
TOKEN_BACKEND, args.workers, cfg.language)
if __name__ == "__main__":
main()


@ -0,0 +1,74 @@
{
"_comment": "AVAP language configuration for the generic chunker. Based on the AVAP LRM (Language Reference Manual).",
"language": "avap",
"version": "1.0",
"file_extensions": [".avap"],
"lexer": {
"string_delimiters": ["\"", "'"],
"escape_char": "\\",
"comment_line": ["///", "//"],
"comment_block": { "open": "/*", "close": "*/" },
"line_oriented": true
},
"blocks": [
{
"name": "function",
"doc_type": "code",
"opener_pattern": "^\\s*function\\s+(\\w+)\\s*\\(([^)]*)",
"closer_pattern": "^\\s*\\}\\s*$",
"extract_signature": true,
"signature_template": "function {group1}({group2})"
},
{
"name": "if",
"doc_type": "code",
"opener_pattern": "^\\s*if\\s*\\(",
"closer_pattern": "^\\s*end\\s*\\(\\s*\\)"
},
{
"name": "startLoop",
"doc_type": "code",
"opener_pattern": "^\\s*startLoop\\s*\\(",
"closer_pattern": "^\\s*endLoop\\s*\\(\\s*\\)"
},
{
"name": "try",
"doc_type": "code",
"opener_pattern": "^\\s*try\\s*\\(\\s*\\)",
"closer_pattern": "^\\s*end\\s*\\(\\s*\\)"
}
],
"statements": [
{ "name": "registerEndpoint", "pattern": "^\\s*registerEndpoint\\s*\\(" },
{ "name": "addVar", "pattern": "^\\s*addVar\\s*\\(" },
{ "name": "io_command", "pattern": "^\\s*(addParam|getListLen|addResult|getQueryParamList)\\s*\\(" },
{ "name": "http_command", "pattern": "^\\s*(RequestPost|RequestGet)\\s*\\(" },
{ "name": "orm_command", "pattern": "^\\s*(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
{ "name": "util_command", "pattern": "^\\s*(variableToList|itemFromList|variableFromJSON|AddVariableToJSON|encodeSHA256|encodeMD5|getRegex|getDateTime|stampToDatetime|getTimeStamp|randomString|replace)\\s*\\(" },
{ "name": "async_command", "pattern": "^\\s*(\\w+\\s*=\\s*go\\s+|gather\\s*\\()" },
{ "name": "connector", "pattern": "^\\s*\\w+\\s*=\\s*avapConnector\\s*\\(" },
{ "name": "modularity", "pattern": "^\\s*(import|include)\\s+" },
{ "name": "assignment", "pattern": "^\\s*\\w+\\s*=\\s*" }
],
"semantic_tags": [
{ "tag": "uses_orm", "pattern": "\\b(ormDirect|ormCheckTable|ormCreateTable|ormAccessSelect|ormAccessInsert|ormAccessUpdate)\\s*\\(" },
{ "tag": "uses_http", "pattern": "\\b(RequestPost|RequestGet)\\s*\\(" },
{ "tag": "uses_connector", "pattern": "\\bavapConnector\\s*\\(" },
{ "tag": "uses_async", "pattern": "\\bgo\\s+\\w+\\s*\\(|\\bgather\\s*\\(" },
{ "tag": "uses_crypto", "pattern": "\\b(encodeSHA256|encodeMD5)\\s*\\(" },
{ "tag": "uses_auth", "pattern": "\\b(addParam|_status)\\b" },
{ "tag": "uses_error_handling", "pattern": "\\btry\\s*\\(\\s*\\)" },
{ "tag": "uses_loop", "pattern": "\\bstartLoop\\s*\\(" },
{ "tag": "uses_json", "pattern": "\\b(variableFromJSON|AddVariableToJSON)\\s*\\(" },
{ "tag": "uses_list", "pattern": "\\b(variableToList|itemFromList|getListLen)\\s*\\(" },
{ "tag": "uses_regex", "pattern": "\\bgetRegex\\s*\\(" },
{ "tag": "uses_datetime", "pattern": "\\b(getDateTime|getTimeStamp|stampToDatetime)\\s*\\(" },
{ "tag": "returns_result", "pattern": "\\baddResult\\s*\\(" },
{ "tag": "registers_endpoint", "pattern": "\\bregisterEndpoint\\s*\\(" }
]
}
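The `opener_pattern`/`closer_pattern` pairs above are plain Python regexes; as a quick sanity check they can be exercised directly with `re` (a minimal sketch against a hypothetical AVAP snippet, independent of the chunker itself):

```python
import re

# Patterns copied from the "function" block entry in the config above.
opener = re.compile(r"^\s*function\s+(\w+)\s*\(([^)]*)")
closer = re.compile(r"^\s*\}\s*$")

snippet = [
    "function getUser(user_id) {",   # hypothetical AVAP handler
    "    addParam('user_id', uid)",
    "}",
]

m = opener.match(snippet[0])
# signature_template: "function {group1}({group2})"
signature = f"function {m.group(1)}({m.group(2)})"
print(signature)                               # function getUser(user_id)
print(closer.match(snippet[-1]) is not None)   # True
```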


@ -0,0 +1,452 @@
"""
avap_ingest.py v2.0
Usage:
# Ingest
python avap_ingest.py --chunks ingestion/chunks.jsonl --index avap-knowledge-v1
# Delete the index and re-ingest from scratch
python avap_ingest.py --chunks ingestion/chunks.jsonl --index avap-knowledge-v1 --delete
# Reprocess only the failed chunks (DLQ)
python avap_ingest.py --chunks ingestion/failed_chunks.jsonl --index avap-knowledge-v1
"""
import os
import json
import time
import asyncio
import argparse
import traceback
from pathlib import Path
from datetime import datetime
from typing import AsyncGenerator
from elasticsearch import AsyncElasticsearch
import httpx
from tqdm import tqdm
from elasticsearch import helpers as es_helpers
DEFAULT_CHUNKS_PATH = "ingestion/chunks.jsonl"
DEFAULT_INDEX = "avap-knowledge-v1"
DEFAULT_OLLAMA_URL = "http://localhost:11434"
DEFAULT_OLLAMA_MODEL = "qwen3-0.6B-emb:latest"
DEFAULT_EMBEDDING_DIM = 1024
BATCH_SIZE_EMBED = 8
BATCH_SIZE_ES = 50
QUEUE_MAXSIZE = 5
MAX_RETRIES = 3
RETRY_DELAY = 2.0
OLLAMA_TIMEOUT = 120
def iter_chunks_jsonl(path, batch_size):
batch = []
with open(path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
chunk = json.loads(line)
batch.append(chunk)
if len(batch) >= batch_size:
yield batch
batch = []
except json.JSONDecodeError as exc:
print(f"Skipping malformed JSONL line: {exc}")
if batch:
yield batch
def count_lines(path):
n = 0
with open(path, encoding="utf-8") as f:
for line in f:
if line.strip():
n += 1
return n
def build_index_mapping(embedding_dim):
return {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"avap_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "stop"]
}
}
}
},
"mappings": {
"properties": {
"chunk_id": {"type": "keyword"},
"content": {
"type": "text",
"analyzer": "avap_analyzer"
},
"embedding": {
"type": "dense_vector",
"dims": embedding_dim,
"index": True,
"similarity": "cosine",
"index_options": {
"type": "int8_hnsw",
"m": 16,
"ef_construction": 100
}
},
"doc_type": {"type": "keyword"},
"block_type": {"type": "keyword"},
"section": {
"type": "text",
"fields": {"keyword": {"type": "keyword"}}
},
"source_file": {"type": "keyword"},
"start_line": {"type": "integer"},
"end_line": {"type": "integer"},
"token_estimate": {"type": "integer"},
"metadata": {
"properties": {
"uses_orm": {"type": "boolean"},
"uses_http": {"type": "boolean"},
"uses_connector": {"type": "boolean"},
"uses_async": {"type": "boolean"},
"uses_crypto": {"type": "boolean"},
"uses_auth": {"type": "boolean"},
"uses_error_handling": {"type": "boolean"},
"uses_loop": {"type": "boolean"},
"uses_json": {"type": "boolean"},
"uses_list": {"type": "boolean"},
"uses_regex": {"type": "boolean"},
"uses_datetime": {"type": "boolean"},
"returns_result": {"type": "boolean"},
"registers_endpoint": {"type": "boolean"},
"has_overlap": {"type": "boolean"},
"complexity": {"type": "integer"},
"full_block_start": {"type": "integer"},
"full_block_end": {"type": "integer"},
}
}
}
}
}
class DeadLetterQueue:
def __init__(self, base_path = "ingestion"):
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
self.path = Path(base_path) / f"failed_chunks_{ts}.jsonl"
self._handle = None
self.count = 0
def _open(self):
if self._handle is None:
self.path.parent.mkdir(parents=True, exist_ok=True)
self._handle = open(self.path, "w", encoding="utf-8")
def write(self, chunk, reason) -> None:
self._open()
record = {"reason": reason, "chunk": chunk}
self._handle.write(json.dumps(record, ensure_ascii=False) + "\n")
self._handle.flush()
self.count += 1
def close(self):
if self._handle:
self._handle.close()
self._handle = None
def report(self):
if self.count:
print(f"{self.count} failed chunks: {self.path}")
else:
print(" No failed chunks")
class OllamaAsyncEmbedder:
def __init__(self, base_url, model, timeout = OLLAMA_TIMEOUT):
self.base_url = base_url.rstrip("/")
self.model = model
self._client = httpx.AsyncClient(timeout=timeout)
async def probe_dimension(self):
vecs = await self._embed(["dimension probe"])
return len(vecs[0])
async def _embed(self, texts):
payload = {"model": self.model, "input": texts}
for attempt in range(1, MAX_RETRIES + 1):
try:
resp = await self._client.post(
f"{self.base_url}/api/embed",
json=payload
)
resp.raise_for_status()
return resp.json()["embeddings"]
except Exception as exc:
if attempt >= MAX_RETRIES:
raise RuntimeError(f"Embedding failed after {MAX_RETRIES} attempts: {exc}") from exc
await asyncio.sleep(RETRY_DELAY * attempt)
return []
async def embed_batch(self, chunks, dlq):
texts = [c["content"] for c in chunks]
try:
vectors = await self._embed(texts)
return list(zip(chunks, vectors))
except Exception as exc:
print(f"Batch embed failed, retrying chunk by chunk: {exc}")
results = []
for chunk in chunks:
try:
vecs = await self._embed([chunk["content"]])
results.append((chunk, vecs[0]))
except Exception as single_exc:
dlq.write(chunk, f"Ollama embed failed: {single_exc}")
return results
async def close(self):
await self._client.aclose()
async def producer(chunks_path, embedder, queue, dlq, batch_size, pbar):
for batch in iter_chunks_jsonl(chunks_path, batch_size):
embedded = await embedder.embed_batch(batch, dlq)
if embedded:
await queue.put(embedded)
pbar.update(len(batch))
await queue.put(None)
async def consumer(queue, es_client, index, dlq, batch_size_es, stats):
buffer: list[tuple[dict, list[float]]] = []
async def flush_buffer():
if not buffer:
return
actions = [
{
"_index": index,
"_id": chunk["chunk_id"],
"_source": {
"chunk_id": chunk["chunk_id"],
"content": chunk["content"],
"embedding": vector,
"doc_type": chunk.get("doc_type", "unknown"),
"block_type": chunk.get("block_type", ""),
"section": chunk.get("section", ""),
"source_file": chunk.get("source_file", ""),
"start_line": chunk.get("start_line", 0),
"end_line": chunk.get("end_line", 0),
"token_estimate": chunk.get("token_estimate", 0),
"metadata": chunk.get("metadata", {}),
}
}
for chunk, vector in buffer
]
try:
ok, errors = await es_helpers.async_bulk(
es_client, actions,
raise_on_error=False,
stats_only=False
)
stats["ok"] += ok
stats["errors"] += len(errors)
for err in errors:
failed_id = err.get("index", {}).get("_id", "unknown")
reason = str(err.get("index", {}).get("error", "unknown ES error"))
for chunk, _ in buffer:
if chunk["chunk_id"] == failed_id:
dlq.write(chunk, f"ES bulk error: {reason}")
break
except Exception as exc:
for chunk, _ in buffer:
dlq.write(chunk, f"ES bulk exception: {exc}")
stats["errors"] += len(buffer)
buffer.clear()
while True:
item = await queue.get()
if item is None:
await flush_buffer()
break
buffer.extend(item)
if len(buffer) >= batch_size_es:
await flush_buffer()
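The producer/consumer pair communicates through a bounded `asyncio.Queue` with a `None` sentinel marking end-of-stream; the bound gives backpressure so embedding never runs far ahead of indexing. The pattern in isolation (a minimal sketch, not the pipeline itself):

```python
import asyncio

async def demo():
    queue = asyncio.Queue(maxsize=2)  # bounded: producer blocks when full

    async def producer():
        for batch in ([1, 2], [3], [4, 5]):
            await queue.put(batch)
        await queue.put(None)  # sentinel: no more batches

    async def consumer():
        seen = []
        while True:
            item = await queue.get()
            if item is None:
                break
            seen.extend(item)
        return seen

    # gather returns results in task order; index 1 is the consumer's list
    return (await asyncio.gather(producer(), consumer()))[1]

print(asyncio.run(demo()))  # [1, 2, 3, 4, 5]
```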
async def build_es_client():
url = "http://127.0.0.1:9200"
client = AsyncElasticsearch(
url,
verify_certs=False,
request_timeout=60
)
try:
info = await client.info()
print(f" Elasticsearch {info['version']['number']} at {url}")
except Exception as e:
raise ConnectionError(f"Cannot connect to {url}: {e}")
return client
async def create_index(client: AsyncElasticsearch, index: str,
embedding_dim: int,
delete_if_exists: bool = False) -> None:
exists = await client.indices.exists(index=index)
if exists and delete_if_exists:
await client.indices.delete(index=index)
exists = False
if not exists:
await client.indices.create(index=index, body=build_index_mapping(embedding_dim))
print(f" · Index '{index}' created (dim={embedding_dim}, int8_hnsw, cosine).")
else:
print(f" · Index '{index}' reused.")
async def run(args):
ollama_url = os.environ.get("OLLAMA_URL", DEFAULT_OLLAMA_URL)
ollama_model = os.environ.get("OLLAMA_MODEL", DEFAULT_OLLAMA_MODEL)
embed_dim = int(os.environ.get("OLLAMA_EMBEDDING_DIM", DEFAULT_EMBEDDING_DIM))
embedder = OllamaAsyncEmbedder(ollama_url, ollama_model)
if args.probe_dim:
dim = await embedder.probe_dimension()
print(f" Model dimensions: {dim}")
await embedder.close()
return
if not Path(args.chunks).exists():
print(f"File Not Found: {args.chunks}")
await embedder.close()
return
total = count_lines(args.chunks)
print(f" Total Chunks: {total}")
print("\nConnecting to VectorDB...")
es_client = await build_es_client()
print(f"\nGenerating index '{args.index}'...")
await create_index(es_client, args.index, embed_dim,
delete_if_exists=args.delete)
print("\n Checking model dimensions...")
actual_dim = await embedder.probe_dimension()
if actual_dim != embed_dim:
print(f" Actual dimension ({actual_dim}) != OLLAMA_EMBEDDING_DIM ({embed_dim})")
await embedder.close()
await es_client.close()
return
print(f" Dimension: {actual_dim}")
dlq = DeadLetterQueue(base_path=str(Path(args.chunks).parent))
stats = {"ok": 0, "errors": 0}
queue = asyncio.Queue(maxsize=QUEUE_MAXSIZE)
print("\nAsync pipeline (Ollama <-> Elasticsearch)...\n")
t0 = time.time()
pbar = tqdm(total=total, desc=" Processing", unit="chunks")
await asyncio.gather(
producer(args.chunks, embedder, queue, dlq,
args.batch_embed, pbar),
consumer(queue, es_client, args.index, dlq,
args.batch_es, stats),
)
pbar.close()
elapsed = time.time() - t0
await embedder.close()
await es_client.close()
dlq.close()
print("RESULT")
print("----------------")
print(f"Chunks : {total}")
print(f" -OK : {stats['ok']}")
print(f" -Errors : {stats['errors']}")
print(f" -Index Name: {args.index}")
print()
dlq.report()
print("----------------")
def main():
parser = argparse.ArgumentParser(
description="AVAP Ingestor"
)
parser.add_argument("--chunks", default=DEFAULT_CHUNKS_PATH,
help=f"JSONL Chunk File (default: {DEFAULT_CHUNKS_PATH})")
parser.add_argument("--index", default=DEFAULT_INDEX,
help=f"Index Name (default: {DEFAULT_INDEX})")
parser.add_argument("--delete", action="store_true",
help="Delete the index before ingesting")
parser.add_argument("--probe-dim", action="store_true",
help="Probe the model's embedding dimension and exit")
parser.add_argument("--batch-embed", type=int, default=BATCH_SIZE_EMBED,
help=f"Chunks per Ollama call (default: {BATCH_SIZE_EMBED})")
parser.add_argument("--batch-es", type=int, default=BATCH_SIZE_ES,
help=f"Docs per ES bulk request (default: {BATCH_SIZE_ES})")
args = parser.parse_args()
print("----------------")
print("AVAP INGESTOR")
print("----------------")
if not args.probe_dim:
print(f" Chunks : {args.chunks}")
print(f" INDEX ES : {args.index}")
print(f" Ollama URL : {os.environ.get('OLLAMA_URL', DEFAULT_OLLAMA_URL)}")
print(f" MODEL : {os.environ.get('OLLAMA_MODEL', DEFAULT_OLLAMA_MODEL)}")
print(f" MODEL DIM : {os.environ.get('OLLAMA_EMBEDDING_DIM', DEFAULT_EMBEDDING_DIM)}")
print()
asyncio.run(run(args))
if __name__ == "__main__":
main()
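Once ingestion completes, the mapping created above supports hybrid retrieval: BM25 over `content` plus kNN over `embedding`. A minimal sketch that only builds the search request body — the field names match the mapping, but the query text and vector are placeholders, not output from the real embedder:

```python
# Build a hybrid (BM25 + kNN) search body matching the index mapping above.
# No network access: this only constructs the request dictionary.
def build_hybrid_query(text, vector, k=5):
    return {
        "knn": {
            "field": "embedding",       # dense_vector field from the mapping
            "query_vector": vector,
            "k": k,
            "num_candidates": 10 * k,
        },
        "query": {"match": {"content": text}},  # lexical leg over "content"
        "size": k,
    }

body = build_hybrid_query("registerEndpoint", [0.0] * 1024)
print(sorted(body))  # ['knn', 'query', 'size']
```

The body would be passed to the async client as keyword arguments (e.g. `es_client.search(index=..., **body)`).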


@ -0,0 +1,105 @@
{"chunk_id": "f2b9f3531de0a901", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Prefacio Arquitectónico", "start_line": 2, "end_line": 4, "content": "**AVAP (Advanced Virtual API Programming) es un DSL (Domain-Specific Language) Turing Completo, diseñado arquitectónicamente para la orquestación segura, concurrente y determinista de microservicios e I/O.** No es un lenguaje de propósito general; su motor híbrido y su gramática estricta están optimizados para el procesamiento rápido de transacciones HTTP, la manipulación de datos en memoria y la persistencia, minimizando los efectos secundarios no deseados.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 115}
{"chunk_id": "5fd5f1e92023b13d", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM)", "start_line": 8, "end_line": 10, "content": "Este documento unifica la arquitectura de memoria, estructuras de control, modularidad, concurrencia asíncrona y la gramática formal (BNF) del lenguaje AVAP. Actúa como la única fuente de verdad (Single Source of Truth) para la implementación del parser, el motor de ejecución y la indexación del sistema RAG.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 80}
{"chunk_id": "77a75c28d0b778bc", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales", "start_line": 14, "end_line": 14, "content": "Esta sección sienta las bases de cómo AVAP gestiona la lógica de los servicios y la manipulación de datos en memoria. A diferencia de los lenguajes interpretados convencionales, AVAP utiliza un motor de evaluación híbrida que permite combinar comandos declarativos con expresiones dinámicas.", "metadata": {"complexity": 0}, "token_estimate": 72}
{"chunk_id": "5fd81b806e4f5711", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales > 1.1 Estructura de Archivo y Terminación de Sentencias", "start_line": 18, "end_line": 21, "content": "AVAP es un lenguaje **estrictamente orientado a líneas**. Esta decisión de diseño garantiza que el analizador sintáctico (parser) sea extremadamente rápido y determinista, evitando la ambigüedad que sufren lenguajes que permiten declaraciones en múltiples líneas.\n* Cada instrucción lógica (`statement`) debe completarse en una única línea física de texto.\n* El motor reconoce el salto de línea o retorno de carro (`<EOL>`) como el terminador absoluto de la instrucción.\n* No se admite la partición de una instrucción, obligando al programador a escribir un código secuencial, limpio y fácil de depurar.", "metadata": {"complexity": 0}, "token_estimate": 159}
{"chunk_id": "de6ad8755bd4893c", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales > 1.2 Registro de Endpoints (registerEndpoint)", "start_line": 24, "end_line": 27, "content": "El comando `registerEndpoint` es la unidad atómica de configuración en AVAP. Actúa como el puente crítico entre la red externa (HTTP) y el código interno.\n* **Mecánica:** Define la ruta URL, el método HTTP permitido (ej. `GET`, `POST`), y la función de entrada principal (Handler).\n* **Seguridad:** El servidor AVAP rechazará automáticamente (con un Error 405) cualquier petición que no coincida con el método especificado.\n* **Middlewares:** Permite inyectar una lista de funciones previas para validar tokens antes de ejecutar el bloque principal.", "metadata": {"complexity": 0}, "token_estimate": 139}
{"chunk_id": "926d88e1a5ac0868", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales > 1.3 Asignación Dinámica y Referencias (addVar)", "start_line": 30, "end_line": 33, "content": "AVAP permite una sintaxis de asignación directa mediante el símbolo `=`, otorgando flexibilidad bajo un estricto control de contexto.\n* **Evaluación en tiempo real:** Cuando el intérprete lee `variable = expresión`, resuelve cualquier operación matemática o lógica utilizando el motor de evaluación subyacente.\n* **El operador de desreferenciación (`$`):** Cuando se utiliza el comando nativo `addVar(copia, $original)`, el prefijo `$` indica al motor que debe buscar en la tabla de símbolos la variable llamada \"original\" y extraer su valor.\n* **Semántica de addVar:** El comando acepta `addVar(valor, variable)` o `addVar(variable, valor)`. Si ambos argumentos son identificadores, el valor del segundo se asigna al primero. No está permitido usar dos literales como argumentos.", "metadata": {"complexity": 0}, "token_estimate": 201}
{"chunk_id": "5c30935931a47a71", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales > Especificación BNF (Sección I)", "start_line": 37, "end_line": 80, "content": "```bnf\n<program> ::= ( <line> | <block_comment> )*\n<line> ::= [ <statement> ] [ <line_comment> | <doc_comment> ] <EOL>\n | ( <line_comment> | <doc_comment> ) <EOL>\n<EOL> ::= /* Retorno de carro / Salto de línea (\\n o \\r\\n) */\n\n<statement> ::= <assignment>\n | <method_call_stmt>\n | <function_call_stmt>\n | <function_decl>\n | <return_stmt>\n | <system_command>\n | <io_command>\n | <control_flow>\n | <async_command>\n | <connector_cmd>\n | <db_command>\n | <http_command>\n | <util_command>\n | <modularity_cmd>\n\n<assignment> ::= <identifier> \"=\" <expression>\n\n/* Llamada a función global (sin receptor de objeto) */\n<function_call_stmt> ::= <identifier> \"(\" [<argument_list>] \")\"\n\n/* Llamada a método sobre un objeto conector (con receptor) */\n<method_call_stmt> ::= <identifier> \"=\" <identifier> \".\" <identifier> \"(\" [<argument_list>] \")\"\n\n<system_command> ::= <register_cmd> | <addvar_cmd>\n<register_cmd> ::= \"registerEndpoint(\" <stringliteral> \",\" <stringliteral> \",\" <list_display> \",\" <stringliteral> \",\" <identifier> \",\" <identifier> \")\"\n/* addVar asigna un valor a una variable. Acepta (valor, variable) o (variable, valor).\n Si ambos argumentos son identificadores, el valor del segundo se asigna al primero.\n No está permitido pasar dos literales como argumentos.\n*/\n<addvar_cmd> ::= \"addVar(\" <addvar_arg> \",\" <addvar_arg> \")\"\n<addvar_arg> ::= <identifier> | <literal> | \"$\" <identifier>\n/* Restricción semántica: al menos uno de los dos <addvar_arg> debe ser <identifier> */\n\n<identifier> ::= [a-zA-Z_] [a-zA-Z0-9_]*\n\n/* Variables de sistema reservadas — accesibles y asignables desde cualquier scope:\n _status — código HTTP de respuesta (ej. addVar(_status, 401) o _status = 404) */\n<system_variable> ::= \"_status\"\n```", "metadata": {"uses_auth": true, "registers_endpoint": true, "complexity": 2}, "token_estimate": 511}
{"chunk_id": "d4d70d35c8ec7325", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN I: Arquitectura, Memoria y Fundamentos Estructurales > Especificación BNF (Sección I)", "start_line": 81, "end_line": 81, "content": "---", "metadata": {"complexity": 0}, "token_estimate": 1}
{"chunk_id": "10944d208c9da6f4", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN II: Gestión de Entrada y Salida (I/O)", "start_line": 85, "end_line": 85, "content": "Esta sección describe los mecanismos que AVAP utiliza para la ingesta de datos externos, la validación de la integridad de los parámetros y la construcción del paquete de respuesta HTTP. AVAP no posee comandos de impresión interna (como `print`); toda salida de datos se realiza a través de la interfaz HTTP.", "metadata": {"complexity": 0}, "token_estimate": 79}
{"chunk_id": "c384770f48495e01", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN II: Gestión de Entrada y Salida (I/O) > 2.1 Captura Inteligente de Parámetros (addParam)", "start_line": 89, "end_line": 89, "content": "El comando `addParam(parametro, destino)` inspecciona la petición HTTP en un orden jerárquico estricto: primero en la URL (Query arguments), luego en el JSON Body, y finalmente en el Form Data. Si el parámetro solicitado no existe, la variable de destino se inicializa como `None`.", "metadata": {"uses_auth": true, "complexity": 1}, "token_estimate": 72}
{"chunk_id": "89e708750d7aad10", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN II: Gestión de Entrada y Salida (I/O) > 2.2 Validación y Colecciones (getListLen / getQueryParamList)", "start_line": 92, "end_line": 93, "content": "* **`getListLen(fuente, destino)`**: Actúa como un inspector de volumen. Cuenta cuántos elementos hay en una lista o cadena.\n* **`getQueryParamList(parametro, lista_destino)`**: Empaqueta automáticamente múltiples ocurrencias de un parámetro de URL (ej. `?filtro=A&filtro=B`) en una única estructura de lista.", "metadata": {"uses_list": true, "complexity": 1}, "token_estimate": 84}
{"chunk_id": "6596e38ff8166537", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN II: Gestión de Entrada y Salida (I/O) > 2.3 Construcción de Respuesta (addResult y _status)", "start_line": 96, "end_line": 96, "content": "El comando `addResult(variable)` es el encargado de registrar qué variables formarán parte del cuerpo JSON de la respuesta final. La variable de sistema `_status` permite definir explícitamente el código HTTP de salida tanto mediante asignación directa (`_status = 404`) como mediante `addVar(_status, 401)`.", "metadata": {"uses_auth": true, "returns_result": true, "complexity": 2}, "token_estimate": 72}
{"chunk_id": "0c688f58b62acad3", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN II: Gestión de Entrada y Salida (I/O) > Especificación BNF (Sección II)", "start_line": 100, "end_line": 106, "content": "```bnf\n<io_command> ::= <addparam_cmd> | <getlistlen_cmd> | <addresult_cmd> | <getparamlist_cmd>\n<addparam_cmd> ::= \"addParam(\" <stringliteral> \",\" <identifier> \")\"\n<getlistlen_cmd> ::= \"getListLen(\" <identifier> \",\" <identifier> \")\"\n<getparamlist_cmd> ::= \"getQueryParamList(\" <stringliteral> \",\" <identifier> \")\"\n<addresult_cmd> ::= \"addResult(\" <identifier> \")\"\n```", "metadata": {"uses_auth": true, "uses_list": true, "returns_result": true, "complexity": 3}, "token_estimate": 112}
{"chunk_id": "08e0210bd8a6e9f5", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión", "start_line": 111, "end_line": 111, "content": "AVAP utiliza una gramática estructural mixta. Combina la fluidez de las palabras clave para abrir bloques funcionales con la seguridad matemática de cierres estrictos.", "metadata": {"complexity": 0}, "token_estimate": 41}
{"chunk_id": "fd4ab34e1a2e1505", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.1 El Bloque Condicional (if() / else() / end())", "start_line": 115, "end_line": 118, "content": "La estructura `if()` evalúa una expresión lógica o de comparación. Todo bloque condicional requiere un cierre explícito utilizando el comando `end()`.\n\nEl comando `if()` soporta dos modos de invocación:\n* **Modo 1 (comparación estructurada):** `if(variable, valor, comparador)` — evalúa la comparación entre variable y valor usando el operador indicado como string entre comillas dobles (ej. `\"==\"`, `\">\"`, `\"!=\"`). Los dos primeros argumentos deben ser identificadores simples o literales, nunca expresiones de acceso como `dict['clave']`. Si se necesita comparar un valor extraído de una estructura, debe asignarse primero a una variable.\n* **Modo 2 (expresión libre):** `if(None, None, expresion_compleja)` — evalúa directamente una expresión booleana compleja encapsulada entre acentos graves (backticks).", "metadata": {"complexity": 0}, "token_estimate": 204}
{"chunk_id": "6a4ee7e79b875a95", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.1 El Bloque Condicional (if() / else() / end())", "start_line": 125, "end_line": 133, "content": "El comando `if()` gestiona la lógica condicional mediante dos modos de invocación estrictamente diferenciados. Es imperativo respetar los delimitadores y la posición de los argumentos.\n\n#### Modo 1: Comparación Estructurada (Atómica)\nSe utiliza para comparaciones directas entre dos valores simples.\n* **Sintaxis:** `if(átomo_1, átomo_2, \"operador\")`\n* **Argumentos 1 y 2:** Deben ser identificadores simples (variables) o literales (strings/números). **No se permite el uso de `None` en este modo.**\n* **Argumento 3:** El operador de comparación debe ir obligatoriamente entre **comillas dobles** (`\"==\"`, `\"!=\"`, `\">\"`, `\"<\"`, `\">=\"`, `\"<=\"`).\n* **Restricción:** No se permiten expresiones de acceso (ej. `data.user` o `list[0]`). Estos valores deben asignarse previamente a una variable.\n* **Ejemplo correcto:** `if(reintentos, 5, \"<\")`", "metadata": {"complexity": 0}, "token_estimate": 253}
{"chunk_id": "3986628fd0bade4b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.1 El Bloque Condicional (if() / else() / end())", "start_line": 125, "end_line": 132, "content": "#### Modo 2: Expresión Libre (Evaluación Compleja)\nSe utiliza para evaluar expresiones lógicas que no encajan en la estructura atómica.\n* **Sintaxis:** `if(None, None, `expresión_compleja`)`\n* **Argumentos 1 y 2:** Deben ser literalmente la palabra `None` (sin comillas).\n* **Argumento 3:** La expresión completa **debe** estar encapsulada entre **acentos graves (backticks)**. Esto permite incluir lógica interna, operadores `and/or` y accesos a estructuras de datos.\n* **Ejemplo correcto:** `if(None, None, `user.id > 10 and email.contains(\"@\")`)`\n\n---", "metadata": {"complexity": 0}, "token_estimate": 168}
{"chunk_id": "fccb6c308f9b078a", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "table", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > Tabla de Validación para el Modelo", "start_line": 146, "end_line": 152, "content": "| Entrada | Estado | Razón |\n| :--- | :--- | :--- |\n| `if(count, 10, \"==\")` | ✅ VÁLIDO | Modo 1: Átomos válidos y operador entre comillas. |\n| `if(None, None, `val > 0`)` | ✅ VÁLIDO | Modo 2: Uso correcto de `None` y backticks. |\n| `if(username, None, \"==\")` | ❌ ERROR | El Modo 1 prohíbe el uso de `None`. Debe usarse el Modo 2. |\n| `if(None, None, \"val > 0\")` | ❌ ERROR | El Modo 2 requiere backticks (`` ` ``), no comillas. |\n| `if(user.id, 10, \"==\")` | ❌ ERROR | El Modo 1 no permite expresiones de acceso (`.`). |", "metadata": {"complexity": 0}, "token_estimate": 207}
{"chunk_id": "e55b937828573e96", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.2 Iteraciones Estrictas y Deterministas (startLoop / endLoop)", "start_line": 155, "end_line": 158, "content": "Para garantizar el determinismo y evitar el colapso de memoria:\n* Los bucles se definen mediante `startLoop(contador, inicio, fin)`. Solo iteran basándose en índices numéricos finitos.\n* El bloque debe cerrarse obligatoriamente con `endLoop()`.\n* La forma de salir anticipadamente es invocando el comando global `return()`.", "metadata": {"uses_loop": true, "complexity": 1}, "token_estimate": 83}
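{"chunk_id": "ex-loop-startloop", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.2 Iteraciones Estrictas y Deterministas (startLoop / endLoop)", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo del bucle determinista (nombres hipotéticos; la semántica exacta de los límites depende del runtime):\n\n```avap\n// Acumular la suma de los elementos de una lista\nsuma = 0\ngetListLen(valores, total)\nstartLoop(i, 0, total)\n    itemFromList(valores, i, actual)\n    suma = suma + actual\nendLoop()\naddResult(suma)\n```", "metadata": {"uses_loop": true, "uses_list": true, "returns_result": true, "complexity": 2}, "token_estimate": 95}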
{"chunk_id": "44619b4a5d7c2a6a", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.3 Gestión de Errores en Tiempo de Ejecución (try() / exception() / end())", "start_line": 161, "end_line": 162, "content": "Diseñada para proteger la estabilidad del servidor ante fallos de I/O.\n* Si ocurre un fallo del sistema dentro del bloque `try`, el flujo salta al bloque `exception(variable_error)`, poblando la variable con la traza para facilitar la recuperación del script.", "metadata": {"complexity": 0}, "token_estimate": 64}
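{"chunk_id": "ex-try-exception", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > 3.3 Gestión de Errores en Tiempo de Ejecución (try() / exception() / end())", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (URL y variables hipotéticas) del patrón de protección ante fallos de I/O:\n\n```avap\ntry()\n    RequestGet(\"https://api.ejemplo.com/datos\", \"\", headers, respuesta, 3000)\n    addResult(respuesta)\nexception(err)\n    // err queda poblada con la traza del fallo\n    _status = 502\n    addResult(err)\nend()\n```", "metadata": {"uses_error_handling": true, "uses_http": true, "returns_result": true, "complexity": 2}, "token_estimate": 100}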
{"chunk_id": "a66e1f02c3b2af0e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN III: Lógica de Control y Estructuras de Decisión > Especificación BNF (Sección III)", "start_line": 166, "end_line": 197, "content": "```bnf\n<control_flow> ::= <if_stmt> | <loop_stmt> | <try_stmt>\n\n<if_stmt> ::= <if_condition> <EOL>\n <block>\n [ \"else()\" <EOL> <block> ]\n \"end()\" <EOL>\n\n<if_condition> ::= <if_structured> | <if_free_expression>\n\n/* Modo 1: el operador de comparación va entre comillas dobles */\n<if_structured> ::= \"if\" \"(\" <strict_atom> \",\" <strict_atom> \",\" <string_literal_double_quotes> \")\"\n/* Modo 2: la expresión libre va entre acentos graves (backticks) */\n<if_free_expression> ::= \"if\" \"(\" \"None\" \",\" \"None\" \",\" <backtick_string> \")\"\n\n<strict_atom> ::= <identifier> | <non_null_literal>\n<backtick_string> ::= \"`\" <text_content> \"`\"\n\n<identifier> ::= [a-zA-Z_][a-zA-Z0-9_]*\n<non_null_literal> ::= <number> | <string_literal_double_quotes>\n/* Nota: <non_null_literal> NO incluye la palabra \"None\" */\n\n<loop_stmt> ::= \"startLoop(\" <identifier> \",\" <expression> \",\" <expression> \")\" <EOL>\n <block>\n \"endLoop()\" <EOL>\n\n<try_stmt> ::= \"try()\" <EOL>\n <block>\n \"exception(\" <identifier> \")\" <EOL>\n <block>\n \"end()\" <EOL>\n\n<block> ::= <line>*\n```", "metadata": {"uses_error_handling": true, "uses_loop": true, "complexity": 2}, "token_estimate": 309}
{"chunk_id": "11333d2fac62c3e9", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN IV: Concurrencia y Asincronía", "start_line": 202, "end_line": 202, "content": "Implementa un sistema avanzado basado en hilos ligeros (gorutinas), permitiendo que el servidor procese operaciones de E/S largas sin bloquear el hilo principal.", "metadata": {"complexity": 0}, "token_estimate": 40}
{"chunk_id": "73dc6006bf8b2750", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN IV: Concurrencia y Asincronía > 4.1 Comando Lanzador (go)", "start_line": 206, "end_line": 207, "content": "* **Sintaxis:** `identificador = go nombre_funcion(parametros)`.\n* **Mecánica:** Crea un nuevo contexto de ejecución aislado. Devuelve un identificador único que debe guardarse para interactuar con el hilo posteriormente.", "metadata": {"uses_async": true, "complexity": 1}, "token_estimate": 57}
{"chunk_id": "b084cc827e7ad592", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN IV: Concurrencia y Asincronía > 4.2 Comando Sincronizador (gather)", "start_line": 210, "end_line": 211, "content": "* **Sintaxis:** `resultado = gather(identificador, timeout)`.\n* **Mecánica:** Pausa el hilo principal esperando el resultado. Si se supera el `timeout` especificado, cancela la espera y devuelve `None`.", "metadata": {"uses_async": true, "complexity": 1}, "token_estimate": 55}
{"chunk_id": "2fc32e3d5bbee77a", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN IV: Concurrencia y Asincronía > Especificación BNF (Sección IV)", "start_line": 215, "end_line": 219, "content": "```bnf\n<async_command> ::= <go_stmt> | <gather_stmt>\n<go_stmt> ::= <identifier> \"=\" \"go\" <identifier> \"(\" [<argument_list>] \")\"\n<gather_stmt> ::= <identifier> \"=\" \"gather(\" <identifier> [\",\" <expression>] \")\"\n```", "metadata": {"uses_async": true, "complexity": 1}, "token_estimate": 64}
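{"chunk_id": "ex-go-gather", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN IV: Concurrencia y Asincronía", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (función y variables hipotéticas) del patrón lanzador/sincronizador:\n\n```avap\n// Lanzar la función en un hilo ligero; devuelve un identificador de tarea\ntarea = go consultarSaldo(cuenta)\n\n// ... otras operaciones no bloqueantes ...\n\n// Esperar el resultado con timeout; si se supera, devuelve None\nsaldo = gather(tarea, 5000)\nif(None, None, `saldo == None`)\n    _status = 504\nend()\n```", "metadata": {"uses_async": true, "complexity": 2}, "token_estimate": 100}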
{"chunk_id": "c548654ccca08295", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo", "start_line": 224, "end_line": 224, "content": "Agrupa todas las capacidades de interconexión hacia el exterior, permitiendo consumir integraciones de terceros, APIs externas y administrar bases de datos relacionales sin drivers adicionales.", "metadata": {"complexity": 0}, "token_estimate": 42}
{"chunk_id": "89758e089ae5eeae", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.1 Conectores de Terceros (avapConnector)", "start_line": 228, "end_line": 230, "content": "`avapConnector` es el mecanismo de integración con servicios de terceros configurados en la plataforma AVAP. Un conector se registra previamente mediante un UUID único. Al instanciarlo, la variable se convierte en un **objeto proxy** que encapsula credenciales y contexto, exponiendo métodos dinámicos mediante notación de punto.\n\n**Patrón de uso:**", "metadata": {"complexity": 0}, "token_estimate": 88}
{"chunk_id": "3def8aeea87256a1", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.1 Conectores de Terceros (avapConnector)", "start_line": 232, "end_line": 242, "content": "```avap\n// 1. Instanciar el conector usando su UUID\nbelvo_connector = avapConnector(\"20908e93260147acb2636967021fbf5d\")\n\n// 2. Invocar métodos dinámicos (resueltos en runtime)\ninstitutions = belvo_connector.list_institutions()\nbalances = belvo_connector.get_balances(link, account_id)\n\n// 3. Resultado tratable como variable estándar\naddResult(balances)\n```", "metadata": {"uses_connector": true, "returns_result": true, "complexity": 2}, "token_estimate": 106}
{"chunk_id": "44548bb3d2f94cc2", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.2 Cliente HTTP Externo (RequestPost / RequestGet)", "start_line": 245, "end_line": 248, "content": "Para evitar hilos bloqueados por latencia de red, AVAP exige un parámetro de **timeout** (en milisegundos). Si se supera, la variable destino recibe `None`.\n\n* **`RequestPost(url, querystring, headers, body, destino, timeout)`**: Ejecuta un POST almacenando la respuesta en `destino`.\n* **`RequestGet(url, querystring, headers, destino, timeout)`**: Ejecuta un GET omitiendo el cuerpo.", "metadata": {"uses_http": true, "complexity": 1}, "token_estimate": 104}
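{"chunk_id": "ex-http-request", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.2 Cliente HTTP Externo (RequestPost / RequestGet)", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (URL y cabeceras hipotéticas) del cliente HTTP con timeout obligatorio:\n\n```avap\nheaders = {\"Content-Type\": \"application/json\"}\nbody = {\"usuario\": usuario}\n\n// POST con timeout de 5000 ms; si se supera, respuesta recibe None\nRequestPost(\"https://api.ejemplo.com/login\", \"\", headers, body, respuesta, 5000)\nif(None, None, `respuesta == None`)\n    _status = 504\nend()\n```", "metadata": {"uses_http": true, "complexity": 2}, "token_estimate": 105}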
{"chunk_id": "90c517b760b54c04", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.3 Conector de Bases de Datos y ORM", "start_line": 252, "end_line": 261, "content": "AVAP utiliza `avapConnector(\"TOKEN\")` para la hidratación segura de credenciales. Las operaciones se ejecutan sobre una tabla específica definida por el parámetro `tableName`.\n\n* **`ormCheckTable(tableName, varTarget)`**: Verifica la existencia de una tabla en la base de datos conectada.\n* **`ormCreateTable(fields, fieldsType, tableName, varTarget)`**: Comando DDL para creación de tablas.\n* **`ormAccessSelect(fields, tableName, selector, varTarget)`**: Recupera registros. `fields` acepta `*` o lista de campos. El `selector` es la cláusula WHERE (puede estar vacío). Devuelve una lista de diccionarios.\n* **`ormAccessInsert(fieldsValues, tableName, varTarget)`**: Inserción parametrizada de registros en la tabla `tableName`.\n* **`ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget)`**: Modifica registros existentes. El `selector` es obligatorio para delimitar el alcance del cambio en la tabla `tableName`.\n* **`ormDirect(sentencia, destino)`**: Ejecución de SQL crudo para consultas analíticas complejas.\n\n---", "metadata": {"uses_orm": true, "uses_connector": true, "complexity": 2}, "token_estimate": 266}
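{"chunk_id": "ex-orm-select", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > 5.3 Conector de Bases de Datos y ORM", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (token, tabla y campos hipotéticos) del flujo ORM descrito:\n\n```avap\n// Hidratación segura de credenciales de la base de datos\ndb = avapConnector(\"TOKEN_DE_EJEMPLO\")\n\n// Verificar la tabla y recuperar registros filtrados\normCheckTable(\"usuarios\", existe)\normAccessSelect(\"*\", \"usuarios\", \"edad > 18\", filas)\n\n// El resultado es una lista de diccionarios\ngetListLen(filas, total)\naddResult(total)\n```", "metadata": {"uses_orm": true, "uses_connector": true, "uses_list": true, "returns_result": true, "complexity": 3}, "token_estimate": 110}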
{"chunk_id": "8bf39cab443ec928", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > Especificación BNF (Sección V)", "start_line": 268, "end_line": 294, "content": "```bnf\n/* Instanciación de conector de terceros y llamada a sus métodos dinámicos */\n<connector_cmd> ::= <connector_instantiation> | <connector_method_call>\n<connector_instantiation> ::= <identifier> \"=\" \"avapConnector(\" <stringliteral> \")\"\n<connector_method_call> ::= [ <identifier> \"=\" ] <identifier> \".\" <identifier> \"(\" [<argument_list>] \")\"\n\n/* Cliente HTTP con Timeout Obligatorio */\n<http_command> ::= <req_post_cmd> | <req_get_cmd>\n<req_post_cmd> ::= \"RequestPost(\" <expression> \",\" <expression> \",\" <expression> \",\" <expression> \",\" <identifier> \",\" <expression> \")\"\n<req_get_cmd> ::= \"RequestGet(\" <expression> \",\" <expression> \",\" <expression> \",\" <identifier> \",\" <expression> \")\"\n\n/* ORM y Persistencia (Estandarizado con tableName) */\n<db_command> ::= <orm_direct> | <orm_check> | <orm_create> | <orm_select> | <orm_insert> | <orm_update>\n<orm_direct> ::= \"ormDirect(\" <expression> \",\" <identifier> \")\"\n<orm_check> ::= \"ormCheckTable(\" <expression> \",\" <identifier> \")\"\n<orm_create> ::= \"ormCreateTable(\" <expression> \",\" <expression> \",\" <expression> \",\" <identifier> \")\"\n\n/* ormAccessSelect(fields, tableName, selector, varTarget) */\n<orm_select> ::= \"ormAccessSelect(\" <orm_fields> \",\" <expression> \",\" [<expression>] \",\" <identifier> \")\"\n<orm_fields> ::= \"*\" | <expression>\n\n/* ormAccessInsert(fieldsValues, tableName, varTarget) */\n<orm_insert> ::= \"ormAccessInsert(\" <expression> \",\" <expression> \",\" <identifier> \")\"\n\n/* ormAccessUpdate(fields, fieldsValues, tableName, selector, varTarget) */\n<orm_update> ::= \"ormAccessUpdate(\" <expression> \",\" <expression> \",\" <expression> \",\" <expression> \",\" <identifier> \")\"\n```", "metadata": {"uses_orm": true, "uses_http": true, "uses_connector": true, "complexity": 3}, "token_estimate": 438}
{"chunk_id": "1ee459cf710ed983", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "Especificación Técnica Consolidada del Lenguaje AVAP (LRM) > SECCIÓN V: Conectores de Terceros, Peticiones HTTP y ORM Nativo > Especificación BNF (Sección V)", "start_line": 295, "end_line": 297, "content": "> **Nota de implementación:** `<connector_instantiation>` se distingue de `<orm_connector_init>` (ORM) únicamente por contexto semántico: el UUID pasado como argumento determina si el adaptador resuelto es un ORM de base de datos o un proxy de terceros. La gramática los trata de forma idéntica; el motor de ejecución selecciona el adaptador apropiado en runtime.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 93}
{"chunk_id": "9c36aa500a211a01", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos", "start_line": 301, "end_line": 303, "content": "AVAP incluye un set de comandos integrados de alto nivel para manipular tipos complejos (JSON y Listas), tiempos, textos y generar hashes.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 38}
{"chunk_id": "21d7c45e60d98b82", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.1 Manipulación Nativa de Listas y Objetos JSON", "start_line": 307, "end_line": 319, "content": "Para extraer y mutar estructuras complejas, AVAP provee comandos nativos específicos. En AVAP, las listas **no se instancian con literales de array**, sino que se construyen y recorren a través de un conjunto cerrado de comandos especializados:\n\n* **`variableToList(elemento, destino)`**: Fuerza a que una variable escalar se convierta en una estructura iterable de lista de un único elemento. Es el punto de entrada canónico para construir una lista desde cero a partir de un valor existente.\n\n* **`itemFromList(lista_origen, indice, destino)`**: Extrae de forma segura el elemento contenido en la posición `indice` (base 0) de una lista. Equivale a un acceso por índice controlado.\n\n* **`getListLen(lista, destino)`**: Calcula el número total de elementos contenidos en `lista` y almacena el resultado entero en `destino`. Imprescindible para construir bucles de recorrido seguro y para validar listas antes de acceder a sus índices. Se recomienda llamar siempre a `getListLen` antes de `itemFromList` para evitar accesos fuera de rango.\n\n* **`variableFromJSON(json_origen, clave, destino)`**: Parsea un objeto JSON en memoria y extrae el valor correspondiente a la `clave`, almacenándolo en `destino`. El acceso es directo por nombre de propiedad.\n\n* **`AddVariableToJSON(clave, valor, json_destino)`**: Inyecta dinámicamente una nueva propiedad dentro de un objeto JSON existente. Si la clave ya existe, su valor es sobreescrito.\n\n**Patrón de recorrido típico en AVAP:**", "metadata": {"uses_json": true, "uses_list": true, "complexity": 2}, "token_estimate": 384}
{"chunk_id": "dc9304e8db408667", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.1 Manipulación Nativa de Listas y Objetos JSON", "start_line": 322, "end_line": 333, "content": "```avap\n// 1. Obtener longitud de la lista\ngetListLen(myList, len)\n\n// 2. Iterar con índice controlado (bucle determinista de la Sección III)\nstartLoop(i, 0, len)\n    itemFromList(myList, i, currentItem)\n    // ... procesar currentItem ...\nendLoop()\n```", "metadata": {"uses_list": true, "uses_loop": true, "complexity": 1}, "token_estimate": 75}
{"chunk_id": "356328fd14d2cb9c", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.2 Criptografía y Expresiones Regulares", "start_line": 338, "end_line": 342, "content": "* **`encodeSHA256(origen, destino)`** y **`encodeMD5(origen, destino)`**: Funciones de hash criptográfico que transforman un texto en un digest irreversible. Vitales para el almacenamiento seguro de contraseñas y la verificación de integridad de datos. SHA-256 produce un digest de 64 caracteres hexadecimales y ofrece mayor resistencia criptográfica que MD5 (32 caracteres); se recomienda SHA-256 para nuevos desarrollos.\n\n* **`getRegex(origen, patron, destino)`**: Aplica una Expresión Regular (`patron`) sobre la variable de origen, extrayendo la primera coincidencia exacta encontrada. El patrón sigue la sintaxis estándar compatible con Python `re`.\n\n---", "metadata": {"uses_crypto": true, "uses_regex": true, "complexity": 2}, "token_estimate": 166}
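{"chunk_id": "ex-crypto-regex", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.2 Criptografía y Expresiones Regulares", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (variables y patrón hipotéticos) de hashing y extracción por regex:\n\n```avap\n// Hash irreversible de la contraseña antes de almacenarla\nencodeSHA256(password, password_hash)\n\n// Extraer el dominio de un email: primera coincidencia del patrón\ngetRegex(email, \"[^@]+$\", dominio)\n```", "metadata": {"uses_crypto": true, "uses_regex": true, "complexity": 2}, "token_estimate": 70}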
{"chunk_id": "e30e00ffbad9299e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Fechas y Timestamps", "start_line": 348, "end_line": 348, "content": "AVAP provee tres comandos complementarios para cubrir todas las conversiones posibles entre representaciones de tiempo. Los tres soportan formatos de calendario en notación `strftime` de Python y cálculos con `TimeDelta` expresados en segundos (positivo para sumar, negativo para restar):", "metadata": {"complexity": 0}, "token_estimate": 69}
{"chunk_id": "418ad7a6e4e5f85d", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "table", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Fechas y Timestamps", "start_line": 351, "end_line": 355, "content": "| Comando | Entrada | Salida |\n|---|---|---|\n| `getTimeStamp(fecha_string, formato, timedelta, destino)` | String de fecha | Epoch (entero) |\n| `stampToDatetime(epoch, formato, timedelta, destino)` | Epoch (entero) | String de fecha |\n| `getDateTime(formato, timedelta, zona_horaria, destino)` | — (ahora mismo) | String de fecha |", "metadata": {"uses_datetime": true, "complexity": 1}, "token_estimate": 93}
{"chunk_id": "069dfe5b704bb29f", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Fechas y Timestamps", "start_line": 356, "end_line": 360, "content": "* **`getTimeStamp(fecha_string, formato, timedelta, destino)`**: Convierte un string de fecha legible a su valor Epoch (entero Unix). Útil para almacenar fechas y realizar cálculos aritméticos sobre ellas.\n\n* **`stampToDatetime(epoch, formato, timedelta, destino)`**: Convierte un valor Epoch a un string de fecha con el formato especificado. Útil para presentar timestamps almacenados de forma legible.\n\n* **`getDateTime(formato, timedelta, zona_horaria, destino)`**: Captura la fecha y hora actuales del sistema, aplica el ajuste `timedelta` y las convierte a la `zona_horaria` indicada antes de almacenar el resultado. Acepta cualquier zona horaria reconocida por la librería `pytz` de Python.", "metadata": {"uses_datetime": true, "complexity": 1}, "token_estimate": 179}
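{"chunk_id": "ex-datetime", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Fechas y Timestamps", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (formatos de ejemplo en notación strftime) de las tres conversiones de tiempo:\n\n```avap\n// Fecha/hora actual en Madrid, sin ajuste de timedelta\ngetDateTime(\"%Y-%m-%d %H:%M:%S\", 0, \"Europe/Madrid\", ahora)\n\n// String de fecha -> epoch (entero Unix)\ngetTimeStamp(\"2025-01-01 00:00:00\", \"%Y-%m-%d %H:%M:%S\", 0, epoch)\n\n// Epoch + 86400 segundos (un día) -> string formateado\nstampToDatetime(epoch, \"%Y-%m-%d\", 86400, manana)\n```", "metadata": {"uses_datetime": true, "complexity": 2}, "token_estimate": 110}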
{"chunk_id": "70aeed6f69fdb183", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Cadenas de Texto", "start_line": 364, "end_line": 368, "content": "* **`randomString(patron, longitud, destino)`**: Genera una cadena aleatoria de `longitud` caracteres cuyos símbolos están restringidos al conjunto definido por `patron` (expresión regular de caracteres). Útil para generar tokens de sesión, contraseñas temporales o identificadores únicos.\n\n* **`replace(origen, patron_busqueda, reemplazo, destino)`**: Localiza todas las ocurrencias de `patron_busqueda` dentro de `origen` y las sustituye por `reemplazo`, almacenando el resultado en `destino`. Facilita el saneamiento y normalización de datos de entrada antes de su procesamiento o almacenamiento.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 155}
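{"chunk_id": "ex-strings", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > 6.3 Transformación de Tiempo y Cadenas > Cadenas de Texto", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (patrones hipotéticos) de generación y saneamiento de cadenas:\n\n```avap\n// Token de sesión de 32 caracteres restringidos al patrón\nrandomString(\"[A-Za-z0-9]\", 32, token)\n\n// Sustituir todas las ocurrencias de espacio por guion bajo\nreplace(entrada, \" \", \"_\", entrada_normalizada)\n```", "metadata": {"complexity": 1}, "token_estimate": 65}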
{"chunk_id": "8f8da55cbe6dd59d", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > BNF — Gramática Formal de los Comandos de Utilidad", "start_line": 373, "end_line": 407, "content": "```bnf\n<util_command> ::= <json_list_cmd> | <crypto_cmd> | <regex_cmd>\n | <datetime_cmd> | <stamp_cmd> | <string_cmd> | <replace_cmd>\n\n/* Manipulación de listas y JSON */\n<json_list_cmd> ::= \"variableToList(\" <expression> \",\" <identifier> \")\"\n | \"itemFromList(\" <identifier> \",\" <expression> \",\" <identifier> \")\"\n | \"getListLen(\" <identifier> \",\" <identifier> \")\"\n | \"variableFromJSON(\" <identifier> \",\" <expression> \",\" <identifier> \")\"\n | \"AddVariableToJSON(\" <expression> \",\" <expression> \",\" <identifier> \")\"\n\n/* Criptografía */\n<crypto_cmd> ::= \"encodeSHA256(\" <expression> \",\" <identifier> \")\"\n | \"encodeMD5(\" <expression> \",\" <identifier> \")\"\n\n/* Expresiones regulares */\n<regex_cmd> ::= \"getRegex(\" <identifier> \",\" <expression> \",\" <identifier> \")\"\n\n/* Fecha/hora actual -> string */\n<datetime_cmd> ::= \"getDateTime(\" <stringliteral> \",\" <expression> \",\" <stringliteral> \",\" <identifier> \")\"\n/* Argumentos: formato_salida, timedelta, zona_horaria, destino */\n\n/* Conversiones epoch ↔ string */\n<stamp_cmd> ::= \"stampToDatetime(\" <expression> \",\" <stringliteral> \",\" <expression> \",\" <identifier> \")\"\n/* Argumentos: epoch_origen, formato, timedelta, destino */\n | \"getTimeStamp(\" <stringliteral> \",\" <stringliteral> \",\" <expression> \",\" <identifier> \")\"\n/* Argumentos: fecha_string, formato_entrada, timedelta, destino */\n\n/* Cadenas */\n<string_cmd> ::= \"randomString(\" <expression> \",\" <expression> \",\" <identifier> \")\"\n/* Argumentos: patron, longitud, destino */\n\n<replace_cmd> ::= \"replace(\" <identifier> \",\" <stringliteral> \",\" <stringliteral> \",\" <identifier> \")\"\n/* Argumentos: origen, patron_busqueda, reemplazo, destino */\n```", "metadata": {"uses_crypto": true, "uses_json": true, "uses_list": true, "uses_regex": true, "uses_datetime": true, "complexity": 5}, "token_estimate": 443}
{"chunk_id": "c6d44dfa4f20d4ba", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VII: Arquitectura de Funciones y Ámbitos (Scopes)", "start_line": 413, "end_line": 414, "content": "Las funciones son recintos herméticos de memoria. Al entrar en una función, AVAP crea un nuevo diccionario de variables locales aislado del contexto global.\nEl comando `return()` actúa como interruptor de flujo: inyecta el valor calculado al llamador, libera la memoria local, y si se usa dentro de un `startLoop`, rompe la iteración anticipadamente.", "metadata": {"complexity": 0}, "token_estimate": 87}
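{"chunk_id": "ex-function-return", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VII: Arquitectura de Funciones y Ámbitos (Scopes)", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (nombres hipotéticos) de una función con ámbito aislado y `return()`:\n\n```avap\nfunction sumar(a, b) {\n    // a y b viven en el diccionario local de la función\n    resultado = a + b\n    return(resultado)\n}\n\ntotal = sumar(2, 3)\naddResult(total)\n```", "metadata": {"returns_result": true, "complexity": 1}, "token_estimate": 75}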
{"chunk_id": "3eeaf5913a0be091", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VII: Arquitectura de Funciones y Ámbitos (Scopes) > Especificación BNF (Sección VII)", "start_line": 419, "end_line": 429, "content": "```bnf\n/* Nota: las funciones utilizan llaves {} como delimitadores de bloque por decisión\n arquitectónica explícita, diferenciándose de las estructuras de control (if, loop, try)\n que usan palabras clave de cierre (end(), endLoop()). Ambos patrones coexisten\n en la gramática y el parser los distingue por el token de apertura. */\n<function_decl> ::= \"function\" <identifier> \"(\" [<param_list>] \")\" \"{\" <EOL>\n <block>\n \"}\" <EOL>\n<param_list> ::= <identifier> (\",\" <identifier>)*\n<return_stmt> ::= \"return(\" [<expression>] \")\"\n```", "metadata": {"complexity": 0}, "token_estimate": 158}
{"chunk_id": "a21981b11b385a44", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VIII: Modularidad e Inclusiones", "start_line": 434, "end_line": 435, "content": "* **Inclusión Estática (`include`)**: Directiva de preprocesador que pega el contenido de un fichero físico en la línea actual.\n* **Librerías (`import`)**: Carga colecciones de funciones. Corchetes angulares (`import <math>`) para nativas, comillas (`import \"mis_utils\"`) para locales.", "metadata": {"complexity": 0}, "token_estimate": 79}
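{"chunk_id": "ex-modularity", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VIII: Modularidad e Inclusiones", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (rutas y nombres hipotéticos) de las dos formas de inclusión:\n\n```avap\n// Preprocesador: pega el contenido del fichero en esta línea\ninclude \"comun.avap\"\n\n// Librería nativa (corchetes angulares) y librería local (comillas)\nimport <math>\nimport \"mis_utils\"\n```", "metadata": {"complexity": 1}, "token_estimate": 60}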
{"chunk_id": "1cb62ad40dde6e03", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN VIII: Modularidad e Inclusiones > Especificación BNF (Sección VIII)", "start_line": 440, "end_line": 444, "content": "```bnf\n<modularity_cmd> ::= <include_stmt> | <import_stmt>\n<include_stmt> ::= \"include\" \" \" <stringliteral>\n<import_stmt> ::= \"import\" \" \" ( \"<\" <identifier> \">\" | <stringliteral> )\n```", "metadata": {"complexity": 0}, "token_estimate": 60}
{"chunk_id": "8353b49d77752023", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta", "start_line": 449, "end_line": 449, "content": "Esta sección es el corazón matemático evaluador de AVAP. Define la jerarquía exacta (Precedencia) y provee soporte nativo para características avanzadas similares a Python.", "metadata": {"complexity": 0}, "token_estimate": 45}
{"chunk_id": "3ef2bf52da198594", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta > 9.1 Cast de Tipos Explícito", "start_line": 453, "end_line": 453, "content": "AVAP permite conversiones de tipos (Type Casting) en cualquier evaluación utilizando funciones constructoras estándar. Puedes transformar variables dinámicamente usando `int(var)`, `float(var)` o `str(var)`.", "metadata": {"complexity": 0}, "token_estimate": 49}
{"chunk_id": "d64846e65a09ba05", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta > 9.2 Slicing y Comprensiones (Comprehensions)", "start_line": 456, "end_line": 457, "content": "* **Slicing (Cortes):** Puedes extraer fragmentos de listas o strings utilizando la notación de dos puntos. Ejemplo: `mi_lista[1:4]` (extrae desde el índice 1 hasta el 3).\n* **Comprehensions:** AVAP soporta la construcción rápida de listas mediante iteradores en una sola línea, permitiendo filtrar y mapear colecciones enteras (ej. `[x * 2 for x in valores if x > 0]`).", "metadata": {"complexity": 0}, "token_estimate": 115}
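{"chunk_id": "ex-expressions", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta > 9.2 Slicing y Comprensiones (Comprehensions)", "start_line": 0, "end_line": 0, "content": "Boceto ilustrativo (variables hipotéticas) de casting, slicing y comprensiones en expresiones:\n\n```avap\n// Cast explícito de tipos\nentero = int(cadena)\n\n// Slicing: extrae desde el índice 1 hasta el 3\nfragmento = mi_lista[1:4]\n\n// Comprehension: filtrar y mapear en una sola línea\ndobles = [x * 2 for x in valores if x > 0]\n```", "metadata": {"uses_list": true, "complexity": 1}, "token_estimate": 75}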
{"chunk_id": "ef984b8dd1da3bf5", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta > 9.3 Análisis Léxico (Lexer) y Documentación", "start_line": 460, "end_line": 463, "content": "AVAP cuenta con tres niveles de descarte de texto para anotaciones humanas:\n1. **Comentarios de Línea (`//`):** Ignora el texto hasta el salto de línea.\n2. **Comentarios de Bloque (`/* ... */`):** Para aislar bloques enteros multilínea.\n3. **Comentarios de Documentación (`///`):** Utilizados por analizadores de código o IDEs para generar documentación técnica automática (Docstrings) a partir del código fuente.", "metadata": {"complexity": 0}, "token_estimate": 114}
{"chunk_id": "d9dcbc4914a55b0e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "bnf", "block_type": "bnf", "section": "SECCIÓN VI: Utilidades, Criptografía y Manipulación de Datos > SECCIÓN IX: Expresiones y Gramática Léxica Estricta > Especificación BNF (Sección IX)", "start_line": 467, "end_line": 530, "content": "```bnf\n/* Jerarquía de Expresiones (Precedencia de menor a mayor) */\n<expression> ::= <logical_or>\n<logical_or> ::= <logical_and> ( \"or\" <logical_and> )*\n<logical_and> ::= <logical_not> ( \"and\" <logical_not> )*\n<logical_not> ::= \"not\" <logical_not> | <comparison>\n\n<comparison> ::= <arithmetic> ( <comp_op> <arithmetic> )*\n<comp_op> ::= \"==\" | \"!=\" | \"<\" | \">\" | \"<=\" | \">=\" | \"in\" | \"is\"\n\n<arithmetic> ::= <term> ( ( \"+\" | \"-\" ) <term> )*\n<term> ::= <factor> ( ( \"*\" | \"/\" | \"%\" ) <factor> )*\n<factor> ::= ( \"+\" | \"-\" ) <factor> | <power>\n<power> ::= <primary> [ \"**\" <factor> ]\n\n/* Primarios y Átomos (Accesos, Castings, Slicing, Métodos y Funciones)\n La regla <primary> cubre también el acceso a métodos de objetos conector\n (conector.metodo(...)) y el acceso por clave a sus resultados (resultado[\"key\"]) */\n<primary> ::= <atom>\n | <primary> \".\" <identifier>\n | <primary> \"[\" <expression> \"]\"\n | <primary> \"[\" [<expression>] \":\" [<expression>] [\":\" [<expression>]] \"]\"\n | <primary> \"(\" [<argument_list>] \")\"\n\n<atom> ::= <identifier>\n | \"$\" <identifier>\n | <literal>\n | \"(\" <expression> \")\"\n | <list_display>\n | <dict_display>\n\n/* Estructuras de Datos, Comprensiones y Argumentos */\n<list_display> ::= \"[\" [<argument_list>] \"]\"\n | \"[\" <expression> \"for\" <identifier> \"in\" <expression> [<if_clause>] \"]\"\n<if_clause> ::= \"if\" <expression>\n<dict_display> ::= \"{\" [<key_datum_list>] \"}\"\n<key_datum_list> ::= <key_datum> ( \",\" <key_datum> )*\n<key_datum> ::= <expression> \":\" <expression>\n<argument_list> ::= <expression> ( \",\" <expression> )*\n\n/* Tipo numérico unificado */\n<number> ::= <floatnumber> | <integer>\n\n/* Literales (Tipos de Datos Primitivos Soportados) */\n<literal> ::= <stringliteral> | <number> | <boolean> | \"None\"\n<boolean> ::= \"True\" | \"False\"\n<integer> ::= [0-9]+\n<floatnumber> ::= [0-9]+ \".\" [0-9]* | \".\" [0-9]+\n\n/* Cadenas de Texto con soporte de secuencias de escape */\n<stringliteral> ::= \"\\\"\" <text_double> \"\\\"\" | \"'\" <text_single> \"'\"\n<escape_sequence> ::= \"\\\\\" ( \"\\\"\" | \"'\" | \"\\\\\" | \"n\" | \"t\" | \"r\" | \"0\" )\n<text_double> ::= ( [^\"\\\\] | <escape_sequence> )*\n<text_single> ::= ( [^'\\\\] | <escape_sequence> )*\n<identifier_or_string> ::= <identifier> | <stringliteral>\n\n/* Reglas de Comentarios para el Lexer\n El lexer aplica longest-match: /// debe evaluarse ANTES que // */\n<doc_comment> ::= \"///\" <any_text>\n<line_comment> ::= \"//\" <any_text>\n<block_comment> ::= \"/*\" <any_content> \"*/\"\n<any_text> ::= [^\\r\\n]*\n<any_content> ::= /* Cualquier secuencia de caracteres que no contenga la subcadena \"*/\" */\n```", "metadata": {"complexity": 0}, "token_estimate": 833}
{"chunk_id": "1a283ddb2d395d2e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "APÉNDICE X: Especificación Léxica de AVAP", "start_line": 533, "end_line": 538, "content": "Este apéndice define las reglas del **analizador léxico (lexer)** del lenguaje AVAP. \nEl lexer transforma el código fuente en una secuencia de **tokens**, que posteriormente son consumidos por el parser descrito en la gramática BNF.\n\nEl análisis léxico sigue el principio de **máxima coincidencia (longest match)**: cuando múltiples reglas pueden coincidir con el mismo texto, se selecciona la coincidencia más larga.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 109}
{"chunk_id": "0433456477979413", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.1 Espacios en blanco y separadores", "start_line": 542, "end_line": 542, "content": "Los siguientes caracteres se ignoran excepto cuando forman parte de literales o comentarios.", "metadata": {"complexity": 0}, "token_estimate": 18}
{"chunk_id": "355934ebd06425c5", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.1 Espacios en blanco y separadores", "start_line": 545, "end_line": 548, "content": "```regex\nWHITESPACE ::= [ \\t]+\nEOL ::= \\r\\n | \\n | \\r\n```", "metadata": {"complexity": 0}, "token_estimate": 28}
{"chunk_id": "bad9116e87d54385", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.1 Espacios en blanco y separadores", "start_line": 549, "end_line": 555, "content": "Reglas:\n\n- `WHITESPACE` se ignora\n- `EOL` genera el token **EOL**, que actúa como terminador de sentencia\n- AVAP es un lenguaje **orientado a líneas**, por lo que las sentencias no pueden dividirse en múltiples líneas.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 68}
{"chunk_id": "070a7e4aa025eee8", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.2 Comentarios", "start_line": 559, "end_line": 559, "content": "AVAP soporta tres tipos de comentarios. El lexer aplica longest-match, por lo que `///` debe reconocerse **antes** que `//`.", "metadata": {"complexity": 0}, "token_estimate": 34}
{"chunk_id": "3c0f88dc459e1aa4", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.2 Comentarios > Comentario de documentación (mayor prioridad)", "start_line": 564, "end_line": 566, "content": "```regex\nDOC_COMMENT ::= \"///\"[^\\r\\n]*\n```", "metadata": {"complexity": 0}, "token_estimate": 15}
{"chunk_id": "a4cd287486836c9e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.2 Comentarios > Comentario de documentación (mayor prioridad)", "start_line": 567, "end_line": 569, "content": "Se utiliza para generar documentación automática o anotaciones de herramientas.\n\nEjemplo:", "metadata": {"complexity": 0}, "token_estimate": 20}
{"chunk_id": "80c6a5349e60c83e", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "X.2 Comentarios > Comentario de documentación (mayor prioridad)", "start_line": 572, "end_line": 574, "content": "```avap\n/// obtiene el balance del usuario\n```", "metadata": {"complexity": 0}, "token_estimate": 13}
{"chunk_id": "c8c98ea294c04cd2", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.2 Comentarios > Comentario de línea", "start_line": 580, "end_line": 582, "content": "```regex\nLINE_COMMENT ::= \"//\"[^\\r\\n]*\n```", "metadata": {"complexity": 0}, "token_estimate": 14}
{"chunk_id": "0d33a6ce642cacc9", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.2 Comentarios > Comentario de línea", "start_line": 583, "end_line": 583, "content": "Ejemplo:", "metadata": {"complexity": 0}, "token_estimate": 4}
{"chunk_id": "a758685e1a878d59", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "X.2 Comentarios > Comentario de línea", "start_line": 586, "end_line": 588, "content": "```avap\n// comentario\n```", "metadata": {"complexity": 0}, "token_estimate": 8}
{"chunk_id": "3b04f22536af08a3", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.2 Comentarios > Comentario de línea", "start_line": 589, "end_line": 591, "content": "El texto se ignora hasta el final de la línea.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 13}
{"chunk_id": "06917fa36b322f7b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.2 Comentarios > Comentario de bloque", "start_line": 596, "end_line": 598, "content": "```regex\nBLOCK_COMMENT ::= \"/*\" .*? \"*/\"\n```", "metadata": {"complexity": 0}, "token_estimate": 15}
{"chunk_id": "7e664fbf890d19ec", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.2 Comentarios > Comentario de bloque", "start_line": 599, "end_line": 601, "content": "Puede abarcar múltiples líneas.\n\nEjemplo:", "metadata": {"complexity": 0}, "token_estimate": 15}
{"chunk_id": "97f6d11951708550", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "X.2 Comentarios > Comentario de bloque", "start_line": 604, "end_line": 607, "content": "```avap\n/* comentario\n multilinea */\n```", "metadata": {"complexity": 0}, "token_estimate": 12}
{"chunk_id": "96797fa8cdcb0bb8", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.3 Identificadores", "start_line": 612, "end_line": 612, "content": "Los identificadores representan nombres de variables, funciones o parámetros.", "metadata": {"complexity": 0}, "token_estimate": 16}
{"chunk_id": "f08450f4b076af96", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.3 Identificadores", "start_line": 615, "end_line": 617, "content": "```regex\nIDENTIFIER ::= [a-zA-Z_][a-zA-Z0-9_]*\n```", "metadata": {"complexity": 0}, "token_estimate": 21}
{"chunk_id": "137d14d78249e4bc", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.3 Identificadores", "start_line": 618, "end_line": 627, "content": "Ejemplos válidos:\n\n```\nx\nuser_id\nbalanceTotal\n_connector\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 22}
{"chunk_id": "ce83cd4dd8d3f82c", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas", "start_line": 631, "end_line": 631, "content": "Las siguientes palabras están reservadas y **no pueden utilizarse como identificadores**.", "metadata": {"complexity": 0}, "token_estimate": 18}
{"chunk_id": "f638422514566af0", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Control de flujo", "start_line": 635, "end_line": 644, "content": "```\nif\nelse\nend\nstartLoop\nendLoop\ntry\nexception\nreturn\n```", "metadata": {"complexity": 0}, "token_estimate": 21}
{"chunk_id": "0f324b0730ebd2e2", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Declaración de funciones", "start_line": 648, "end_line": 650, "content": "```\nfunction\n```", "metadata": {"complexity": 0}, "token_estimate": 5}
{"chunk_id": "38b7f5fa2fed4953", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Concurrencia", "start_line": 654, "end_line": 657, "content": "```\ngo\ngather\n```", "metadata": {"complexity": 0}, "token_estimate": 7}
{"chunk_id": "007c79f8edd33043", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Modularidad", "start_line": 661, "end_line": 664, "content": "```\ninclude\nimport\n```", "metadata": {"complexity": 0}, "token_estimate": 7}
{"chunk_id": "6098165b41db2735", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Operadores lógicos", "start_line": 668, "end_line": 674, "content": "```\nand\nor\nnot\nin\nis\n```", "metadata": {"complexity": 0}, "token_estimate": 13}
{"chunk_id": "404ac961095eb856", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.4 Palabras reservadas > Literales", "start_line": 678, "end_line": 684, "content": "```\nTrue\nFalse\nNone\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 11}
{"chunk_id": "2b042127c6d731cc", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.5 Operadores > Asignación", "start_line": 690, "end_line": 700, "content": "```\n=\n```\n\nToken:\n\n```\nASSIGN\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 15}
{"chunk_id": "f204bc217eef6166", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.5 Operadores > Operadores aritméticos", "start_line": 704, "end_line": 728, "content": "```\n+\n-\n*\n/\n%\n**\n```\n\nTokens:\n\n```\nPLUS\nMINUS\nMULT\nDIV\nMOD\nPOWER\n```\n\nRegla importante:\n\n`**` debe evaluarse antes que `*` por la regla de **máxima coincidencia**.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 59}
{"chunk_id": "8e87cc07f2c5bb83", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.5 Operadores > Operadores de comparación", "start_line": 732, "end_line": 752, "content": "```\n==\n!=\n<\n>\n<=\n>=\n```\n\nTokens:\n\n```\nEQ\nNEQ\nLT\nGT\nLTE\nGTE\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 34}
{"chunk_id": "0ad578582af128e3", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.5 Operadores > Operadores lógicos", "start_line": 756, "end_line": 770, "content": "```\nand\nor\nnot\n```\n\nTokens:\n\n```\nAND\nOR\nNOT\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 23}
{"chunk_id": "02aad017b7cc9694", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.6 Delimitadores", "start_line": 774, "end_line": 802, "content": "Los siguientes símbolos delimitan estructuras sintácticas.\n\n```\n(\n)\n[\n]\n{\n}\n,\n.\n:\n```\n\nTokens:\n\n```\nLPAREN\nRPAREN\nLBRACKET\nRBRACKET\nLBRACE\nRBRACE\nCOMMA\nDOT\nCOLON\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 66}
{"chunk_id": "03adc6e15a7f8f56", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.7 Literales > Enteros", "start_line": 809, "end_line": 811, "content": "```regex\nINTEGER ::= [0-9]+\n```", "metadata": {"complexity": 0}, "token_estimate": 12}
{"chunk_id": "858115e1f34bbc73", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.7 Literales > Enteros", "start_line": 812, "end_line": 820, "content": "Ejemplos:\n\n```\n0\n10\n999\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 16}
{"chunk_id": "e210af93c0bf4759", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.7 Literales > Números flotantes", "start_line": 825, "end_line": 827, "content": "```regex\nFLOAT ::= [0-9]+\\.[0-9]* | \\.[0-9]+\n```", "metadata": {"complexity": 0}, "token_estimate": 24}
{"chunk_id": "d1d45c84ab2c0471", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.7 Literales > Números flotantes", "start_line": 828, "end_line": 836, "content": "Ejemplos:\n\n```\n1.0\n3.14\n.5\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 21}
{"chunk_id": "9dff5c20b868241b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.7 Literales > Strings", "start_line": 840, "end_line": 840, "content": "AVAP soporta cadenas con comillas simples y dobles, con soporte de secuencias de escape.", "metadata": {"complexity": 0}, "token_estimate": 25}
{"chunk_id": "15a41be430604349", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "regex", "section": "X.7 Literales > Strings", "start_line": 843, "end_line": 847, "content": "```regex\nSTRING_DOUBLE ::= \"\\\"\" ( [^\"\\\\] | ESCAPE_SEQ )* \"\\\"\"\nSTRING_SINGLE ::= \"'\" ( [^'\\\\] | ESCAPE_SEQ )* \"'\"\nESCAPE_SEQ ::= \"\\\\\" ( '\"' | \"'\" | \"\\\\\" | \"n\" | \"t\" | \"r\" | \"0\" )\n```", "metadata": {"complexity": 0}, "token_estimate": 67}
{"chunk_id": "d61fe050c059539b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.7 Literales > Strings", "start_line": 848, "end_line": 856, "content": "Ejemplos:\n\n```\n\"hola\"\n'texto'\n\"https://api.com\"\n```\n\nSecuencias de escape soportadas:", "metadata": {"complexity": 0}, "token_estimate": 30}
{"chunk_id": "4322d659ed25b08a", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "table", "section": "X.7 Literales > Strings", "start_line": 859, "end_line": 867, "content": "| Secuencia | Significado |\n|-----------|-------------------|\n| `\\\"` | Comilla doble |\n| `\\'` | Comilla simple |\n| `\\\\` | Barra invertida |\n| `\\n` | Salto de línea |\n| `\\t` | Tabulación |\n| `\\r` | Retorno de carro |\n| `\\0` | Carácter nulo |", "metadata": {"complexity": 0}, "token_estimate": 97}
{"chunk_id": "09cd03196dee4905", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.7 Literales > Strings", "start_line": 868, "end_line": 870, "content": "> **Nota:** `\\n` dentro de un string es un carácter de datos, no un terminador de sentencia. El EOL físico sigue siendo el único terminador.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 40}
{"chunk_id": "89862fa1c7d6fa31", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.8 Literales booleanos", "start_line": 874, "end_line": 881, "content": "Tokens:\n\n```\nTrue\nFalse\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 11}
{"chunk_id": "a1d6aee149860bef", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.9 Literal nulo", "start_line": 885, "end_line": 891, "content": "Token:\n\n```\nNone\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 9}
{"chunk_id": "7ecd779d33d47d65", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.10 Operador de desreferenciación", "start_line": 895, "end_line": 897, "content": "AVAP permite acceder al valor de una variable utilizando el prefijo `$`.\n\nEjemplo:", "metadata": {"complexity": 0}, "token_estimate": 20}
{"chunk_id": "baa9aa4e3a708822", "source_file": "../../../docs/LRM/avap.md", "doc_type": "code_example", "block_type": "avap", "section": "X.10 Operador de desreferenciación", "start_line": 900, "end_line": 902, "content": "```avap\naddVar(copia, $original)\n```", "metadata": {"complexity": 0}, "token_estimate": 13}
{"chunk_id": "ef3ab4b2960421a7", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.10 Operador de desreferenciación", "start_line": 903, "end_line": 909, "content": "Token:\n\n```\nDEREF ::= $\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 11}
{"chunk_id": "abae8d52cca4f34b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.11 Orden de precedencia léxica", "start_line": 913, "end_line": 928, "content": "Para evitar ambigüedades, el lexer debe aplicar el principio **longest match first**.\n\nOrden obligatorio:\n\n1. comentarios (`///` antes que `//`, luego `/* */`)\n2. whitespace\n3. palabras reservadas\n4. identificadores\n5. números flotantes\n6. enteros\n7. strings\n8. operadores compuestos (`**`, `==`, `<=`, `>=`, `!=`)\n9. operadores simples\n10. delimitadores\n\n---", "metadata": {"complexity": 0}, "token_estimate": 108}
{"chunk_id": "3b94b2289cc2dfaf", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.12 Separación formal: nivel léxico vs nivel sintáctico", "start_line": 932, "end_line": 942, "content": "```\nNIVEL LÉXICO — produce tokens: IDENTIFIER, INTEGER, FLOAT, STRING,\n operadores, delimitadores, EOL, palabras reservadas.\n\nNIVEL SINTÁCTICO — consume tokens: construye el AST según las reglas BNF\n de las Secciones I–IX.\n```\n\nEl Apéndice X cubre el nivel léxico. Las Secciones I–IX cubren el nivel sintáctico.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 101}
{"chunk_id": "b17c5e47bf3ab720", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.13 Tokens producidos por el lexer", "start_line": 946, "end_line": 994, "content": "El lexer produce los siguientes tokens:\n\n```\nIDENTIFIER\nINTEGER\nFLOAT\nSTRING\n\nASSIGN\nPLUS\nMINUS\nMULT\nDIV\nMOD\nPOWER\n\nEQ\nNEQ\nLT\nGT\nLTE\nGTE\n\nAND\nOR\nNOT\nIN\nIS\n\nLPAREN\nRPAREN\nLBRACKET\nRBRACKET\nLBRACE\nRBRACE\nCOMMA\nDOT\nCOLON\n\nDEREF\n\nTrue\nFalse\nNone\n\nEOL\n```\n\n---", "metadata": {"complexity": 0}, "token_estimate": 103}
{"chunk_id": "8514bf5ba41b03cd", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "X.14 Elementos ignorados por el lexer", "start_line": 998, "end_line": 1007, "content": "Los siguientes elementos se descartan durante el análisis léxico:\n\n```\nWHITESPACE\nLINE_COMMENT\nDOC_COMMENT\nBLOCK_COMMENT\n```\n\nEstos tokens no son enviados al parser.", "metadata": {"complexity": 0}, "token_estimate": 41}
{"chunk_id": "ea8e42e2603b690b", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1 Modelo de Memoria y Resolución de Variables", "start_line": 1012, "end_line": 1022, "content": "AVAP utiliza un modelo de memoria basado en **tres tipos de ámbitos (scopes)**:\n\n```\nGlobal Scope\nMain Local Scope\nFunction Scope\n```\n\nCada tipo de ámbito tiene reglas estrictas de visibilidad.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 53}
{"chunk_id": "70c803bfeda2191f", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.1 Global Scope", "start_line": 1026, "end_line": 1037, "content": "El **Global Scope** contiene variables declaradas como globales y es accesible desde cualquier parte del programa.\n\nPropiedades:\n\n- existe durante toda la vida del proceso del intérprete\n- es visible desde el flujo principal\n- es visible desde todas las funciones\n- es visible desde goroutines\n\nLas variables globales actúan como **estado compartido del programa**.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 80}
{"chunk_id": "e9d01b03575f1839", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.2 Main Local Scope", "start_line": 1041, "end_line": 1061, "content": "El **Main Local Scope** corresponde al flujo de ejecución principal del script, fuera de cualquier función.\n\nEjemplo:\n\n```\nx = 10\ny = 20\n```\n\nEstas variables son **locales del flujo principal**.\n\nReglas:\n\n- son accesibles dentro del flujo principal\n- **no son accesibles desde funciones**\n- **no son accesibles desde goroutines**\n- desaparecen cuando finaliza la ejecución del script\n\nEsto evita dependencias implícitas entre funciones y el flujo principal.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 117}
{"chunk_id": "bc7fa2e899950fd6", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.3 Function Scope", "start_line": 1065, "end_line": 1085, "content": "Cada vez que se invoca una función:\n\n```\nfunction nombre(parametros)\n```\n\nel motor crea un **Function Scope independiente**.\n\nEste ámbito contiene:\n\n- parámetros de la función\n- variables creadas dentro de la función\n- resultados intermedios\n\nPropiedades:\n\n- solo es visible dentro de esa función\n- no es visible desde el exterior\n- se destruye cuando la función termina\n\n---", "metadata": {"complexity": 0}, "token_estimate": 91}
{"chunk_id": "0ea3def2da82aee4", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.4 Resolución de variables", "start_line": 1089, "end_line": 1100, "content": "La resolución de variables sigue el siguiente orden jerárquico:\n\n```\n1. Function Scope\n2. Global Scope\n```\n\nEl **Main Local Scope no es visible dentro de funciones**.\n\nSi una variable no existe en los scopes visibles, el motor produce un **error de ejecución**.\n\n---", "metadata": {"complexity": 0}, "token_estimate": 64}
{"chunk_id": "0e9ec8d414356b98", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.5 Aislamiento entre funciones", "start_line": 1104, "end_line": 1121, "content": "Cada invocación de función crea un **scope independiente**.\n\nEjemplo:\n\n```\nfunction ejemplo()\n{\n x = 10\n}\n```\n\nLa variable `x`:\n\n- solo existe dentro de esa ejecución de la función\n- no es visible desde otras funciones\n- no es visible desde el flujo principal\n\n---", "metadata": {"complexity": 0}, "token_estimate": 71}
{"chunk_id": "c32225df6dfcde1d", "source_file": "../../../docs/LRM/avap.md", "doc_type": "spec", "block_type": "narrative", "section": "XI.1.6 Acceso desde goroutines", "start_line": 1125, "end_line": 1137, "content": "Las goroutines creadas mediante:\n\n```\ngo funcion()\n```\n\nsiguen las mismas reglas de scope que una función normal.\n\nPor lo tanto:\n\n- pueden acceder a **Global Scope**\n- pueden acceder a su propio **Function Scope**\n- **no pueden acceder al Main Local Scope**", "metadata": {"uses_async": true, "complexity": 1}, "token_estimate": 63}
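Every record in this index file shares the same JSONL schema (`chunk_id`, `source_file`, `doc_type`, `block_type`, `section`, `content`, `metadata`, `token_estimate`). A minimal sketch of consuming it, using two records excerpted and shortened from the lines above (the loading code itself is illustrative, not part of this repository):

```python
import json

# Two records excerpted (and heavily shortened) from the chunk index above.
lines = [
    '{"chunk_id": "3ef2bf52da198594", "doc_type": "spec", "block_type": "narrative", "token_estimate": 49}',
    '{"chunk_id": "d9dcbc4914a55b0e", "doc_type": "bnf", "block_type": "bnf", "token_estimate": 833}',
]

# JSONL: one JSON object per line, parsed independently.
chunks = [json.loads(line) for line in lines]

# Typical consumers filter by block_type and budget by token_estimate.
bnf_chunks = [c for c in chunks if c["block_type"] == "bnf"]
total_tokens = sum(c["token_estimate"] for c in chunks)

print(len(bnf_chunks), total_tokens)
```

Because each line is an independent object, a split record like the `d9dcbc4914a55b0e` BNF chunk would make `json.loads` fail on both halves, which is why each record must stay on a single physical line.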

View File

@ -0,0 +1,7 @@
datasketch
tqdm
tiktoken
redis
elasticsearch<9.0.0
python-dotenv
httpx

View File

@ -0,0 +1,7 @@
import socket
try:
test_sock = socket.create_connection(("127.0.0.1", 9200), timeout=2)
    print(" --> DEBUG: Port 9200 is open for Python!")
test_sock.close()
except Exception as e:
    print(f" --> DEBUG: Raw socket error: {e}")
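A raw TCP connect only proves the port is open; it does not prove Elasticsearch is actually answering. A possible follow-up probe, sketched with the standard library only (the `127.0.0.1:9200` target is the same assumption as in the script above, and this helper is not part of the repository):

```python
import socket
import urllib.request

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_responding(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the service answers an HTTP GET on / with status 200
    (Elasticsearch serves its cluster-info JSON at the root path)."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Usage: `port_open("127.0.0.1", 9200)` tells you the listener exists; `http_responding("127.0.0.1", 9200)` distinguishes a bound-but-unready service from one that is serving requests.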

View File

@ -2,11 +2,11 @@ import json
from copy import deepcopy
from dataclasses import replace
from pathlib import Path
from typing import Any
from typing import Any, Union
from lark import Lark
from chonkie import (
Chunk,
ElasticHandshake,
FileFetcher,
MarkdownChef,
TextChef,
@ -18,7 +18,6 @@ from loguru import logger
from transformers import AutoTokenizer
from scripts.pipelines.tasks.embeddings import OllamaEmbeddings
from scripts.pipelines.wrappers.chonkie_wrappers import ElasticHandshakeWithMetadata
from src.config import settings
@ -100,6 +99,58 @@ def _merge_markdown_document(processed_doc: MarkdownDocument) -> MarkdownDocumen
return fused_processed_doc
class ElasticHandshakeWithMetadata(ElasticHandshake):
"""Extended ElasticHandshake that preserves chunk metadata in Elasticsearch."""
def _create_bulk_actions(self, chunks: list[dict]) -> list[dict[str, Any]]:
"""Generate bulk actions including metadata."""
actions = []
embeddings = self.embedding_model.embed_batch([chunk["chunk"].text for chunk in chunks])
for i, chunk in enumerate(chunks):
source = {
"text": chunk["chunk"].text,
"embedding": embeddings[i],
"start_index": chunk["chunk"].start_index,
"end_index": chunk["chunk"].end_index,
"token_count": chunk["chunk"].token_count,
}
# Include metadata if it exists
if chunk.get("extra_metadata"):
source.update(chunk["extra_metadata"])
actions.append({
"_index": self.index_name,
"_id": self._generate_id(i, chunk["chunk"]),
"_source": source,
})
return actions
def write(self, chunks: Union[Chunk, list[Chunk]]) -> list[dict[str, Any]]:
"""Write the chunks to the Elasticsearch index using the bulk API."""
if isinstance(chunks, Chunk):
chunks = [chunks]
actions = self._create_bulk_actions(chunks)
# Use the bulk helper to efficiently write the documents
from elasticsearch.helpers import bulk
success, errors = bulk(self.client, actions, raise_on_error=False)
if errors:
logger.warning(f"Encountered {len(errors)} errors during bulk indexing.") # type: ignore
# Optionally log the first few errors for debugging
for i, error in enumerate(errors[:5]): # type: ignore
logger.error(f"Error {i + 1}: {error}")
logger.info(f"Chonkie wrote {success} chunks to Elasticsearch index: {self.index_name}")
return actions
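The metadata-merging step in `_create_bulk_actions` can be illustrated in isolation. This sketch replaces the real chonkie `Chunk` and the embedding model with stubs; the `_source` field names mirror the diff above, while the stub class, ID scheme, and sample values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StubChunk:
    # Stand-in for chonkie's Chunk: only the fields the bulk builder reads.
    text: str
    start_index: int
    end_index: int
    token_count: int

def create_bulk_actions(chunks: list[dict], embeddings: list[list[float]], index_name: str) -> list[dict]:
    """Build Elasticsearch bulk actions, merging optional extra_metadata into _source."""
    actions = []
    for i, chunk in enumerate(chunks):
        source = {
            "text": chunk["chunk"].text,
            "embedding": embeddings[i],
            "start_index": chunk["chunk"].start_index,
            "end_index": chunk["chunk"].end_index,
            "token_count": chunk["chunk"].token_count,
        }
        if chunk.get("extra_metadata"):
            # Metadata keys land beside text/embedding as top-level _source fields.
            source.update(chunk["extra_metadata"])
        actions.append({"_index": index_name, "_id": str(i), "_source": source})
    return actions

actions = create_bulk_actions(
    [{"chunk": StubChunk("AVAP permite casting", 0, 20, 5),
      "extra_metadata": {"doc_type": "spec", "section": "9.1"}}],
    embeddings=[[0.1, 0.2]],
    index_name="avap_docs",
)
print(actions[0]["_source"]["doc_type"])
```

Flattening `extra_metadata` into `_source` (rather than nesting it under a `metadata` key) is what lets Elasticsearch queries filter directly on fields like `doc_type` or `section`.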
def fetch_documents(docs_folder_path: str, docs_extension: list[str]) -> list[Path]:
"""
Fetch files from a folder that match the specified extensions.

View File

@ -4,153 +4,28 @@ from langchain_core.messages import SystemMessage
def get_prompt_mbpp(avap_docs: str) -> SystemMessage:
return SystemMessage(
content=f"""
Eres un agente de IA responsable de generar un conjunto de datos sintético para un lenguaje de programación llamado AVAP (Advanced Virtual API Programming).
You are an AI agent responsible for generating a synthetic dataset for a programming language called AVAP (Advanced Virtual API Programming).
Recibirás un objeto JSON que contiene problemas de programación y sus soluciones escritas en Python.
You will receive a JSON object containing programming problems and their solutions written in Python.
Tu tarea es:
1. Convertir cada problema de programación en Python en un problema equivalente formulado para el lenguaje AVAP.
2. Reemplazar la solución en Python con una implementación correcta escrita en AVAP.
Your task is to:
1. Convert each Python programming problem into an equivalent problem framed for the AVAP language.
2. Replace the Python solution with a correct implementation written in AVAP.
Requisitos:
- La solución en AVAP debe seguir la sintaxis de AVAP.
- Usa bibliotecas y construcciones de AVAP cuando sea apropiado.
- Las funciones en AVAP se declaran usando la palabra clave `function`.
- Preserva la intención y dificultad del problema original.
- No produzcas código Python en la solución final.
Requirements:
- The AVAP solution must follow AVAP syntax.
- Use AVAP libraries and constructs when appropriate.
- Functions in AVAP are declared using the `function` keyword.
- Preserve the original problem intent and difficulty.
- Do not produce Python code in the final solution.
Reglas de uso de documentación:
- DEBES confiar exclusivamente en la documentación de AVAP proporcionada a continuación.
- No inventes sintaxis, funciones o bibliotecas que no estén descritas en la documentación.
- Si la documentación no contiene suficiente información para resolver el problema, devuelve exactamente:
"No sé cómo responder esta pregunta basándome en la documentación proporcionada."
Documentation usage rules:
- You MUST rely exclusively on the AVAP documentation provided below.
- Do not invent syntax, functions, or libraries that are not described in the documentation.
- If the documentation does not contain enough information to solve the problem, return exactly:
"I don't know how to answer this question based on the provided documentation."
Documentación AVAP:
AVAP Documentation:
{avap_docs}
"""
)
def get_prompt_human_eval(avap_docs: str) -> SystemMessage:
return SystemMessage(
content = f"""
You are an AI agent responsible for generating a synthetic dataset for a programming language called AVAP (Advanced Virtual API Programming).
AVAP is a language whose primary purpose is building APIs. Every programming problem must therefore be transformed into an API-oriented task, not a standalone function.
You will receive as input a JSON object containing programming problems from the OpenAI HumanEval dataset and their reference solutions written in Python.
Your task is to transform each input example into an AVAP dataset item by doing the following:
1. Rewrite the original programming task as an API endpoint specification.
2. Replace the Python function-based solution with a valid AVAP implementation centered on the endpoint logic.
3. Adapt the tests so they validate the expected behavior of the API in AVAP.
4. Preserve the intent of the original problem as much as possible, but express it through request/response-style interactions.
Output requirements:
- Your response MUST be valid JSON.
- Your response MUST be a JSON array.
- Each element of the array MUST follow exactly this structure:
[
{{
"task_id": 1,
"text": "Crear un endpoint que reciba un parámetro 'message' y devuelva un saludo personalizado. Si no se proporciona el parámetro, debe devolver un saludo genérico con código de estado 200.",
"code": "addParam(\\"message\\", message)\\nif(message, None, \\"=\\")\\n greeting = \\"Hello, World!\\"\\nelse()\\n greeting = \\"Hello, \\" + message + \\"!\\"\\nend()\\naddResult(greeting)\\n_status = 200",
"test_inputs": {{
"message": "Alice"
}},
"test_list": [
"re.search(r'Hello, Alice!', greeting)",
"re.match(r'^200$', str(_status))"
]
}}
]
Field meanings:
- "task_id": integer identifier of the original task.
- "text": problem statement rewritten in Spanish, expressed as an API endpoint task.
- "code": valid AVAP code implementing the endpoint behavior.
- "test_inputs": object with the request parameters or payload data needed to test the endpoint.
- "test_list": list of validation expressions, as strings, that check the API result, returned values, and status where appropriate.
Transformation rules:
- Reinterpret each HumanEval problem as something an API endpoint would do.
- Prefer inputs as request parameters, query parameters, body fields, or path values, whichever best fits the original task.
- Prefer outputs as API responses, results, or status fields.
- If the original Python task returns a computed value, the AVAP version must expose that value as an endpoint response.
- Where useful, include status handling such as `_status = 200`, but only if supported by the documentation.
- The endpoint must reflect the original algorithmic intent, not just a superficial greeting-style wrapper.
Strict constraints:
- Use only AVAP syntax, constructs, operators, and libraries explicitly described in the documentation below.
- Do not invent undocumented features.
- Do not produce Python code in the solution.
- Do not include explanations, markdown, or any text outside the JSON array.
- Return only the final JSON array.
Failure rule:
- If the AVAP documentation does not contain enough information to produce a correct endpoint-based solution, return exactly:
"No sé cómo responder esta pregunta basándome en la documentación proporcionada."
AVAP Documentation:
{avap_docs}
""")
def get_prompt_generation(
avap_docs: str, num_problems: int = 10, problems_per_category: int = 10
) -> SystemMessage:
return SystemMessage(
content=f"""
You are an AI agent responsible for generating a synthetic dataset for a programming language called AVAP (Advanced Virtual API Programming).
Your task is to generate exactly {num_problems} new AVAP programming problems that demonstrate different features of the language.
Requirements:
- Generate realistic API endpoint problems that could be used for evaluation.
- Cover diverse AVAP features and constructs.
- Each problem must be independent and self-contained.
- Vary the difficulty level: simple, intermediate, and advanced problems.
- Follow AVAP syntax and semantics exactly as described in the documentation.
Output format:
You MUST respond ONLY with a valid JSON array. No markdown, no explanations, no text outside the array.
Each element MUST follow this exact structure:
{{
"task_id": <integer>,
"text": "<problem description in Spanish, describing what the endpoint must do>",
"code": "<valid AVAP code implementing the endpoint, with \\n for line breaks>",
"test_inputs": {{}},
"test_list": ["<expresión re.match(r'...', var)>", ...]
}}
Field descriptions:
- "task_id": unique integer starting at 1, consecutive for each problem.
- "text": description of what an API endpoint must implement (in Spanish).
- "code": complete, valid AVAP code implementing the endpoint. It must use:
- addParam() for input parameters
- addResult() for output results
- _status for HTTP status codes where appropriate
- valid AVAP syntax (if/else, loops, variables, etc.)
- "test_inputs": typically empty {{}} since the generated code contains inline parameters, or fixed test values if needed.
- "test_list": list of regex validation expressions using the format re.match(r'pattern', variable).
Strict rules for code generation:
- Use ONLY features and syntax documented in the AVAP manual below.
- Do NOT invent undocumented syntax or functions.
- Each line of code represents one instruction.
- Variables must be declared before they are used.
- HTTP status must be set with _status = <code>.
- Returned values must use addResult().
- Pure AVAP code only, no Python.
If you cannot generate a valid problem from the provided documentation, still generate a problem but indicate it in the test_list:
["<documentación insuficiente para validación completa>"]
AVAP Documentation:
{avap_docs}
Now generate {num_problems} problems. Return ONLY the JSON array.
"""
)
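Both prompts pin the same output contract: a JSON array whose items carry `task_id`, `text`, `code`, `test_inputs`, and `test_list`. A batch returned by the model could be sanity-checked before use with a small validator along these lines (a sketch; the function name and error messages are illustrative assumptions, not part of this repository):

```python
import json

# Keys every dataset item must carry, as required by the prompts above.
REQUIRED_KEYS = {"task_id", "text", "code", "test_inputs", "test_list"}

def validate_items(raw: str) -> list[dict]:
    """Parse a model response and check it is a JSON array of well-formed items."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("response must be a JSON array")
    for item in items:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"task {item.get('task_id')}: missing keys {sorted(missing)}")
        if not isinstance(item["task_id"], int):
            raise ValueError(f"task {item['task_id']}: task_id must be an integer")
    return items

sample = json.dumps([
    {"task_id": 1, "text": "Crear un endpoint...", "code": "addResult(x)",
     "test_inputs": {}, "test_list": ["re.match(r'^x$', str(x))"]}
])
print(len(validate_items(sample)))  # 1
```

Rejecting a malformed batch here is cheaper than discovering the problem later, when the validation API runs each task.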


@@ -0,0 +1,110 @@
import io
import json
from pathlib import Path
import requests
from loguru import logger
def load_tasks(dataset_path: Path) -> list[dict]:
"""Load tasks from a synthetic dataset JSON file."""
with dataset_path.open("r", encoding="utf-8") as f:
tasks: list[dict] = json.load(f)
logger.info(f"Loaded {len(tasks)} tasks from {dataset_path}")
return tasks
def _post_single_task(task: dict, api_url: str, timeout: int) -> dict:
"""Post a single task to the validation API and return the result."""
payload = json.dumps([task]).encode("utf-8")
file_obj = io.BytesIO(payload)
response = requests.post(
api_url,
files={"file": ("task.json", file_obj, "application/json")},
timeout=timeout,
)
return _parse_task_response(response.text)
def _parse_task_response(raw: str) -> dict:
"""Parse the API response for a single task."""
raw = raw.strip()
if not raw:
return {"success": False, "error": "Empty response from API"}
decoder = json.JSONDecoder()
objects: list[dict] = []
idx = 0
while idx < len(raw):
try:
obj, end_idx = decoder.raw_decode(raw, idx)
objects.append(obj)
idx = end_idx
except json.JSONDecodeError:
idx += 1
while idx < len(raw) and raw[idx] in " \t\n\r":
idx += 1
if not objects:
return {"success": False, "error": f"Could not parse response: {raw[:200]}"}
for obj in objects:
if not obj.get("success"):
return obj
if "result_sequence" in obj and obj["result_sequence"]:
return obj["result_sequence"][0]
return objects[0]
def validate_all_tasks(tasks: list[dict], api_url: str, timeout: int) -> list[dict]:
"""Validate each task individually against the API.
Posts tasks one by one so that a failure in one task does not
prevent the rest from being validated.
Args:
tasks: List of task dicts to validate.
api_url: URL of the validation API endpoint.
timeout: Timeout in seconds for each API request.
Returns:
List of tasks that passed validation.
"""
validated: list[dict] = []
errors: list[str] = []
for idx, task in enumerate(tasks):
task_id = task.get("task_id", idx)
try:
result = _post_single_task(task, api_url, timeout)
if result.get("success") and result.get("assertion_result", True):
validated.append(task)
logger.debug(f"Task {task_id}: passed")
else:
msg = f"Task {task_id}: {result}"
errors.append(msg)
logger.warning(msg)
except requests.RequestException as exc:
msg = f"Task {task_id}: Request failed — {exc}"
errors.append(msg)
logger.error(msg)
if errors:
logger.error(
f"\n{'=' * 60}\n"
f"VALIDATION ERROR SUMMARY — {len(errors)} task(s) failed:\n"
+ "\n".join(f" - {e}" for e in errors)
+ f"\n{'=' * 60}"
)
logger.info(f"Validation complete: {len(validated)}/{len(tasks)} tasks passed")
return validated
def save_validated_tasks(tasks: list[dict], output_path: Path) -> None:
"""Write the validated task list to a JSON file."""
output_path.parent.mkdir(parents=True, exist_ok=True)
with output_path.open("w", encoding="utf-8") as f:
json.dump(tasks, f, ensure_ascii=False, indent=2)
logger.info(f"Saved validated dataset to {output_path}")
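The `_parse_task_response` helper above walks the body with `json.JSONDecoder.raw_decode` because the validation API can return several JSON objects concatenated in a single response. The core technique, isolated as a runnable sketch:

```python
import json

def split_concatenated_json(raw: str) -> list[dict]:
    """Decode a string containing several JSON documents back to back."""
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    raw = raw.strip()
    while idx < len(raw):
        # raw_decode returns the parsed object and the index where it ended.
        obj, end = decoder.raw_decode(raw, idx)
        objects.append(obj)
        idx = end
        # Skip whitespace between documents.
        while idx < len(raw) and raw[idx] in " \t\r\n":
            idx += 1
    return objects

print(split_concatenated_json('{"success": true} {"result_sequence": [1, 2]}'))
# [{'success': True}, {'result_sequence': [1, 2]}]
```

Unlike this sketch, the module's version also tolerates garbage between documents by advancing one character at a time on a decode error.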


@@ -13,6 +13,7 @@ class Settings(BaseSettings):
interim_path_: Optional[str] = None
kubeconfig_path_: Optional[str] = None
postgres_url: str
parser_url: str
elasticsearch_url: str
elasticsearch_local_url: str
elasticsearch_index: str


@@ -0,0 +1,26 @@
[
{
"task_id": 1,
"text": "Crear un endpoint que reciba un parámetro 'message' y devuelva un saludo personalizado. Si no se proporciona el parámetro, debe devolver un saludo genérico con código de estado 200.",
"code": "addParam(\"message\", message)\nif(message, None, \"=\")\n greeting = \"Hello, World!\"\nelse()\n greeting = \"Hello, \" + message + \"!\"\nend()\naddResult(greeting)\n_status = 200",
"test_inputs": {
"message": "Alice"
},
"test_list": [
"re.search(r'Hello, Alice!', greeting)",
"re.match(r'^200$', str(_status))"
]
},
{
"task_id": 2,
"text": "Crear un generador de tokens seguros que tome una contraseña como entrada, genere un hash SHA256 de la misma, y luego cree un token aleatorio de 32 caracteres alfanuméricos. El sistema debe retornar tanto el hash como el token generado.",
"code": "addParam(\"password\", password)\nencodeSHA256(password, hashed_password)\nrandomString(\"[a-zA-Z0-9]\", 32, secure_token)\naddResult(hashed_password)\naddResult(secure_token)",
"test_inputs": {
"password": "mySecretPass123"
},
"test_list": [
"re.match(r'^[a-f0-9]{64}$', hashed_password)",
"re.match(r'^[a-zA-Z0-9]{32}$', secure_token)"
]
}
]
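Each `test_list` entry above is a Python regular-expression check over variables the AVAP code is expected to produce (`greeting`, `_status`, ...). A harness might evaluate them along these lines (a hypothetical sketch; the repository's actual runner is not shown in this diff, and using `eval` is assumed acceptable only because the expressions come from the project's own generated dataset):

```python
import re

def run_test_list(test_list: list[str], variables: dict) -> bool:
    """Evaluate each test expression against the variables the code produced.

    re.match/re.search return a Match object on success and None on failure,
    so all() treats each expression as pass/fail.
    """
    env = {"re": re, "str": str, **variables}
    return all(eval(expr, env) for expr in test_list)

# Variables as the endpoint for task_id 1 would produce them (illustrative).
produced = {"greeting": "Hello, Alice!", "_status": 200}
tests = ["re.search(r'Hello, Alice!', greeting)", "re.match(r'^200$', str(_status))"]
print(run_test_list(tests, produced))  # True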


@@ -1,95 +1,326 @@
[
{
"task_id": 1,
"text": "Dado un parámetro 'username' recibido por HTTP, genera un hash SHA-256 de ese valor y devuélvelo como resultado. El código debe capturar el parámetro, calcular el hash y registrar el resultado.",
"code": "addParam(\"username\", username)\nencodeSHA256(username, hashed)\naddResult(hashed)",
"text": "Captura el parámetro 'username' de la petición HTTP y devuélvelo como resultado. Si no existe, la variable será None.",
"code": "addParam(\"username\", username)\naddResult(username)",
"test_inputs": {
"username": "admin"
"username": "alice"
},
"test_list": [
"re.match(r'^[a-f0-9]{64}$', hashed)"
"re.match(r'^alice$', str(username))"
]
},
{
"task_id": 2,
"text": "Recibe el parámetro 'email' y establece el código de estado HTTP en 200. Devuelve el email como resultado.",
"code": "addParam(\"email\", email)\naddVar(_status, 200)\naddResult(email)",
"test_inputs": {
"email": "user@example.com"
},
"test_list": [
"re.match(r'^user@example\\.com$', str(email))",
"re.match(r'^200$', str(_status))"
]
},
{
"task_id": 3,
"text": "Genera un token aleatorio de 32 caracteres alfanuméricos usando randomString y devuélvelo como resultado del endpoint.",
"code": "randomString(\"[a-zA-Z0-9]\", 32, token)\naddResult(token)",
"test_inputs": {},
"text": "Recibe el parámetro 'password', genera su hash SHA-256 y devuelve el hash como resultado.",
"code": "addParam(\"password\", password)\nencodeSHA256(password, hashed)\naddResult(hashed)",
"test_inputs": {
"password": "secret123"
},
"test_list": [
"re.match(r'^[a-zA-Z0-9]{32}$', token)"
"re.match(r'^[a-f0-9]{64}$', str(hashed))"
]
},
{
"task_id": 4,
"text": "Recibe un parámetro 'password' por HTTP, calcula su hash MD5 y devuelve el hash. Si el parámetro no fue enviado (es None), establece _status en 400 y devuelve un mensaje de error.",
"code": "addParam(\"password\", password)\nif(None, None, `password is None`)\naddVar(_status, 400)\nerror = \"password requerido\"\naddResult(error)\nelse()\nencodeMD5(password, hashed)\naddResult(hashed)\nend()",
"text": "Recibe el parámetro 'text', reemplaza todos los espacios por guiones bajos y devuelve el resultado.",
"code": "addParam(\"text\", text)\nreplace(text, \" \", \"_\", result)\naddResult(result)",
"test_inputs": {
"password": "mipassword123"
"text": "hello world foo"
},
"test_list": [
"re.match(r'^[a-f0-9]{32}$', hashed)"
"re.match(r'^hello_world_foo$', str(result))"
]
},
{
"task_id": 5,
"text": "Recibe un parámetro 'texto' por HTTP y reemplaza todos los espacios por guiones bajos. Devuelve el texto transformado como resultado.",
"code": "addParam(\"texto\", texto)\nreplace(texto, \" \", \"_\", resultado)\naddResult(resultado)",
"test_inputs": {
"texto": "hola mundo avap"
},
"text": "Genera un token aleatorio de 32 caracteres alfanuméricos y devuélvelo como resultado.",
"code": "randomString(\"[a-zA-Z0-9]\", 32, token)\naddResult(token)",
"test_inputs": {},
"test_list": [
"re.match(r'^hola_mundo_avap$', resultado)"
"re.match(r'^[a-zA-Z0-9]{32}$', str(token))"
]
},
{
"task_id": 6,
"text": "Dada una lista de tres elementos construida con variableToList y AddVariableToJSON, itera sobre ella con startLoop, extrae cada elemento con itemFromList y acumula la longitud de la lista en una variable 'total'. Devuelve 'total'.",
"code": "variableToList(\"a\", myList)\nAddVariableToJSON(\"1\", \"b\", myList)\nAddVariableToJSON(\"2\", \"c\", myList)\ngetListLen(myList, total)\naddResult(total)",
"test_inputs": {},
"text": "Recibe el parámetro 'age'. Si age es mayor que 18, devuelve 'adulto'; de lo contrario devuelve 'menor'.",
"code": "addParam(\"age\", age)\nif(age, 18, \">\")\nresult = \"adulto\"\nelse()\nresult = \"menor\"\nend()\naddResult(result)",
"test_inputs": {
"age": 25
},
"test_list": [
"re.match(r'^\\d+$', str(total))"
"re.match(r'^adulto$', str(result))"
]
},
{
"task_id": 7,
"text": "Obtén la fecha y hora actual en formato 'YYYY-MM-DD' usando getDateTime con zona horaria UTC y sin desplazamiento de tiempo. Devuelve la fecha como resultado.",
"code": "getDateTime(\"%Y-%m-%d\", 0, \"UTC\", fecha_actual)\naddResult(fecha_actual)",
"test_inputs": {},
"text": "Recibe el parámetro 'score'. Si score es igual a 100, establece _status en 200 y result en 'perfecto'; si no, _status en 400 y result en 'incompleto'.",
"code": "addParam(\"score\", score)\nif(score, 100, \"==\")\naddVar(_status, 200)\nresult = \"perfecto\"\nelse()\naddVar(_status, 400)\nresult = \"incompleto\"\nend()\naddResult(result)",
"test_inputs": {
"score": 100
},
"test_list": [
"re.match(r'^\\d{4}-\\d{2}-\\d{2}$', fecha_actual)"
"re.match(r'^perfecto$', str(result))",
"re.match(r'^200$', str(_status))"
]
},
{
"task_id": 8,
"text": "Recibe un parámetro 'edad' por HTTP. Si la edad es mayor o igual a 18, devuelve 'mayor de edad', de lo contrario devuelve 'menor de edad'. Usa if() Modo 1 para la comparación.",
"code": "addParam(\"edad\", edad)\nedad = int(edad)\nif(edad, 18, \">=\")\nresultado = \"mayor de edad\"\nelse()\nresultado = \"menor de edad\"\nend()\naddResult(resultado)",
"test_inputs": {
"edad": "20"
},
"text": "Crea una lista con el elemento 'item1', obtén su longitud y devuelve la longitud como resultado.",
"code": "variableToList(\"item1\", myList)\ngetListLen(myList, length)\naddResult(length)",
"test_inputs": {},
"test_list": [
"re.match(r'^(mayor de edad|menor de edad)$', resultado)"
"re.match(r'^1$', str(length))"
]
},
{
"task_id": 9,
"text": "Define una función 'cuadrado' que recibe un número y devuelve su cuadrado. Llama a la función con el parámetro 'n' recibido por HTTP y devuelve el resultado.",
"code": "function cuadrado(n) {\nresult = n * n\nreturn(result)\n}\naddParam(\"n\", n)\nn = int(n)\ncuadrado_resultado = cuadrado(n)\naddResult(cuadrado_resultado)",
"text": "Recibe el parámetro 'items' como lista de query params, obtén su longitud y devuélvela como resultado.",
"code": "getQueryParamList(\"items\", items)\ngetListLen(items, length)\naddResult(length)",
"test_inputs": {
"n": "7"
"items": [
"a",
"b",
"c"
]
},
"test_list": [
"re.match(r'^49$', str(cuadrado_resultado))"
"re.match(r'^\\d+$', str(length))"
]
},
{
"task_id": 10,
"text": "Recibe un parámetro 'url' por HTTP y realiza una petición GET a esa URL con timeout de 5000ms. Si la respuesta es None (timeout o error), establece _status en 504 y devuelve un mensaje de error. Si hay respuesta, devuélvela.",
"code": "addParam(\"url\", url)\nRequestGet(url, \"\", \"\", respuesta, 5000)\nif(None, None, `respuesta is None`)\naddVar(_status, 504)\nerror = \"timeout o error en la peticion\"\naddResult(error)\nelse()\naddResult(respuesta)\nend()",
"text": "Recibe el parámetro 'data' como JSON, extrae el campo 'name' y devuélvelo como resultado.",
"code": "addParam(\"data\", data)\nvariableFromJSON(data, \"name\", name)\naddResult(name)",
"test_inputs": {
"data": "{\"name\": \"Carlos\", \"age\": 30}"
},
"test_list": [
"re.match(r'^Carlos$', str(name))"
]
},
{
"task_id": 11,
"text": "Crea un objeto JSON vacío, agrega el campo 'status' con valor 'ok' y devuelve el objeto como resultado.",
"code": "info = {}\nAddVariableToJSON(\"status\", \"ok\", info)\naddResult(info)",
"test_inputs": {},
"test_list": [
"re.match(r'.*ok.*', str(info))"
]
},
{
"task_id": 12,
"text": "Recibe el parámetro 'password', genera su hash MD5 y devuelve el hash como resultado.",
"code": "addParam(\"password\", password)\nencodeMD5(password, hashed)\naddResult(hashed)",
"test_inputs": {
"password": "mypassword"
},
"test_list": [
"re.match(r'^[a-f0-9]{32}$', str(hashed))"
]
},
{
"task_id": 13,
"text": "Obtén la fecha y hora actual en formato 'YYYY-MM-DD' en la zona horaria 'UTC' y devuélvela como resultado.",
"code": "getDateTime(\"%Y-%m-%d\", 0, \"UTC\", today)\naddResult(today)",
"test_inputs": {},
"test_list": [
"re.match(r'^\\d{4}-\\d{2}-\\d{2}$', str(today))"
]
},
{
"task_id": 14,
"text": "Recibe el parámetro 'epoch', conviértelo a string de fecha en formato 'YYYY-MM-DD HH:MM:SS' y devuelve el resultado.",
"code": "addParam(\"epoch\", epoch)\nstampToDatetime(epoch, \"%Y-%m-%d %H:%M:%S\", 0, datestr)\naddResult(datestr)",
"test_inputs": {
"epoch": 1700000000
},
"test_list": [
"re.match(r'^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}$', str(datestr))"
]
},
{
"task_id": 15,
"text": "Recibe el parámetro 'date_str' en formato 'YYYY-MM-DD', conviértelo a epoch y devuelve el epoch como resultado.",
"code": "addParam(\"date_str\", date_str)\ngetTimeStamp(date_str, \"%Y-%m-%d\", 0, epoch)\naddResult(epoch)",
"test_inputs": {
"date_str": "2024-01-15"
},
"test_list": [
"re.match(r'^\\d+$', str(epoch))"
]
},
{
"task_id": 16,
"text": "Define una función que recibe un número y devuelve su cuadrado. Llama a la función con el parámetro 'n' y devuelve el resultado.",
"code": "function square(n) {\nresult = n * n\nreturn(result)\n}\naddParam(\"n\", n)\nsquared = square(n)\naddResult(squared)",
"test_inputs": {
"n": 7
},
"test_list": [
"re.match(r'^49$', str(squared))"
]
},
{
"task_id": 17,
"text": "Define una función que recibe dos números y devuelve su suma. Llama a la función con los parámetros 'a' y 'b' y devuelve el resultado.",
"code": "function add(a, b) {\nresult = a + b\nreturn(result)\n}\naddParam(\"a\", a)\naddParam(\"b\", b)\nsum = add(a, b)\naddResult(sum)",
"test_inputs": {
"a": 15,
"b": 27
},
"test_list": [
"re.match(r'^42$', str(sum))"
]
},
{
"task_id": 18,
"text": "Usa un bloque try/exception para intentar dividir el parámetro 'num' entre 0. Si ocurre error, devuelve 'error_division'.",
"code": "addParam(\"num\", num)\ntry()\nresult = num / 0\nexception(err)\nresult = \"error_division\"\nend()\naddResult(result)",
"test_inputs": {
"num": 10
},
"test_list": [
"re.match(r'^error_division$', str(result))"
]
},
{
"task_id": 19,
"text": "Recibe el parámetro 'url', realiza una petición GET a esa URL con timeout de 5000ms y devuelve la respuesta.",
"code": "addParam(\"url\", url)\nRequestGet(url, \"\", \"\", response, 5000)\naddResult(response)",
"test_inputs": {
"url": "https://httpbin.org/get"
},
"test_list": [
"re.match(r'^(timeout o error en la peticion|.+)$', str(respuesta) if respuesta is not None else 'timeout o error en la peticion')"
"re.match(r'^.*$', str(response))"
]
},
{
"task_id": 20,
"text": "Recibe los parámetros 'url' y 'body', realiza una petición POST con timeout de 3000ms y devuelve la respuesta.",
"code": "addParam(\"url\", url)\naddParam(\"body\", body)\nRequestPost(url, \"\", \"\", body, response, 3000)\naddResult(response)",
"test_inputs": {
"url": "https://httpbin.org/post",
"body": "{\"key\":\"value\"}"
},
"test_list": [
"re.match(r'^.*$', str(response))"
]
},
{
"task_id": 21,
"text": "Instancia un conector externo con UUID '20908e93260147acb2636967021fbf5d', llama al método 'get_status' y devuelve el resultado.",
"code": "connector = avapConnector(\"20908e93260147acb2636967021fbf5d\")\nstatus = connector.get_status()\naddResult(status)",
"test_inputs": {},
"test_list": [
"re.match(r'^.*$', str(status))"
]
},
{
"task_id": 22,
"text": "Lanza una función 'fetchData' de forma asíncrona con go, espera el resultado con gather (timeout 2000ms) y devuelve el resultado.",
"code": "function fetchData() {\ndata = \"fetched\"\nreturn(data)\n}\nhandle = go fetchData()\nresult = gather(handle, 2000)\naddResult(result)",
"test_inputs": {},
"test_list": [
"re.match(r'^fetched$', str(result))"
]
},
{
"task_id": 23,
"text": "Recibe el parámetro 'n', itera desde 0 hasta n acumulando la suma y devuelve la suma total.",
"code": "addParam(\"n\", n)\naccum = 0\nstartLoop(i, 0, n)\naccum = accum + i\nendLoop()\naddResult(accum)",
"test_inputs": {
"n": 5
},
"test_list": [
"re.match(r'^10$', str(accum))"
]
},
{
"task_id": 24,
"text": "Recibe el parámetro 'value'. Usando if Modo 2, si value es mayor que 0 y menor que 100, devuelve 'rango_valido'; si no, devuelve 'fuera_de_rango'.",
"code": "addParam(\"value\", value)\nif(None, None, `value > 0 and value < 100`)\nresult = \"rango_valido\"\nelse()\nresult = \"fuera_de_rango\"\nend()\naddResult(result)",
"test_inputs": {
"value": 50
},
"test_list": [
"re.match(r'^rango_valido$', str(result))"
]
},
{
"task_id": 25,
"text": "Realiza una consulta ORM a la tabla 'users' seleccionando todos los campos sin filtro y devuelve los registros.",
"code": "ormAccessSelect(\"*\", \"users\", \"\", records)\naddResult(records)",
"test_inputs": {},
"test_list": [
"re.match(r'^.*$', str(records))"
]
},
{
"task_id": 26,
"text": "Recibe los parámetros 'username' y 'email', inserta un registro en la tabla 'users' y devuelve el resultado de la inserción.",
"code": "addParam(\"username\", username)\naddParam(\"email\", email)\nfields = {\"username\": username, \"email\": email}\normAccessInsert(fields, \"users\", insert_result)\naddResult(insert_result)",
"test_inputs": {
"username": "bob",
"email": "bob@example.com"
},
"test_list": [
"re.match(r'^.*$', str(insert_result))"
]
},
{
"task_id": 27,
"text": "Recibe el parámetro 'user_id', actualiza el campo 'active' a 1 en la tabla 'users' donde id coincide y devuelve el resultado.",
"code": "addParam(\"user_id\", user_id)\nfields = \"active\"\nvalues = {\"active\": 1}\nselector = \"id = \" + str(user_id)\normAccessUpdate(fields, values, \"users\", selector, update_result)\naddResult(update_result)",
"test_inputs": {
"user_id": 42
},
"test_list": [
"re.match(r'^.*$', str(update_result))"
]
},
{
"task_id": 28,
"text": "Importa la librería nativa 'math', calcula el cuadrado de 9 usando una función y devuelve el resultado.",
"code": "import <math>\nfunction calcSquare(x) {\nresult = x * x\nreturn(result)\n}\nresult = calcSquare(9)\naddResult(result)",
"test_inputs": {},
"test_list": [
"re.match(r'^81$', str(result))"
]
},
{
"task_id": 29,
"text": "Recibe el parámetro 'items_json' como JSON con una lista bajo la clave 'items'. Extrae la lista, obtén su longitud y devuelve la longitud.",
"code": "addParam(\"items_json\", items_json)\nvariableFromJSON(items_json, \"items\", items)\ngetListLen(items, length)\naddResult(length)",
"test_inputs": {
"items_json": "{\"items\": [\"x\", \"y\", \"z\"]}"
},
"test_list": [
"re.match(r'^3$', str(length))"
]
},
{
"task_id": 30,
"text": "Recibe el parámetro 'token'. Si el token tiene exactamente 32 caracteres (usando Modo 2), devuelve 'token_valido' con _status 200; si no, devuelve 'token_invalido' con _status 401.",
"code": "addParam(\"token\", token)\nif(None, None, `len(token) == 32`)\nresult = \"token_valido\"\naddVar(_status, 200)\nelse()\nresult = \"token_invalido\"\naddVar(_status, 401)\nend()\naddResult(result)",
"test_inputs": {
"token": "abcdefghijklmnopqrstuvwxyz123456"
},
"test_list": [
"re.match(r'^token_valido$', str(result))",
"re.match(r'^200$', str(_status))"
]
}
]


@@ -0,0 +1,198 @@
[
{
"task_id": 1,
"text": "Captura el parámetro 'username' de la petición HTTP y devuélvelo como resultado. Si no existe, la variable será None.",
"code": "addParam(\"username\", username)\naddResult(username)",
"test_inputs": {
"username": "alice"
},
"test_list": [
"re.match(r'^alice$', str(username))"
]
},
{
"task_id": 2,
"text": "Recibe el parámetro 'email' y establece el código de estado HTTP en 200. Devuelve el email como resultado.",
"code": "addParam(\"email\", email)\naddVar(_status, 200)\naddResult(email)",
"test_inputs": {
"email": "user@example.com"
},
"test_list": [
"re.match(r'^user@example\\.com$', str(email))",
"re.match(r'^200$', str(_status))"
]
},
{
"task_id": 3,
"text": "Recibe el parámetro 'password', genera su hash SHA-256 y devuelve el hash como resultado.",
"code": "addParam(\"password\", password)\nencodeSHA256(password, hashed)\naddResult(hashed)",
"test_inputs": {
"password": "secret123"
},
"test_list": [
"re.match(r'^[a-f0-9]{64}$', str(hashed))"
]
},
{
"task_id": 4,
"text": "Recibe el parámetro 'text', reemplaza todos los espacios por guiones bajos y devuelve el resultado.",
"code": "addParam(\"text\", text)\nreplace(text, \" \", \"_\", result)\naddResult(result)",
"test_inputs": {
"text": "hello world foo"
},
"test_list": [
"re.match(r'^hello_world_foo$', str(result))"
]
},
{
"task_id": 5,
"text": "Genera un token aleatorio de 32 caracteres alfanuméricos y devuélvelo como resultado.",
"code": "randomString(\"[a-zA-Z0-9]\", 32, token)\naddResult(token)",
"test_inputs": {},
"test_list": [
"re.match(r'^[a-zA-Z0-9]{32}$', str(token))"
]
},
{
"task_id": 6,
"text": "Recibe el parámetro 'age'. Si age es mayor que 18, devuelve 'adulto'; de lo contrario devuelve 'menor'.",
"code": "addParam(\"age\", age)\nif(age, 18, \">\")\nresult = \"adulto\"\nelse()\nresult = \"menor\"\nend()\naddResult(result)",
"test_inputs": {
"age": 25
},
"test_list": [
"re.match(r'^adulto$', str(result))"
]
},
{
"task_id": 7,
"text": "Recibe el parámetro 'score'. Si score es igual a 100, establece _status en 200 y result en 'perfecto'; si no, _status en 400 y result en 'incompleto'.",
"code": "addParam(\"score\", score)\nif(score, 100, \"==\")\naddVar(_status, 200)\nresult = \"perfecto\"\nelse()\naddVar(_status, 400)\nresult = \"incompleto\"\nend()\naddResult(result)",
"test_inputs": {
"score": 100
},
"test_list": [
"re.match(r'^perfecto$', str(result))",
"re.match(r'^200$', str(_status))"
]
},
{
"task_id": 8,
"text": "Crea una lista con el elemento 'item1', obtén su longitud y devuelve la longitud como resultado.",
"code": "variableToList(\"item1\", myList)\ngetListLen(myList, length)\naddResult(length)",
"test_inputs": {},
"test_list": [
"re.match(r'^1$', str(length))"
]
},
{
"task_id": 10,
"text": "Recibe el parámetro 'data' como JSON, extrae el campo 'name' y devuélvelo como resultado.",
"code": "addParam(\"data\", data)\nvariableFromJSON(data, \"name\", name)\naddResult(name)",
"test_inputs": {
"data": "{\"name\": \"Carlos\", \"age\": 30}"
},
"test_list": [
"re.match(r'^Carlos$', str(name))"
]
},
{
"task_id": 12,
"text": "Recibe el parámetro 'password', genera su hash MD5 y devuelve el hash como resultado.",
"code": "addParam(\"password\", password)\nencodeMD5(password, hashed)\naddResult(hashed)",
"test_inputs": {
"password": "mypassword"
},
"test_list": [
"re.match(r'^[a-f0-9]{32}$', str(hashed))"
]
},
{
"task_id": 14,
"text": "Recibe el parámetro 'epoch', conviértelo a string de fecha en formato 'YYYY-MM-DD HH:MM:SS' y devuelve el resultado.",
"code": "addParam(\"epoch\", epoch)\nstampToDatetime(epoch, \"%Y-%m-%d %H:%M:%S\", 0, datestr)\naddResult(datestr)",
"test_inputs": {
"epoch": 1700000000
},
"test_list": [
"re.match(r'^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}$', str(datestr))"
]
},
{
"task_id": 16,
"text": "Define una función que recibe un número y devuelve su cuadrado. Llama a la función con el parámetro 'n' y devuelve el resultado.",
"code": "function square(n) {\nresult = n * n\nreturn(result)\n}\naddParam(\"n\", n)\nsquared = square(n)\naddResult(squared)",
"test_inputs": {
"n": 7
},
"test_list": [
"re.match(r'^49$', str(squared))"
]
},
{
"task_id": 17,
"text": "Define una función que recibe dos números y devuelve su suma. Llama a la función con los parámetros 'a' y 'b' y devuelve el resultado.",
"code": "function add(a, b) {\nresult = a + b\nreturn(result)\n}\naddParam(\"a\", a)\naddParam(\"b\", b)\nsum = add(a, b)\naddResult(sum)",
"test_inputs": {
"a": 15,
"b": 27
},
"test_list": [
"re.match(r'^42$', str(sum))"
]
},
{
"task_id": 18,
"text": "Usa un bloque try/exception para intentar dividir el parámetro 'num' entre 0. Si ocurre error, devuelve 'error_division'.",
"code": "addParam(\"num\", num)\ntry()\nresult = num / 0\nexception(err)\nresult = \"error_division\"\nend()\naddResult(result)",
"test_inputs": {
"num": 10
},
"test_list": [
"re.match(r'^error_division$', str(result))"
]
},
{
"task_id": 19,
"text": "Recibe el parámetro 'url', realiza una petición GET a esa URL con timeout de 5000ms y devuelve la respuesta.",
"code": "addParam(\"url\", url)\nRequestGet(url, \"\", \"\", response, 5000)\naddResult(response)",
"test_inputs": {
"url": "https://httpbin.org/get"
},
"test_list": [
"re.match(r'^.*$', str(response))"
]
},
{
"task_id": 20,
"text": "Recibe los parámetros 'url' y 'body', realiza una petición POST con timeout de 3000ms y devuelve la respuesta.",
"code": "addParam(\"url\", url)\naddParam(\"body\", body)\nRequestPost(url, \"\", \"\", body, response, 3000)\naddResult(response)",
"test_inputs": {
"url": "https://httpbin.org/post",
"body": "{\"key\":\"value\"}"
},
"test_list": [
"re.match(r'^.*$', str(response))"
]
},
{
"task_id": 28,
"text": "Importa la librería nativa 'math', calcula el cuadrado de 9 usando una función y devuelve el resultado.",
"code": "import <math>\nfunction calcSquare(x) {\nresult = x * x\nreturn(result)\n}\nresult = calcSquare(9)\naddResult(result)",
"test_inputs": {},
"test_list": [
"re.match(r'^81$', str(result))"
]
},
{
"task_id": 29,
"text": "Recibe el parámetro 'items_json' como JSON con una lista bajo la clave 'items'. Extrae la lista, obtén su longitud y devuelve la longitud.",
"code": "addParam(\"items_json\", items_json)\nvariableFromJSON(items_json, \"items\", items)\ngetListLen(items, length)\naddResult(length)",
"test_inputs": {
"items_json": "{\"items\": [\"x\", \"y\", \"z\"]}"
},
"test_list": [
"re.match(r'^3$', str(length))"
]
}
]

test.json Normal file

@@ -0,0 +1,13 @@
[
{
"task_id": 23,
"text": "Recibe el parámetro 'n', itera desde 0 hasta n acumulando la suma y devuelve la suma total.",
"code": "addParam(\"n\", n)\naccum = 0\nstartLoop(i, 0, n)\naccum = accum + i\nendLoop()\naddResult(accum)",
"test_inputs": {
"n": 5
},
"test_list": [
"re.match(r'^10$', str(accum))"
]
}
]