PRD-0001: OpenAI-Compatible HTTP Proxy
Date: 2026-03-18
Status: Implemented
Requested by: Rafael Ruiz (CTO)
Implemented in: PR #58
Related ADR: ADR-0001 (gRPC as primary interface)
Problem
The Brunix Assistance Engine exposes a gRPC interface as its primary API. gRPC is the right choice for performance and type safety in server-to-server communication, but it creates a significant adoption barrier for two categories of consumers:
Existing OpenAI integrations. Any tool or client already configured to call the OpenAI API — VS Code extensions using continue.dev, LiteLLM routers, Open WebUI instances, internal tooling at 101OBEX, Corp — requires code changes to switch to gRPC. The switching cost is non-trivial and creates friction that slows adoption.
Model replacement use case. The core strategic value of the Brunix RAG is that it can replace direct OpenAI API consumption with a locally-hosted, domain-specific assistant that has no per-token cost and no data privacy concerns. This value proposition is only actionable if the replacement is transparent — i.e., the client does not need to change to consume the Brunix RAG instead of OpenAI.
Without a compatibility layer, the Brunix engine cannot serve as a drop-in replacement for OpenAI models. Every potential adopter faces an integration project instead of a configuration change.
Solution
Implement an HTTP server running alongside the gRPC server that exposes:
- The OpenAI Chat Completions API (/v1/chat/completions) — both streaming and non-streaming
- The OpenAI Completions API (/v1/completions) — legacy support
- The OpenAI Models API (/v1/models) — for compatibility with clients that enumerate available models
- The Ollama Chat API (/api/chat) — NDJSON streaming format
- The Ollama Generate API (/api/generate) — for Ollama-native clients
- The Ollama Tags API (/api/tags) — for clients that list available models
- A health endpoint (/health)
The proxy bridges HTTP → gRPC internally: stream: false routes to AskAgent, stream: true routes to AskAgentStream. The gRPC interface remains the primary interface and is not modified.
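The stream-based routing described above can be sketched as follows. The method names AskAgent and AskAgentStream come from this PRD; the helper name and the stub object are illustrative, not the actual proxy code:

```python
# Hypothetical sketch of routing on the `stream` flag. `stub` stands
# in for the generated gRPC client; only AskAgent / AskAgentStream
# are names from the PRD.

def route_request(stub, query: str, stream: bool):
    """Route an HTTP chat request to the matching gRPC method."""
    if stream:
        # AskAgentStream yields tokens incrementally
        return stub.AskAgentStream(query)
    # AskAgent returns the full answer in one response
    return stub.AskAgent(query)
```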
Any client that currently points to https://api.openai.com can be reconfigured to point to http://localhost:8000 (or the server's address) with model: brunix and will work without any other change.
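As a sketch of how little changes on the client side, this is the request body a stock OpenAI-compatible client would send after the swap. The endpoint and the model value brunix are from this PRD; the payload shape follows the public OpenAI Chat Completions format, and the example question is invented:

```python
import json

# Only the base URL and model name change versus a real OpenAI setup.
BRUNIX_URL = "http://localhost:8000/v1/chat/completions"  # was https://api.openai.com/v1/chat/completions

payload = {
    "model": "brunix",  # was e.g. an OpenAI model name
    "messages": [{"role": "user", "content": "How do I configure the engine?"}],
    "stream": False,
}
body = json.dumps(payload)
```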
Scope
In scope:
- OpenAI-compatible endpoints as listed above
- Ollama-compatible endpoints as listed above
- Routing `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`
- Session ID propagation via the `session_id` extension field in `ChatCompletionRequest`
- Health endpoint
Out of scope:
- OpenAI function calling / tool use
- OpenAI embeddings API (/v1/embeddings)
- OpenAI fine-tuning or moderation APIs
- Authentication / API key validation (handled at infrastructure level)
- Multi-turn conversation reconstruction from the message array (the proxy extracts only the last user message as the query)
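The last-user-message behaviour noted above could look like the sketch below. The function name is illustrative; only the behaviour (take the most recent user message, ignore the rest of the array) is stated in this PRD:

```python
def extract_query(messages: list[dict]) -> str:
    """Return the content of the most recent 'user' message.

    Matches the single-turn behaviour described in the PRD: earlier
    turns and non-user roles are ignored. Helper name is hypothetical.
    """
    for msg in reversed(messages):
        if msg.get("role") == "user":
            return msg.get("content", "")
    return ""
```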
Technical implementation
Stack: FastAPI + uvicorn, running on port 8000 inside the same container as the gRPC server.
Concurrency: An asyncio event loop bridges FastAPI's async context with the synchronous gRPC calls via a dedicated ThreadPoolExecutor (configurable via PROXY_THREAD_WORKERS, default 20). This prevents gRPC blocking calls from stalling the async HTTP server.
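A minimal sketch of that bridge, assuming a synchronous gRPC call wrapped for the async server (the placeholder function and pool size are illustrative; the worker count would come from PROXY_THREAD_WORKERS):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool so blocking gRPC calls never stall the event loop.
# Size 20 mirrors the PROXY_THREAD_WORKERS default from the PRD.
executor = ThreadPoolExecutor(max_workers=20)

def blocking_ask_agent(query: str) -> str:
    # Placeholder for the synchronous gRPC AskAgent call.
    return f"answer to {query!r}"

async def ask_agent_async(query: str) -> str:
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the pool and await its result.
    return await loop.run_in_executor(executor, blocking_ask_agent, query)
```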
Streaming: An asyncio.Queue connects the gRPC token stream (produced in a thread) with the FastAPI StreamingResponse (consumed in the async event loop). Tokens are forwarded as SSE events (OpenAI format) or NDJSON (Ollama format) as they arrive from AskAgentStream.
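The producer/consumer handoff described above can be sketched like this, with a thread pushing tokens into an asyncio.Queue and the async side draining it. All names are illustrative; a None sentinel marks end-of-stream in this sketch:

```python
import asyncio

async def stream_tokens(grpc_stream):
    """Bridge a blocking token iterator into an async generator."""
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def producer():
        # Runs in a worker thread; hands each token to the loop safely.
        for token in grpc_stream:
            loop.call_soon_threadsafe(queue.put_nowait, token)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel

    producer_task = loop.run_in_executor(None, producer)
    while True:
        token = await queue.get()
        if token is None:
            break
        yield token  # would be wrapped as an SSE or NDJSON event
    await producer_task
```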
Entry point: entrypoint.sh starts both the gRPC server and the HTTP proxy as parallel processes. If either crashes, the other is terminated — the container fails cleanly rather than entering a partially active state.
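The fail-together behaviour might look like the sketch below. The module names and flags are assumptions, not the real entrypoint.sh contents; only the "either crashes, both stop" semantics come from this PRD:

```shell
#!/usr/bin/env bash
# Illustrative entrypoint sketch; see entrypoint.sh for the real commands.

python -m grpc_server &    # hypothetical gRPC server module
python -m proxy_server &   # hypothetical HTTP proxy module

# `wait -n` returns as soon as either child exits; kill the survivor
# so the container fails cleanly instead of running half-degraded.
wait -n
status=$?
kill $(jobs -p) 2>/dev/null
exit "$status"
```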
Environment variables:
| Variable | Default | Description |
|---|---|---|
| BRUNIX_GRPC_TARGET | localhost:50051 | gRPC server address |
| PROXY_MODEL_ID | brunix | Model name returned by /v1/models and /api/tags |
| PROXY_THREAD_WORKERS | 20 | ThreadPoolExecutor size for gRPC calls |
Validation
Functional: Any OpenAI-compatible client (continue.dev, LiteLLM, Open WebUI) can be pointed at http://localhost:8000 with model: brunix and successfully send queries to the Brunix RAG without code changes.
Strategic: The VS Code extension and any 101OBEX, Corp internal tooling currently consuming OpenAI can switch to the Brunix RAG by changing one endpoint URL and one model name. No other changes required.
Impact on existing interfaces
The gRPC interface (AskAgent, AskAgentStream, EvaluateRAG) is unchanged. Existing gRPC clients are not affected. The proxy is additive — it does not replace the gRPC interface, it complements it.