PRD-0001: OpenAI-Compatible HTTP Proxy
Date: 2026-03-18
Status: Implemented
Requested by: Rafael Ruiz (CTO)
Implemented in: PR #58
Related ADR: ADR-0001 (gRPC as primary interface)
Problem
The Brunix Assistance Engine exposes a gRPC interface as its primary API. gRPC is the right choice for performance and type safety in server-to-server communication, but it creates a significant adoption barrier for two categories of consumers:
Existing OpenAI integrations. Any tool or client already configured to call the OpenAI API — VS Code extensions using continue.dev, LiteLLM routers, Open WebUI instances, internal tooling at 101OBEX, Corp — requires code changes to switch to gRPC. The switching cost is non-trivial and creates friction that slows adoption.
Model replacement use case. The core strategic value of the Brunix RAG is that it can replace direct OpenAI API consumption with a locally-hosted, domain-specific assistant that has no per-token cost and no data privacy concerns. This value proposition is only actionable if the replacement is transparent — i.e., the client does not need to change to consume the Brunix RAG instead of OpenAI.
Without a compatibility layer, the Brunix engine cannot serve as a drop-in replacement for OpenAI models. Every potential adopter faces an integration project instead of a configuration change.
Solution
Implement an HTTP server running alongside the gRPC server that exposes:
- The OpenAI Chat Completions API (/v1/chat/completions) — both streaming and non-streaming
- The OpenAI Completions API (/v1/completions) — legacy support
- The OpenAI Models API (/v1/models) — for compatibility with clients that enumerate available models
- The Ollama Chat API (/api/chat) — NDJSON streaming format
- The Ollama Generate API (/api/generate) — for Ollama-native clients
- The Ollama Tags API (/api/tags) — for clients that list available models
- A health endpoint (/health)
The proxy bridges HTTP → gRPC internally: stream: false routes to AskAgent, stream: true routes to AskAgentStream. The gRPC interface remains the primary interface and is not modified.
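The stream-based routing described above can be sketched as follows. The method names AskAgent and AskAgentStream come from this PRD; the helper name and the stub object are illustrative, not the actual proxy code:

```python
# Hypothetical sketch of routing on the `stream` flag. `stub` stands
# in for the generated gRPC client; only AskAgent / AskAgentStream
# are names from the PRD.

def route_request(stub, query: str, stream: bool):
    """Route an HTTP chat request to the matching gRPC method."""
    if stream:
        # AskAgentStream yields tokens incrementally
        return stub.AskAgentStream(query)
    # AskAgent returns the full answer in one response
    return stub.AskAgent(query)
```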
Any client that currently points to https://api.openai.com can be reconfigured to point to http://localhost:8000 (or the server's address) with model: brunix and will work without any other change.
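As a sketch of how little changes on the client side, this is the request body a stock OpenAI-compatible client would send after the swap. The endpoint and the model value brunix are from this PRD; the payload shape follows the public OpenAI Chat Completions format, and the example question is invented:

```python
import json

# Only the base URL and model name change versus a real OpenAI setup.
BRUNIX_URL = "http://localhost:8000/v1/chat/completions"  # was https://api.openai.com/v1/chat/completions

payload = {
    "model": "brunix",  # was e.g. an OpenAI model name
    "messages": [{"role": "user", "content": "How do I configure the engine?"}],
    "stream": False,
}
body = json.dumps(payload)
```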
Scope
In scope:
- OpenAI-compatible endpoints as listed above
- Ollama-compatible endpoints as listed above
- Routing `stream: false` to `AskAgent` and `stream: true` to `AskAgentStream`
- Session ID propagation via the `session_id` extension field in `ChatCompletionRequest`
- Health endpoint
Out of scope:
- OpenAI function calling / tool use
- OpenAI embeddings API (/v1/embeddings)
- OpenAI fine-tuning or moderation APIs
- Authentication / API key validation (handled at infrastructure level)
- Multi-turn conversation reconstruction from the message array (the proxy extracts only the last user message as the query)
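The last-user-message behaviour noted above could look like the sketch below. The function name is illustrative; only the behaviour (take the most recent user message, ignore the rest of the array) is stated in this PRD:

```python
def extract_query(messages: list[dict]) -> str:
    """Return the content of the most recent 'user' message.

    Matches the single-turn behaviour described in the PRD: earlier
    turns and non-user roles are ignored. Helper name is hypothetical.
    """
    for msg in reversed(messages):
        if msg.get("role") == "user":
            return msg.get("content", "")
    return ""
```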
Technical implementation
Stack: FastAPI + uvicorn, running on port 8000 inside the same container as the gRPC server.
Concurrency: An asyncio event loop bridges FastAPI's async context with the synchronous gRPC calls via a dedicated ThreadPoolExecutor (configurable via PROXY_THREAD_WORKERS, default 20). This prevents gRPC blocking calls from stalling the async HTTP server.
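A minimal sketch of that bridge, assuming a synchronous gRPC call wrapped for the async server (the placeholder function and pool size are illustrative; the worker count would come from PROXY_THREAD_WORKERS):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool so blocking gRPC calls never stall the event loop.
# Size 20 mirrors the PROXY_THREAD_WORKERS default from the PRD.
executor = ThreadPoolExecutor(max_workers=20)

def blocking_ask_agent(query: str) -> str:
    # Placeholder for the synchronous gRPC AskAgent call.
    return f"answer to {query!r}"

async def ask_agent_async(query: str) -> str:
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the pool and await its result.
    return await loop.run_in_executor(executor, blocking_ask_agent, query)
```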
Streaming: An asyncio.Queue connects the gRPC token stream (produced in a thread) with the FastAPI StreamingResponse (consumed in the async event loop). Tokens are forwarded as SSE events (OpenAI format) or NDJSON (Ollama format) as they arrive from AskAgentStream.
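The producer/consumer handoff described above can be sketched like this, with a thread pushing tokens into an asyncio.Queue and the async side draining it. All names are illustrative; a None sentinel marks end-of-stream in this sketch:

```python
import asyncio

async def stream_tokens(grpc_stream):
    """Bridge a blocking token iterator into an async generator."""
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def producer():
        # Runs in a worker thread; hands each token to the loop safely.
        for token in grpc_stream:
            loop.call_soon_threadsafe(queue.put_nowait, token)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel

    producer_task = loop.run_in_executor(None, producer)
    while True:
        token = await queue.get()
        if token is None:
            break
        yield token  # would be wrapped as an SSE or NDJSON event
    await producer_task
```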
Entry point: entrypoint.sh starts both the gRPC server and the HTTP proxy as parallel processes. If either crashes, the other is terminated — the container fails cleanly rather than entering a partially active state.
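The fail-together behaviour might look like the sketch below. The module names and flags are assumptions, not the real entrypoint.sh contents; only the "either crashes, both stop" semantics come from this PRD:

```shell
#!/usr/bin/env bash
# Illustrative entrypoint sketch; see entrypoint.sh for the real commands.

python -m grpc_server &    # hypothetical gRPC server module
python -m proxy_server &   # hypothetical HTTP proxy module

# `wait -n` returns as soon as either child exits; kill the survivor
# so the container fails cleanly instead of running half-degraded.
wait -n
status=$?
kill $(jobs -p) 2>/dev/null
exit "$status"
```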
Environment variables:
| Variable | Default | Description |
|---|---|---|
| BRUNIX_GRPC_TARGET | localhost:50051 | gRPC server address |
| PROXY_MODEL_ID | brunix | Model name returned by /v1/models and /api/tags |
| PROXY_THREAD_WORKERS | 20 | ThreadPoolExecutor size for gRPC calls |
Validation
Functional: Any OpenAI-compatible client (continue.dev, LiteLLM, Open WebUI) can be pointed at http://localhost:8000 with model: brunix and successfully send queries to the Brunix RAG without code changes.
Strategic: The VS Code extension and any 101OBEX, Corp internal tooling currently consuming OpenAI can switch to the Brunix RAG by changing one endpoint URL and one model name. No other changes required.
Impact on existing interfaces
The gRPC interface (AskAgent, AskAgentStream, EvaluateRAG) is unchanged. Existing gRPC clients are not affected. The proxy is additive — it does not replace the gRPC interface, it complements it.