
# Contributing to Brunix Assistance Engine

> This document is the single source of truth for all contribution standards in the Brunix Assistance Engine repository. All contributors — regardless of seniority or role — are expected to read, understand, and comply with these guidelines before opening any Pull Request.

---

## Table of Contents

1. [Development Workflow (GitFlow)](#1-development-workflow-gitflow)
2. [Infrastructure Standards](#2-infrastructure-standards)
3. [Repository Standards](#3-repository-standards)
4. [Pull Request Requirements](#4-pull-request-requirements)
5. [Ingestion Files Policy](#5-ingestion-files-policy)
6. [Environment Variables Policy](#6-environment-variables-policy)
7. [Changelog Policy](#7-changelog-policy)
8. [Documentation Policy](#8-documentation-policy)
9. [Architecture Decision Records (ADRs)](#9-architecture-decision-records-adrs)
10. [Product Requirements Documents (PRDs)](#10-product-requirements-documents-prds)
11. [Research & Experiments Policy](#11-research--experiments-policy)
12. [Incident & Blockage Reporting](#12-incident--blockage-reporting)

---

## 1. Development Workflow (GitFlow)

### Branch Strategy

| Branch type | Naming convention | Purpose |
|---|---|---|
| Feature | `*-dev` | Active development — volatile, no CI validation |
| Main | `online` | Production-ready, fully validated |

- **Feature branches** (`*-dev`) are volatile environments. No validation tests or infrastructure deployments are performed on these branches.
- **Official validation** occurs only after a documented Pull Request is merged into `online`.
- **Developer responsibility:** Code must be stable and functional against the authorized environment before a PR is opened. Do not use the PR review process as a debugging step.
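
The naming convention in the table above can be checked mechanically. The following is a minimal sketch, not an existing repository tool: the helper name `classify_branch` is hypothetical, and it assumes that `*-dev` means any non-empty prefix followed by `-dev`.

```python
import fnmatch

# Patterns taken from the Branch Strategy table.
FEATURE_PATTERN = "*-dev"
MAIN_BRANCH = "online"


def classify_branch(name: str) -> str:
    """Return "main", "feature", or "unknown" for a branch name.

    Hypothetical helper, for illustration only.
    """
    if name == MAIN_BRANCH:
        return "main"
    # Assumption: a bare "-dev" with no prefix is not a valid feature branch,
    # even though the glob "*" would match an empty prefix.
    if name != "-dev" and fnmatch.fnmatchcase(name, FEATURE_PATTERN):
        return "feature"
    return "unknown"
```

A check like this could run in a local pre-push hook to catch misnamed branches before a PR is opened.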

---

## 2. Infrastructure Standards

The project provides a validated, shared environment (Devaron Cluster, Vultr) including Ollama, Elasticsearch, and PostgreSQL.

- **Authorized environment only.** The use of parallel, unauthorized infrastructures — external EC2 instances, ad-hoc local setups, non-replicable environments — is strictly prohibited for official development.
- **No siloed environments.** Isolated development creates technical debt and incompatibility risks that directly impact delivery timelines.
- All infrastructure access must be established via the documented `kubectl` port-forward tunnels defined in the [README](./README.md#3-infrastructure-tunnels).

---

## 3. Repository Standards

### IDE Agnosticism

The `online` branch must remain neutral to any individual's development environment. The following **must not** be committed under any circumstance:

- `.devcontainer/`
- `.vscode/`
- Any local IDE or editor configuration files

The `.gitignore` automates exclusion of these artifacts. Ensure your local environment is fully decoupled from the production-ready source code.
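
As a complement to `.gitignore`, a contributor could scan the list of changed paths before opening a PR. The sketch below is illustrative only: `find_ide_artifacts` is a hypothetical name, and the directory set covers just the two directories named above, so other editor configurations would need to be added by hand.

```python
from pathlib import PurePosixPath

# Directories named in the list above. Assumption: other IDE or editor
# folders are added to this set as the team identifies them.
FORBIDDEN_DIRS = {".devcontainer", ".vscode"}


def find_ide_artifacts(paths):
    """Return the paths that contain a forbidden directory component."""
    flagged = []
    for raw in paths:
        # Repository paths reported by git use forward slashes.
        parts = PurePosixPath(raw).parts
        if FORBIDDEN_DIRS.intersection(parts):
            flagged.append(raw)
    return flagged
```

The input could be, for example, the output of `git diff --name-only` split into lines.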

### Security & Least Privilege

- Never use `root` as `remoteUser` in any shared dev environment configuration.
- All configurations must comply with the **Principle of Least Privilege**.
- Using root in shared environments introduces unacceptable supply chain risk.

### Docker & Build Context

- All executable code must reside in `/app` within the container.
- The `/workspace` root directory is **deprecated** — do not reference it.
- Every PR must verify that the Docker build context is kept minimal via `.dockerignore`.
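
References to the deprecated `/workspace` root are easy to miss in review. The sketch below shows one possible way to flag them in a `Dockerfile`; the function name and the matching rule are assumptions made for illustration, not an existing check in this repository.

```python
import re

# Matches "/workspace" as a root path ("/workspace", "/workspace/src") but not
# "/workspaces" or a nested directory such as "/app/workspace".
DEPRECATED_ROOT = re.compile(r"(?<![\w/.-])/workspace(?![\w.-])")


def find_workspace_references(dockerfile_text: str):
    """Return (line_number, line) pairs that reference the deprecated root."""
    hits = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # Dockerfile comments are ignored
        if DEPRECATED_ROOT.search(stripped):
            hits.append((number, stripped))
    return hits
```

An empty result means no instruction in the file points at `/workspace`; it does not prove that all executable code lives in `/app`, which still needs a manual check.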

> **PRs that violate these architectural standards will be rejected without review.**

---

## 4. Pull Request Requirements

A PR is not ready for review unless **all applicable items** in the following checklist are satisfied. Reviewers are authorized to close PRs that do not meet these standards and request resubmission.

### PR Checklist

**Code & Environment**
- [ ] Tested locally against the authorized Devaron Cluster (no unauthorized infrastructure used)
- [ ] No IDE or environment configuration files committed (`.vscode`, `.devcontainer`, etc.)
- [ ] No `root` user configurations introduced
- [ ] `Dockerfile` and `.dockerignore` comply with build context standards

**Ingestion Files** *(see [Section 5](#5-ingestion-files-policy))*
- [ ] No ingestion files were added or modified
- [ ] New or modified ingestion files are committed to the repository under `ingestion/` or `data/`

**Environment Variables** *(see [Section 6](#6-environment-variables-policy))*
- [ ] No new environment variables were introduced
- [ ] New environment variables are documented in the `.env` reference table in `README.md`

**Changelog** *(see [Section 7](#7-changelog-policy))*
- [ ] No changelog entry required (internal refactor, comment/typo fix, zero behavioral change)
- [ ] Changelog updated with correct version bump and date

**Documentation** *(see [Section 8](#8-documentation-policy))*
- [ ] No documentation update required (internal change, no impact on setup or API)
- [ ] `README.md` or relevant docs updated to reflect this change
- [ ] If a significant architectural decision was made, an ADR was created in `docs/ADR/`
- [ ] If a new user-facing feature was introduced, a PRD was created in `docs/product/`
- [ ] If an experiment was conducted, results were documented in `research/`

---

## 5. Ingestion Files Policy

All files used to populate the vector knowledge base — source documents, AVAP manuals, structured data, or ingestion scripts — **must be committed to the repository.**

### Rules

- Ingestion files must reside in a dedicated directory (e.g., `ingestion/` or `data/`) within the repository.
- Any PR that introduces new knowledge base content or modifies existing ingestion pipelines must include the corresponding source files.
- Files containing sensitive content that cannot be committed in plain form must be flagged for discussion before proceeding. Encryption, redaction, or a separate private submodule are all valid solutions; storing the files in an external or local-only location is not.

### Why this matters

The Elasticsearch vector index is only as reliable as the source material that feeds it. Ingestion files that exist only on a local machine or external location cannot be audited, rebuilt, or validated by the team. A knowledge base populated from untracked files is a non-reproducible dependency — and a risk to the entire RAG pipeline.

---

## 6. Environment Variables Policy

This is a critical requirement. **Every environment variable introduced in a PR must be documented before the PR can be merged.**

### Rules

- Any new variable added to the codebase (`.env`, `docker-compose.yaml`, `server.py`, or any config file) must be declared in the `.env` reference table in `README.md`.
- The documentation must include: variable name, purpose, whether it is required or optional, and an example value.
- Variables that contain secrets must use placeholder values (e.g., `your-secret-key-here`) — never commit real values.

### Required format in README.md

```markdown
| Variable | Required | Description | Example |
|---|---|---|---|
| `LANGFUSE_PUBLIC_KEY` | Yes | Langfuse project public key for tracing | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Yes | Langfuse project secret key | `sk-lf-...` |
| `LANGFUSE_HOST`       | Yes | Langfuse server endpoint | `http://45.77.119.180` |
| `NEW_VARIABLE`        | Yes | Description of what it does | `example-value` |
```
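
Because the reference table has a fixed shape, undocumented variables can be detected by comparing it with a dotenv-style file. The following is a rough sketch under stated assumptions: the function names are hypothetical, variable names in the table are assumed to be backticked `UPPER_SNAKE_CASE` in the first column, and only simple `KEY=value` lines are recognized.

```python
import re

# A table row whose first cell is a backticked UPPER_SNAKE_CASE name.
ROW_PATTERN = re.compile(r"^\|\s*`([A-Z][A-Z0-9_]*)`\s*\|")
# A KEY=value assignment, with an optional "export " prefix.
ENV_PATTERN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def documented_variables(readme_text: str) -> set:
    """Collect variable names from the first column of the README table."""
    names = set()
    for line in readme_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def declared_variables(env_text: str) -> set:
    """Collect variable names assigned in a dotenv-style file."""
    names = set()
    for line in env_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = ENV_PATTERN.match(line)
        if match:
            names.add(match.group(1))
    return names


def undocumented(readme_text: str, env_text: str) -> list:
    """Return declared variables that are missing from the README table."""
    return sorted(declared_variables(env_text) - documented_variables(readme_text))
```

This only covers variables declared in a `.env` file. Variables introduced directly in `docker-compose.yaml` or in application code would need their own extraction step.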

### Why this matters

An undocumented environment variable silently breaks the setup for every other developer on the team. It also makes the service non-reproducible, which is a direct violation of the infrastructure standards in Section 2. There are no exceptions to this policy.

---

## 7. Changelog Policy

The `changelog` file tracks all notable changes and follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### When a changelog entry IS required

| Change type | Label to use |
|---|---|
| New feature or capability | `Added` |
| Change to existing behavior, API, or interface | `Changed` |
| Bug fix | `Fixed` |
| Security patch or security-related change | `Security` |
| Breaking change or deprecation | `Deprecated` / `Removed` |

### When a changelog entry is NOT required

- Typo or comment fixes only
- Internal refactors with zero behavioral or interface change
- Tooling/CI updates with no user-visible impact

**If in doubt, add an entry.**

### Format

New entries go under `[Unreleased]` at the top of the file. When a PR merges, `[Unreleased]` is renamed to the new version with its date:

```
## [Unreleased]

### Added
- LABEL: Description of the new feature or capability.

### Changed
- LABEL: Description of what changed and the rationale.

### Fixed
- LABEL: Description of the bug resolved.
```

Use uppercase short labels for scannability: `ENGINE:`, `API:`, `PROTO:`, `DOCKER:`, `INFRA:`, `SECURITY:`, `ENV:`, `CONFIG:`, `DOCS:`, `FEATURE:`.
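
The entry format lends itself to a simple lint. The sketch below is one hypothetical way to validate a single changelog line; it assumes the label list above is exhaustive, which this document does not state explicitly, and `check_entry` is an illustrative name.

```python
import re

# Labels listed in this section. Assumption: the list is exhaustive.
ALLOWED_LABELS = {
    "ENGINE", "API", "PROTO", "DOCKER", "INFRA",
    "SECURITY", "ENV", "CONFIG", "DOCS", "FEATURE",
}
# "- LABEL: description", with a non-empty description.
ENTRY_PATTERN = re.compile(r"^- ([A-Z]+): (\S.*)$")


def check_entry(line: str):
    """Return None if the entry is well formed, else a short explanation."""
    match = ENTRY_PATTERN.match(line)
    if match is None:
        return "expected '- LABEL: description'"
    label = match.group(1)
    if label not in ALLOWED_LABELS:
        return f"unknown label {label!r}"
    return None
```

If the team decides that new labels may be introduced freely, the `ALLOWED_LABELS` check can be dropped and only the shape of the line enforced.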

---

## 8. Documentation Policy

### When documentation MUST be updated

Update `README.md` (or the relevant doc file) if the PR includes any of the following:

- Changes to project structure (new files, directories, removed components)
- Changes to setup, installation, or environment configuration
- New or modified API endpoints or Protobuf definitions (`brunix.proto`)
- New, modified, or removed environment variables
- Changes to infrastructure tunnels or Kubernetes service names
- New dependencies or updated dependency versions
- Changes to security, access, or repository standards

### When documentation is NOT required

- Internal implementation changes with no impact on setup, usage, or API
- Fixes that do not alter any documented behavior

### Documentation files in this repository

| File | Purpose |
|---|---|
| `README.md` | Setup guide, env vars reference, quick start |
| `CONTRIBUTING.md` | Contribution standards (this file) |
| `SECURITY.md` | Security policy and vulnerability reporting |
| `docs/ARCHITECTURE.md` | Deep technical architecture reference |
| `docs/API_REFERENCE.md` | Complete gRPC API contract and examples |
| `docs/RUNBOOK.md` | Operational playbooks and incident response |
| `docs/AVAP_CHUNKER_CONFIG.md` | `avap_config.json` reference — blocks, statements, semantic tags |
| `docs/ADR/` | Architecture Decision Records |
| `docs/product/` | Product Requirements Documents |
| `research/` | Experiment results, benchmarks, datasets |

> **PRs that change user-facing behavior or setup without updating documentation will be rejected.**

---

## 9. Architecture Decision Records (ADRs)

Architecture Decision Records document **significant technical decisions** — choices that have lasting consequences on the codebase, infrastructure, or development process.

### When to write an ADR

Write an ADR when a PR introduces or changes:

- A fundamental technology choice (communication protocol, storage backend, framework, model)
- A design pattern that other components will follow
- A deliberate trade-off with known consequences
- A decision that future engineers might otherwise reverse without understanding the rationale

### When NOT to write an ADR

- Implementation details within a single module
- Bug fixes
- Dependency version bumps
- Configuration changes
- New user-facing features (use a PRD instead)

### ADR format

ADRs live in `docs/ADR/` and follow this naming convention:

```
ADR-XXXX-short-title.md
```

Where `XXXX` is a zero-padded sequential number (e.g., `ADR-0006-new-decision.md`).
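
To avoid number collisions, the next file name can be derived from the files already in `docs/ADR/`. This is a minimal sketch: `next_adr_name` is a hypothetical helper, and it assumes short titles are lowercase kebab-case, as the existing ADR file names suggest.

```python
import re

# ADR-XXXX-short-title.md, with a four-digit number and a kebab-case title.
ADR_NAME = re.compile(r"^ADR-(\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def next_adr_name(existing, short_title: str) -> str:
    """Return the file name for the next ADR, given existing file names."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := ADR_NAME.match(name))
    ]
    next_number = max(numbers, default=0) + 1
    name = f"ADR-{next_number:04d}-{short_title}.md"
    if not ADR_NAME.match(name):
        raise ValueError(f"not a valid ADR file name: {name!r}")
    return name
```

File names that do not follow the convention are ignored when computing the next number, so a stray file in the directory does not break the sequence.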

Each ADR must contain:

```markdown
# ADR-XXXX: Title

**Date:** YYYY-MM-DD
**Status:** Proposed | Under Evaluation | Accepted | Deprecated | Superseded by ADR-YYYY
**Deciders:** Names or roles

## Context
What problem are we solving? What forces are at play?

## Decision
What did we decide?

## Rationale
Why this option over alternatives? Include a trade-off analysis.

## Consequences
What are the positive and negative results of this decision?
```
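
The `**Status:**` line in the template accepts a closed set of values, so it can be validated automatically. The sketch below is an illustration under assumptions: `adr_status` is a hypothetical name, and the superseding ADR is assumed to use the same four-digit numbering as the file names.

```python
import re

# Status values taken from the template above.
FIXED_STATUSES = {"Proposed", "Under Evaluation", "Accepted", "Deprecated"}
# Assumption: the superseding ADR is referenced by its four-digit number.
SUPERSEDED = re.compile(r"^Superseded by ADR-\d{4}$")
STATUS_LINE = re.compile(r"^\*\*Status:\*\*\s*(.+?)\s*$", re.MULTILINE)


def adr_status(adr_text: str):
    """Return the ADR status if it is one of the allowed values, else None."""
    match = STATUS_LINE.search(adr_text)
    if match is None:
        return None
    status = match.group(1)
    if status in FIXED_STATUSES or SUPERSEDED.match(status):
        return status
    return None
```

A `None` result covers both a missing status line and an unrecognized value, so a caller that needs to distinguish the two would have to extend the function.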

### Existing ADRs

| ADR | Title | Status |
|---|---|---|
| [ADR-0001](docs/ADR/ADR-0001-grpc-primary-interface.md) | gRPC as the Primary Communication Interface | Accepted |
| [ADR-0002](docs/ADR/ADR-0002-two-phase-streaming.md) | Two-Phase Streaming Design for AskAgentStream | Accepted |
| [ADR-0003](docs/ADR/ADR-0003-hybrid-retrieval-rrf.md) | Hybrid Retrieval (BM25 + kNN) with RRF Fusion | Accepted |
| [ADR-0004](docs/ADR/ADR-0004-claude-eval-judge.md) | Claude as the RAGAS Evaluation Judge | Accepted |
| [ADR-0005](docs/ADR/ADR-0005-embedding-model-selection.md) | Embedding Model Selection — BGE-M3 vs Qwen3-Embedding-0.6B | Under Evaluation |

---

## 10. Product Requirements Documents (PRDs)

Product Requirements Documents capture **user-facing features** — what is being built, why it is needed, and how it will be validated. Every feature that modifies the public API, the gRPC contract, or the user experience of any client (VS Code extension, OpenAI-compatible proxy, etc.) requires a PRD before implementation begins.

### When to write a PRD

Write a PRD when a PR introduces or changes:

- A new capability visible to any external consumer (extension, API client, proxy)
- A change to the gRPC contract (`brunix.proto`)
- A change to the HTTP proxy endpoints or behavior
- A feature requested by product or business stakeholders

### When NOT to write a PRD

- Internal architectural changes (use an ADR instead)
- Bug fixes with no change in user-visible behavior
- Infrastructure or tooling changes

### PRD format

PRDs live in `docs/product/` and follow this naming convention:

```
PRD-XXXX-short-title.md
```

Each PRD must contain:

```markdown
# PRD-XXXX: Title

**Date:** YYYY-MM-DD
**Status:** Proposed | Implemented
**Requested by:** Name / role
**Related ADR:** ADR-XXXX (if applicable)

## Problem
What user or business problem does this solve?

## Solution
What are we building?

## Scope
What is in scope and explicitly out of scope?

## Technical design
Key implementation decisions.

## Validation
How do we know this works? Acceptance criteria.

## Impact on parallel workstreams
Does this affect any ongoing experiment or evaluation?
```
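
A reviewer can confirm that a PRD follows the template by checking for its second-level headings. The following sketch is hypothetical (`missing_sections` is not an existing tool) and assumes that the headings are written exactly as in the template.

```python
# Section headings taken from the PRD template above.
REQUIRED_PRD_SECTIONS = (
    "Problem",
    "Solution",
    "Scope",
    "Technical design",
    "Validation",
    "Impact on parallel workstreams",
)


def missing_sections(prd_text: str, required=REQUIRED_PRD_SECTIONS) -> list:
    """Return the required "## " headings that do not appear in the text."""
    present = set()
    for line in prd_text.splitlines():
        if line.startswith("## "):
            present.add(line[3:].strip())
    return [section for section in required if section not in present]
```

The same function could be reused for ADRs by passing `("Context", "Decision", "Rationale", "Consequences")` as `required`. Note that it checks only that a heading exists, not that the section has content.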

### Existing PRDs

| PRD | Title | Status |
|---|---|---|
| [PRD-0001](docs/product/PRD-0001-openai-compatible-proxy.md) | OpenAI-Compatible HTTP Proxy | Implemented |
| [PRD-0002](docs/product/PRD-0002-editor-context-injection.md) | Editor Context Injection for VS Code Extension | Proposed |

---

## 11. Research & Experiments Policy

All scientific experiments, benchmark results, and dataset evaluations conducted by the research team must be documented and committed to the repository under `research/`.

### Rules

- Every experiment must have a corresponding result file in `research/` before any engineering decision based on that experiment is considered valid.
- Benchmark scripts, evaluation notebooks, and raw results must be committed alongside a summary README that explains the methodology, datasets used, metrics, and conclusions.
- Experiments that inform an ADR must be referenced from that ADR with a direct path to the result files.
- The golden dataset used by `EvaluateRAG` (`Docker/src/golden_dataset.json`) is a production artifact. Any modification requires explicit approval from the CTO and a new baseline EvaluateRAG run before the change is merged.

### Directory structure

```
research/
  embeddings/       ← embedding model benchmarks (BEIR, MTEB)
  experiments/      ← RAG architecture experiments
  datasets/         ← synthetic datasets and golden datasets
```
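
The rule that raw results ship alongside a summary README can be spot-checked with a short script. The sketch below rests on two assumptions that this document does not spell out: each experiment lives in its own subdirectory of one of the three folders above, and the summary file is named `README.md`. The function name is hypothetical.

```python
from pathlib import Path

# Top-level folders from the directory structure above.
CATEGORIES = ("embeddings", "experiments", "datasets")


def experiments_missing_readme(research_root) -> list:
    """Return experiment directories that do not contain a README.md.

    Assumption: one subdirectory per experiment under each category folder.
    """
    root = Path(research_root)
    missing = []
    for category in CATEGORIES:
        category_dir = root / category
        if not category_dir.is_dir():
            continue
        for entry in sorted(category_dir.iterdir()):
            if entry.is_dir() and not (entry / "README.md").is_file():
                missing.append(entry.relative_to(root).as_posix())
    return missing
```

Called as `experiments_missing_readme("research")` from the repository root, it lists the experiment folders that still need a summary.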

### Why this matters

An engineering decision based on an experiment that is not reproducible, not committed, or not peer-reviewable has no scientific validity. All decisions with impact on the production system must be traceable to documented, committed evidence.

---

## 12. Incident & Blockage Reporting

If you encounter a technical blockage (connection timeouts, service downtime, tunnel failures):

1. **Immediate notification** — Report via the designated Slack channel at the moment of detection. Do not wait until end of day.
2. **GitHub Issue must include:**
   - The exact command executed
   - Full terminal output (complete error logs)
   - Current status of all `kubectl` tunnels
3. **Resolution** — If the error is not reproducible by the CTO/DevOps team, a 5-minute live debugging session will be scheduled to identify local network or configuration issues.

See [`docs/RUNBOOK.md`](docs/RUNBOOK.md) for full incident playbooks and escalation paths.
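
The three items required in the GitHub Issue can be collected in one step. The following is a minimal sketch, not an official reporting tool: `build_issue_body` is a hypothetical helper, the section titles are illustrative, and the tunnel status is passed in as text because this document does not prescribe how to gather it.

```python
import shlex
import subprocess


def build_issue_body(command, tunnel_status: str) -> str:
    """Run `command` and return a Markdown issue body with the required items.

    `command` is a list of arguments, as accepted by subprocess.run.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # stdout and stderr are concatenated, so their original interleaving is lost.
    output = (result.stdout + result.stderr).rstrip() or "(no output)"

    def indent(text: str) -> str:
        # Four leading spaces render as a code block in Markdown.
        return "\n".join("    " + line for line in text.splitlines())

    sections = [
        "### Command executed",
        indent(shlex.join(command)),
        f"### Full terminal output (exit code {result.returncode})",
        indent(output),
        "### kubectl tunnel status",
        indent(tunnel_status),
    ]
    return "\n\n".join(sections)
```

The sketch sets no timeout, so a command that hangs will block the script. Passing a `timeout` to `subprocess.run` and handling `subprocess.TimeoutExpired` would be a reasonable extension for connection-timeout incidents.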

---

*These standards exist to protect the integrity of the Brunix Assistance Engine and to ensure every member of the team can work confidently and efficiently. They are not bureaucratic overhead — they are the foundation of a reliable, scalable engineering practice.*

*— Rafael Ruiz, CTO, AVAP Technology*