REX is a small LLM orchestration project with a FastAPI backend, a React/Vite frontend, and an optional Go fanout service for high-throughput parallel model calls.
It’s built to be easy to run locally, while still including practical engineering features:
- Provider abstraction (swap direct calls vs Go fanout)
- Deadlines + cancellation for sub-queries
- Structured JSON logs, Prometheus metrics, and optional OpenTelemetry tracing
- A tiny deterministic evaluation harness you can run in CI
flowchart LR
UI[React/Vite UI] -->|HTTP/WebSocket| API[FastAPI API]
API --> ORCH[Orchestrator + Pipeline]
ORCH -->|Provider abstraction| PROV[Gemini Provider]
PROV -->|Optional| GO[Go fanout-service]
GO --> GEM[Gemini API]
API -->|optional| REDIS[(Redis)]
REDIS -->|pubsub| WS[WebSocket clients]
API --> METRICS[/Prometheus metrics/]
GO --> METRICS
API --> TRACE[(OpenTelemetry spans)]
GO --> TRACE
Where to look in the code:
- Backend entrypoint: src/main.py
- API routes + orchestration pipeline: src/api/routes.py
- Provider layer (Go fanout vs direct): src/providers/gemini.py
- Go fanout-service: go/fanout-service/main.go
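The provider layer's "Go fanout vs direct" switch can be pictured roughly as below. This is a minimal sketch, not the code in src/providers/gemini.py: the class names, the `generate` method, and `select_provider` are hypothetical, and the providers return placeholder strings instead of calling a model. The only detail taken from this README is that a set FANOUT_URL routes calls through the Go service.

```python
import os
from typing import Mapping, Optional, Protocol


class Provider(Protocol):
    """Hypothetical interface the orchestrator would depend on."""

    def generate(self, prompt: str) -> str: ...


class DirectProvider:
    """Stand-in for calling the model API directly (no network here)."""

    def generate(self, prompt: str) -> str:
        return f"direct:{prompt}"


class FanoutProvider:
    """Stand-in for routing calls through the Go fanout-service."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def generate(self, prompt: str) -> str:
        return f"fanout({self.base_url}):{prompt}"


def select_provider(env: Optional[Mapping[str, str]] = None) -> Provider:
    """Use the fanout provider when FANOUT_URL is set, else the direct one."""
    env = os.environ if env is None else env
    url = env.get("FANOUT_URL", "").strip()
    return FanoutProvider(url) if url else DirectProvider()
```

Because callers only see the `Provider` interface, switching between the two paths is a configuration change rather than a code change.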
From recursion/:
python -m uvicorn src.main:app --reload --host 0.0.0.0 --port 8000
From recursion/frontend/:
npm install
npm run dev
Open the UI at http://localhost:5173.
From recursion/go/fanout-service/:
go run .
Enable it from the Python side by setting FANOUT_URL. On Windows (cmd):
set FANOUT_URL=http://127.0.0.1:8099
On macOS/Linux:
export FANOUT_URL=http://127.0.0.1:8099
Environment variables:
- GEMINI_API_KEY (or GOOGLE_API_KEY)
- FANOUT_URL: if set, routes model fanout through the Go service
- SUBQUERY_DEADLINE_MS: per-subquery hard deadline (default 30000)
- REX_CACHE_ENABLED: set to 0/false to disable caching
- REX_CACHE_TTL_SECONDS: cache TTL (default 86400)
- REDIS_URL: enables Redis caching and (if RQ is installed and configured) async jobs
- OTEL_ENABLED=1: turns on tracing (best-effort; safe to leave off)
- OTEL_EXPORTER_OTLP_ENDPOINT: e.g. http://127.0.0.1:4318 (otherwise spans print to console)
- OTEL_SERVICE_NAME: overrides service name (defaults to rex-api / Go defaults)
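To make the per-subquery deadline concrete, here is a minimal sketch of how a hard deadline with cancellation could be applied using asyncio. It assumes the orchestrator is asyncio-based, which this README does not state; `run_with_deadline`, `run_subqueries`, and `fake_subquery` are hypothetical names, and only SUBQUERY_DEADLINE_MS and its 30000 default come from the list above.

```python
import asyncio
import os
from typing import Any, Awaitable, List, Mapping, Optional, Tuple


def subquery_deadline_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read SUBQUERY_DEADLINE_MS (default 30000) and convert to seconds."""
    env = os.environ if env is None else env
    return int(env.get("SUBQUERY_DEADLINE_MS", "30000")) / 1000.0


async def run_with_deadline(aw: Awaitable[Any], deadline_s: float) -> Tuple[Any, bool]:
    """Return (result, timed_out); wait_for cancels the awaitable on timeout."""
    try:
        return await asyncio.wait_for(aw, timeout=deadline_s), False
    except asyncio.TimeoutError:
        return None, True


async def fake_subquery(delay_s: float, answer: str) -> str:
    """Stand-in for a model call that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return answer


async def run_subqueries(delays: List[float], deadline_s: float) -> List[Tuple[Any, bool]]:
    """Run all sub-queries concurrently, each under its own deadline."""
    jobs = [
        run_with_deadline(fake_subquery(d, f"answer-{i}"), deadline_s)
        for i, d in enumerate(delays)
    ]
    return await asyncio.gather(*jobs)
```

For example, `asyncio.run(run_subqueries([0.01, 0.5], 0.1))` lets the fast sub-query finish and marks the slow one as timed out without blocking the rest.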
Both Python and Go emit JSON logs and propagate request IDs.
- Python API metrics: GET http://127.0.0.1:8000/metrics
- Go fanout-service metrics: GET http://127.0.0.1:8099/metrics
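Both endpoints should return the Prometheus text exposition format, so a quick local check can be scripted. The sketch below is a simplified parser for eyeballing values, not a replacement for a real Prometheus client; `fetch_metrics` and `parse_metrics` are hypothetical helpers, and the metric names that appear depend on what the services actually export.

```python
import urllib.request
from typing import Dict


def fetch_metrics(url: str, timeout_s: float = 5.0) -> str:
    """Download the raw metrics text from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample (metric name plus label set) to its value."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples
```

Usage would look like `parse_metrics(fetch_metrics("http://127.0.0.1:8000/metrics"))` with the API running locally.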
Tracing spans connect API → orchestrator → provider → Go fanout-service.
Trace context is propagated using standard W3C headers (for example, traceparent).
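For readers unfamiliar with the header, a version-00 traceparent value has four dash-separated lowercase hex fields: version, 16-byte trace ID, 8-byte parent span ID, and flags. The sketch below builds and parses that shape with the standard library only; it is illustrative, since in REX the OpenTelemetry libraries would normally handle this, and `make_traceparent` and `parse_traceparent` are hypothetical helper names.

```python
import re
import secrets
from typing import Dict, Optional, Union

# Shape of a version-00 header: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
    sampled: bool = True,
) -> str:
    """Build a version-00 traceparent value, generating random IDs if needed."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = span_id or secrets.token_hex(8)  # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Dict[str, Union[str, bool]]]:
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

A downstream service keeps the incoming trace ID and substitutes its own span ID as the parent when it calls the next hop, which is what links the API, orchestrator, provider, and Go spans into one trace.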
- POST /api/run: run a query (sync)
- POST /api/run-async: enqueue an async job if Redis/RQ is available (falls back to sync)
- WebSocket: pushes query_started / query_partial / query_completed (and query_rate_limited)
For multi-tenant readiness, you can pass a lightweight client identity header:
X-Client-Id: your-workspace-name
This identity is used for:
- Per-client in-memory rate limiting (REX_CLIENT_QPS, REX_CLIENT_BURST, optional REX_CLIENT_LIMITS_JSON)
- Cache key isolation (same prompt + models but different client IDs won’t share cached results)
- Per-client metrics (client IDs are hashed before being used as metric labels)
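The three uses above can be sketched in a few lines. This assumes a token-bucket limiter, which is a common way to express a QPS-plus-burst limit but is not confirmed to be what REX uses; `TokenBucket`, `client_label`, and `cache_key` are hypothetical names, and the hash truncation length is arbitrary.

```python
import hashlib
import time
from typing import Iterable, Optional


class TokenBucket:
    """In-memory limiter: refills at `qps` tokens/second, holds at most `burst`."""

    def __init__(self, qps: float, burst: int, now: Optional[float] = None) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full so a burst is allowed at once
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; `now` is injectable for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def client_label(client_id: str) -> str:
    """Hash the client ID so the raw value never appears as a metric label."""
    return hashlib.sha256(client_id.encode("utf-8")).hexdigest()[:12]


def cache_key(client_id: str, prompt: str, models: Iterable[str]) -> str:
    """Include the client ID so different clients never share cache entries."""
    payload = "\x1f".join([client_id, prompt, ",".join(sorted(models))])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

In this sketch, one bucket per X-Client-Id value would be kept in a dictionary, with `qps` and `burst` playing the roles of REX_CLIENT_QPS and REX_CLIENT_BURST.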
From recursion/:
python -m pytest -q
The eval harness runs without external network calls (it uses the simulated pipeline), so it’s stable in CI.
python -m pytest -m eval
Dataset: eval/dataset.jsonl
GitHub Actions runs:
- Python: ruff (bug gate) + mypy (baseline) + pytest
- Go: golangci-lint (govet baseline) + go test
- Frontend: npm build
Workflow: .github/workflows/ci.yml
- scripts/fanout_smoke_test.py: quick contract smoke test for fanout-service
- scripts/fanout_load_test.py: produces throughput/latency artifacts under results/