REX — Recruiter-Ready Task Plan (AI Engineer)

This file is a living checklist for evolving this repo into a recruiter-grade, production-credible LLM orchestration platform.

North Star (what recruiters want to see)

Clear architecture: separation of API, orchestration, model providers, storage, eval
Reliability: retries/backoff, rate-limit handling, timeouts, circuit breakers
Performance: high-throughput fanout + concurrency control (Go service)
Observability: structured logs, metrics, tracing, dashboards
Reproducible evaluation: regression tests + scorecards + self-improvement loop
Secure secrets handling: no keys in repo, safe env/config patterns

P0 — Must-have (to be “ideal for recruiter”)

1) Go performance layer (high-throughput fanout)

Goal: Move the hottest path (parallel LLM calls, rate-limit management, streaming aggregation) into Go.

Define the contract between Python orchestrator and Go service
- Acceptance: a written interface (request/response schema) and a small design doc section in README.
- Options:
  - HTTP/JSON (fast to ship)
  - gRPC (better performance + typed contracts)
- Output:
  - Contract doc: recursion/go/fanout-service/CONTRACT.md
  - Smoke test: recursion/scripts/fanout_smoke_test.py
Implement Go service: fanout-service
- Responsibilities:
  - per-provider concurrency limits (semaphores)
  - retry/backoff with jitter
  - deadline propagation
  - rate-limit detection + backoff
  - structured logging + metrics
- Acceptance:
  - Go service exposes /fanout (or gRPC method) and can run locally.
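The per-provider concurrency limit listed above can be illustrated with one semaphore per provider. The sketch below is written in Python with asyncio purely to show the pattern; the Go service would use its own primitives. The limit values, the ProviderLimiter class, and the provider names are hypothetical and not taken from the repo.

```python
import asyncio
from collections import defaultdict

# Hypothetical per-provider limits; real values would come from config.
PROVIDER_LIMITS = {"google": 2, "openai": 3}
DEFAULT_LIMIT = 1


class ProviderLimiter:
    """Caps in-flight calls per provider using one semaphore per provider."""

    def __init__(self, limits, default_limit=DEFAULT_LIMIT):
        self._limits = dict(limits)
        self._default = default_limit
        self._semaphores = {}
        self.in_flight = defaultdict(int)
        self.peak = defaultdict(int)  # highest concurrency seen, for inspection

    def _semaphore(self, provider):
        # Created lazily so the semaphore is built inside the running loop.
        if provider not in self._semaphores:
            limit = self._limits.get(provider, self._default)
            self._semaphores[provider] = asyncio.Semaphore(limit)
        return self._semaphores[provider]

    async def run(self, provider, call):
        async with self._semaphore(provider):
            self.in_flight[provider] += 1
            self.peak[provider] = max(self.peak[provider], self.in_flight[provider])
            try:
                return await call()
            finally:
                self.in_flight[provider] -= 1


async def demo():
    limiter = ProviderLimiter(PROVIDER_LIMITS)

    async def fake_call():
        await asyncio.sleep(0.01)  # stands in for a provider request
        return "ok"

    results = await asyncio.gather(
        *(limiter.run("google", fake_call) for _ in range(6))
    )
    return results, limiter.peak["google"]
```

Running asyncio.run(demo()) issues six calls against a limit of two, so no more than two are ever in flight for that provider at once.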

Current status

Go service scaffold created at recursion/go/fanout-service
- Run locally:
  - cd recursion/go/fanout-service
  - go run .
- Endpoints:
  - GET /healthz
  - POST /fanout (implemented; returns per-call results)
Integrate Python → Go call path
- Acceptance:
  - A feature flag (env var) chooses Go path vs pure Python path.
  - Trace output shape remains consistent.
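One way the feature flag could look is sketched below. The variable name REX_USE_GO_FANOUT and the function names are assumptions made for illustration; the plan only states that an env var selects the path.

```python
import os

# Hypothetical flag name; the plan only says "a feature flag (env var)".
FANOUT_FLAG = "REX_USE_GO_FANOUT"
TRUTHY = {"1", "true", "yes", "on"}


def use_go_fanout(env=None):
    """Return True when the flag asks for the Go path (default: pure Python)."""
    env = os.environ if env is None else env
    return env.get(FANOUT_FLAG, "0").strip().lower() in TRUTHY


def run_fanout(calls, go_path, python_path, env=None):
    """Dispatch to one backend; both must return the same trace shape."""
    backend = go_path if use_go_fanout(env) else python_path
    return backend(calls)
```

Keeping the choice in one dispatch function makes it easier to check that both backends produce the same trace output shape.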
Add load test for the Go service
- Acceptance:
  - A repeatable script produces throughput/latency results and saves to results/.
- Run:
  - python scripts/fanout_load_test.py
- Output:
  - Writes results/fanout_load_*.json and results/fanout_load_*.txt
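For the throughput/latency numbers, a summary step along these lines may be enough. This is not the contents of scripts/fanout_load_test.py; the function names, the field names, and the nearest-rank percentile method are assumptions.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms, wall_seconds):
    """Reduce raw per-request latencies to a small, comparable report."""
    ordered = sorted(latencies_ms)
    return {
        "requests": len(ordered),
        "throughput_rps": round(len(ordered) / wall_seconds, 2),
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "max_ms": ordered[-1],
    }
```

A dict like this serializes directly with json.dumps, which fits the results/fanout_load_*.json output mentioned above.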

2) Provider abstraction (clean architecture)

Create provider interface layer (Python)
- Example: ProviderClient with complete(messages, model, timeout, ...)
- Acceptance:
  - routes.py no longer contains provider-specific logic.
Centralize model naming + routing
- Acceptance:
  - one module maps google/gemini-* → the correct provider implementation.
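A minimal sketch of the interface plus a single routing module follows. Only the ProviderClient name, the complete(messages, model, timeout, ...) shape, and the google/gemini-* pattern come from the plan; EchoClient, the openai/ prefix, and the ROUTES table are hypothetical stand-ins.

```python
from typing import Protocol


class ProviderClient(Protocol):
    """Shape suggested by the plan: complete(messages, model, timeout, ...)."""

    def complete(self, messages: list, model: str, timeout: float) -> str: ...


class EchoClient:
    """Stand-in provider for this sketch only; it makes no network calls."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, messages: list, model: str, timeout: float) -> str:
        return f"{self.name}:{model}:{messages[-1]['content']}"


# Hypothetical prefix table. Only the google/gemini-* pattern is from the plan.
ROUTES = (
    ("google/gemini-", EchoClient("google")),
    ("openai/", EchoClient("openai")),
)


def resolve_provider(model: str) -> ProviderClient:
    """Return the first client whose prefix matches the model name."""
    for prefix, client in ROUTES:
        if model.startswith(prefix):
            return client
    raise LookupError(f"no provider registered for model {model!r}")
```

With routing isolated like this, routes.py would only need to call resolve_provider(model).complete(...) and would carry no provider-specific branches.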

3) Reliability & correctness primitives

Add hard timeouts and cancellation
- Acceptance:
  - If a sub-call exceeds deadline, it is cancelled and the trace records the timeout.
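The timeout acceptance criterion can be sketched with asyncio.wait_for, which cancels the awaited coroutine once the deadline passes. The trace record fields and the function name call_with_deadline are assumptions, not the repo's actual trace schema.

```python
import asyncio


async def call_with_deadline(name, coro_fn, timeout_s, trace):
    """Run one sub-call under a hard deadline and record the outcome."""
    try:
        result = await asyncio.wait_for(coro_fn(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the sub-call at this point.
        trace.append({"call": name, "outcome": "timeout", "timeout_s": timeout_s})
        return None
    trace.append({"call": name, "outcome": "ok"})
    return result


async def demo():
    trace = []
    cancelled = []

    async def slow():
        try:
            await asyncio.sleep(1)  # far longer than the deadline below
        except asyncio.CancelledError:
            cancelled.append("slow")  # proves the sub-call was cancelled
            raise
        return "late"

    async def fast():
        return "done"

    results = await asyncio.gather(
        call_with_deadline("slow", slow, 0.01, trace),
        call_with_deadline("fast", fast, 0.5, trace),
    )
    return results, trace, cancelled
```

In the demo, the slow call is cancelled and logged as a timeout while the fast call completes normally, so one slow provider does not hold up the rest of the fanout.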
Add retry policies and error taxonomy
- Acceptance:
  - transient errors retry; permanent errors don’t; trace contains reason codes.
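A small error taxonomy combined with capped exponential backoff and full jitter might look like the sketch below. The exception classes, reason codes, and default delays are hypothetical; the plan only requires that transient errors retry, permanent ones do not, and the trace carries reason codes.

```python
import random
import time


class TransientError(Exception):
    reason = "transient"


class RateLimitError(TransientError):
    reason = "rate_limited"


class PermanentError(Exception):
    reason = "permanent"


def backoff_delay(attempt, base_s=0.1, cap_s=5.0, rng=random.random):
    """Full jitter: uniform in [0, min(cap_s, base_s * 2**attempt))."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retry(fn, max_attempts=3, sleep=time.sleep, trace=None):
    """Retry transient failures only; append one reason code per attempt."""
    trace = [] if trace is None else trace
    for attempt in range(max_attempts):
        try:
            result = fn()
        except TransientError as exc:
            trace.append({"attempt": attempt, "reason": exc.reason})
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted
            sleep(backoff_delay(attempt))
        except PermanentError as exc:
            trace.append({"attempt": attempt, "reason": exc.reason})
            raise  # never retried
        else:
            trace.append({"attempt": attempt, "reason": "ok"})
            return result
```

Passing sleep and rng as parameters keeps the policy testable without real waiting.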
Add caching rules (optional but strong)
- Acceptance:
  - cache key includes model + prompt + parameters, and can be disabled.
- Notes:
  - Disable with REX_CACHE_ENABLED=0 (or CACHE_ENABLED=0)
  - TTL via REX_CACHE_TTL_SECONDS (default 86400)
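A cache key covering model, prompt, and parameters can be derived by hashing a canonical JSON encoding, as sketched here. The environment variable names and the 86400 default come from the notes above; the function names, the SHA-256 choice, and the reading that either variable set to 0 disables the cache are assumptions.

```python
import hashlib
import json
import os


def cache_enabled(env=None):
    """Cache is on unless REX_CACHE_ENABLED or CACHE_ENABLED is "0"."""
    env = os.environ if env is None else env
    return (
        env.get("REX_CACHE_ENABLED", "1") != "0"
        and env.get("CACHE_ENABLED", "1") != "0"
    )


def cache_ttl_seconds(env=None):
    """TTL from REX_CACHE_TTL_SECONDS, defaulting to one day."""
    env = os.environ if env is None else env
    return int(env.get("REX_CACHE_TTL_SECONDS", "86400"))


def cache_key(model, messages, params):
    """Stable digest: identical inputs always map to the same key."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,  # dict ordering must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys before hashing means two requests that differ only in parameter order share a cache entry, while any change to a parameter such as temperature produces a different key.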

4) Observability (production credibility)

Structured JSON logging (Python + Go)
- Acceptance:
  - each request has a request-id; logs include model, latency, outcome.
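On the Python side, the standard logging module can emit one JSON object per line with the fields named above. The formatter class, logger name, and exact field names (request_id, latency_ms, and so on) are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line with selected extra fields."""

    FIELDS = ("request_id", "model", "latency_ms", "outcome")

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            # Values passed via logging's `extra=` become record attributes.
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry, sort_keys=True)


def make_logger(stream, name="rex.sketch"):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as log.info("provider call finished", extra={"request_id": rid, "model": model, "latency_ms": 42, "outcome": "ok"}) then yields a line that log tooling can filter by request-id.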
Metrics (Prometheus)
- Acceptance:
  - counters/histograms for latency, errors, tokens, fanout sizes.
- Endpoints:
  - Go fanout-service: GET /metrics (default http://127.0.0.1:8099/metrics)
  - Python API: GET /metrics (default http://127.0.0.1:8000/metrics)
Distributed tracing (OpenTelemetry)
- Acceptance:
  - trace spans connect API → orchestrator → provider calls (and Go service spans).
- Current:
  - Go fanout-service emits spans when OTEL_ENABLED=1
  - Python API wraps /api/run in a top-level span and can init tracing via OTEL_ENABLED=1
  - Trace context propagates from Python → Go (traceparent injected on fanout request)
  - Provider spans exist in Python (fanout request + per-model LiteLLM calls)
- Run:
  - Python: set OTEL_ENABLED=1 (optional: OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318)
  - Go: set OTEL_ENABLED=1 (same exporter envs if using OTLP)
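For reference, the traceparent header that links the Python and Go spans follows the W3C Trace Context layout: version, 32-hex trace id, 16-hex parent span id, and 2-hex flags. In practice the OpenTelemetry propagators build and read this header; the hand-rolled helpers below are only a sketch to show what it carries, and their names are hypothetical.

```python
import re
import secrets

# Version 00 layout: 00-<trace-id>-<parent-id>-<flags>, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Build a header for a fresh trace with random ids."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Return the header's parts, or None when it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # all-zero ids are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }
```

A receiving service that parses the header can start its own span under the same trace_id, which is what lets the two services appear in one trace.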

5) Reproducible evaluation harness

Create eval/ harness
- Acceptance:
  - can run pytest -m eval (or similar) and produce a score report.
- Current:
  - Deterministic smoke dataset at eval/dataset.jsonl
  - Test runner at tests/test_eval_harness.py
Regression suite from traces
- Acceptance:
  - store a small curated dataset of prompts + expected properties.
- Current:
  - Snapshot: eval/regression_traces.jsonl (generated from deterministic simulated traces)
  - Test: tests/test_regression_traces.py (runs under -m eval)
  - Generator: scripts/generate_regression_traces.py
Self-improvement loop should be test-gated
- Acceptance:
  - improvement changes must improve score or be rejected.
- Current:
  - CI runs eval tests on every change.
  - Local gate script: scripts/run_improvement_gated.py (eval-before/after wrapper)
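The gating rule can be expressed as a small wrapper that evaluates before and after a change and reverts when the score does not improve. This is not the contents of scripts/run_improvement_gated.py; the function name, the callables, and the min_gain parameter are hypothetical.

```python
def run_gated(apply_change, revert_change, run_eval, min_gain=0.0):
    """Keep a change only if the eval score rises by more than min_gain."""
    before = run_eval()
    apply_change()
    after = run_eval()
    accepted = (after - before) > min_gain
    if not accepted:
        revert_change()  # rejected changes must leave no trace
    return {"before": before, "after": after, "accepted": accepted}
```

A change that leaves the score equal is rejected under this rule, which matches the "must improve score or be rejected" wording above.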

P1 — Strong differentiators (after P0)

6) Streaming + incremental synthesis

Add streaming responses (server-sent events or websockets) for partial results
Incrementally synthesize as responses arrive (not only after all complete)

7) Multi-tenant readiness (even if local)

Configurable quotas/rate limits per “workspace” (not auth, just API client identity)
Isolation in cache keys / metrics labels

8) CI/CD + quality gates

GitHub Actions pipeline
- Acceptance:
  - runs tests, builds frontend, builds Go service
Lint + type checks
- Current:
  - Ruff bug-gate in CI (fails on syntax/undefined-name class issues)
  - Mypy baseline gate in CI (lenient config; tighten over time)
  - golangci-lint in CI (govet baseline)

P2 — Nice-to-have (only if time)

Docker Compose for full stack (backend + go service + redis + frontend)
Benchmark suite with saved baseline comparisons
Canary mode for new pipeline versions

Recruiter-facing deliverables (what to show)

Architecture diagram in README (components + data flow)
“How it scales” section (concurrency control, backpressure, rate-limits)
Performance report (before/after Go service) in results/
Reliability report: error rates + retry behavior + timeout behavior
Evaluation report: scorecards + regression history
