REX — Recruiter-Ready Task Plan (AI Engineer)

This file is a living checklist for evolving this repo into a recruiter-grade, production-credible LLM orchestration platform.

North Star (what recruiters want to see)

  • Clear architecture: separation of API, orchestration, model providers, storage, eval
  • Reliability: retries/backoff, rate-limit handling, timeouts, circuit breakers
  • Performance: high-throughput fanout + concurrency control (Go service)
  • Observability: structured logs, metrics, tracing, dashboards
  • Reproducible evaluation: regression tests + scorecards + self-improvement loop
  • Secure secrets handling: no keys in repo, safe env/config patterns

P0 — Must-have (to be “ideal for recruiter”)

1) Go performance layer (high-throughput fanout)

Goal: Move the hottest path (parallel LLM calls, rate-limit management, streaming aggregation) into Go.

  • Define the contract between Python orchestrator and Go service

    • Acceptance: a written interface (request/response schema) and a small design doc section in README.

    • Options:

      • HTTP/JSON (fast to ship)
      • gRPC (better performance + typed contracts)
    • Output:

      • Contract doc: recursion/go/fanout-service/CONTRACT.md
      • Smoke test: recursion/scripts/fanout_smoke_test.py
  • Implement Go service: fanout-service

    • Responsibilities:
      • per-provider concurrency limits (semaphores)
      • retry/backoff with jitter
      • deadline propagation
      • rate-limit detection + backoff
      • structured logging + metrics
    • Acceptance:
      • Go service exposes /fanout (or gRPC method) and can run locally.
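
The responsibilities listed above are language-neutral, so the core logic can be sketched in Python before it is ported to Go. The following is only an illustration of per-provider semaphores plus deadline propagation, not the Go service's actual code. The `PROVIDER_LIMITS` values, the `fanout` function, and the result dict shape are all assumptions made for this example.

```python
import asyncio
import time

# Hypothetical per-provider concurrency caps; real values would come from config.
PROVIDER_LIMITS = {"google": 2, "openai": 4}


async def fanout(calls, deadline_s: float):
    """Run (provider, coroutine-factory) pairs under per-provider caps and one shared deadline."""
    semaphores = {p: asyncio.Semaphore(n) for p, n in PROVIDER_LIMITS.items()}
    deadline = time.monotonic() + deadline_s

    async def run_one(provider, make_call):
        async with semaphores[provider]:
            # Deadline propagation: each sub-call only gets the time that is left.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return {"provider": provider, "ok": False, "reason": "deadline_exceeded"}
            try:
                result = await asyncio.wait_for(make_call(), timeout=remaining)
                return {"provider": provider, "ok": True, "result": result}
            except asyncio.TimeoutError:
                # wait_for cancels the sub-call before raising.
                return {"provider": provider, "ok": False, "reason": "deadline_exceeded"}

    # gather preserves input order, so results line up with the submitted calls.
    return await asyncio.gather(*(run_one(p, c) for p, c in calls))
```

In Go the same shape would typically use a buffered channel or a weighted semaphore per provider and a `context.Context` carrying the deadline.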

Current status

  • Go service scaffold created at recursion/go/fanout-service

    • Run locally:
      • cd recursion/go/fanout-service
      • go run .
    • Endpoints:
      • GET /healthz
      • POST /fanout (implemented; returns per-call results)
  • Integrate Python → Go call path

    • Acceptance:
      • A feature flag (env var) chooses Go path vs pure Python path.
      • Trace output shape remains consistent.
  • Add load test for the Go service

    • Acceptance:
      • A repeatable script produces throughput/latency results and saves to results/.
    • Run:
      • python scripts/fanout_load_test.py
    • Output:
      • Writes results/fanout_load_*.json and results/fanout_load_*.txt
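
The feature-flag acceptance criterion for the Python → Go call path could look roughly like the sketch below. The flag name `REX_FANOUT_BACKEND` and the two handler callables are hypothetical; the plan only states that an env var selects the path and that the trace shape must stay the same.

```python
import os

# Hypothetical flag name; the plan only says "a feature flag (env var)".
FLAG = "REX_FANOUT_BACKEND"


def fanout_backend(env=None) -> str:
    """Return "go" only when the flag explicitly asks for it; default to pure Python."""
    env = os.environ if env is None else env
    value = env.get(FLAG, "python").strip().lower()
    return "go" if value == "go" else "python"


def run_fanout(request: dict, go_path, python_path, env=None) -> dict:
    """Dispatch to one of two handlers that are expected to return the same trace shape."""
    handler = go_path if fanout_backend(env) == "go" else python_path
    return handler(request)
```

Defaulting to the Python path on any unrecognized value keeps a typo in the env var from silently routing traffic to a service that may not be running.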

2) Provider abstraction (clean architecture)

  • Create provider interface layer (Python)

    • Example: ProviderClient with complete(messages, model, timeout, ...)
    • Acceptance:
      • routes.py no longer contains provider-specific logic.
  • Centralize model naming + routing

    • Acceptance:
      • one module maps google/gemini-* → the correct provider implementation.
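
As a rough illustration of the two items above, a provider interface and a single routing table might be shaped like this. `StubGeminiClient`, `ROUTES`, and `resolve_provider` are hypothetical names, and the `complete` signature is abridged from the example in the plan; a real adapter would call the provider SDK instead of echoing.

```python
from typing import Protocol


class ProviderClient(Protocol):
    """Interface every provider adapter implements (signature abridged from the plan)."""

    def complete(self, messages: list, model: str, timeout: float) -> str: ...


class StubGeminiClient:
    """Hypothetical stand-in; a real adapter would call the provider SDK here."""

    def complete(self, messages: list, model: str, timeout: float) -> str:
        return f"[{model}] {messages[-1]['content']}"


# The one place that maps a model-name prefix to a provider implementation.
ROUTES: dict = {"google/gemini-": StubGeminiClient()}


def resolve_provider(model: str) -> ProviderClient:
    """Return the client registered for the model's prefix, or fail loudly."""
    for prefix, client in ROUTES.items():
        if model.startswith(prefix):
            return client
    raise ValueError(f"no provider registered for model {model!r}")
```

With this split, `routes.py` would only call `resolve_provider(model).complete(...)` and never branch on provider names itself.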

3) Reliability & correctness primitives

  • Add hard timeouts and cancellation

    • Acceptance:
      • If a sub-call exceeds deadline, it is cancelled and the trace records the timeout.
  • Add retry policies and error taxonomy

    • Acceptance:
      • transient errors retry; permanent errors don’t; trace contains reason codes.
  • Add caching rules (optional but strong)

    • Acceptance:
      • cache key includes model + prompt + parameters, and can be disabled.
    • Notes:
      • Disable with REX_CACHE_ENABLED=0 (or CACHE_ENABLED=0)
      • TTL via REX_CACHE_TTL_SECONDS (default 86400)
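
The retry policy and error taxonomy items can be sketched together. This is a minimal example assuming exponential backoff with full jitter; `ProviderError`, `call_with_retry`, the reason strings, and the default delays are hypothetical and not taken from the repo.

```python
import random
import time


class ProviderError(Exception):
    """Error carrying a reason code and whether a retry is worthwhile."""

    def __init__(self, reason: str, retryable: bool):
        super().__init__(reason)
        self.reason = reason
        self.retryable = retryable


def call_with_retry(fn, trace, max_attempts=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn, retrying transient errors; every attempt is appended to the caller's trace."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except ProviderError as exc:
            trace.append({"attempt": attempt, "outcome": "error", "reason": exc.reason})
            # Permanent errors and the final attempt propagate to the caller.
            if not exc.retryable or attempt == max_attempts:
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
        else:
            trace.append({"attempt": attempt, "outcome": "ok"})
            return result
```

Passing the trace list in, rather than returning it, means the reason codes survive even when the call ultimately raises.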

4) Observability (production credibility)

  • Structured JSON logging (Python + Go)

    • Acceptance:
      • each request has a request-id; logs include model, latency, outcome.
  • Metrics (Prometheus)

    • Acceptance:
      • counters/histograms for latency, errors, tokens, fanout sizes.
    • Endpoints:
      • Go fanout-service: GET /metrics (default http://127.0.0.1:8099/metrics)
      • Python API: GET /metrics (default http://127.0.0.1:8000/metrics)
  • Distributed tracing (OpenTelemetry)

    • Acceptance:
      • trace spans connect API → orchestrator → provider calls (and Go service spans).
    • Current:
      • Go fanout-service emits spans when OTEL_ENABLED=1
      • Python API wraps /api/run in a top-level span and can init tracing via OTEL_ENABLED=1
      • Trace context propagates from Python → Go (traceparent injected on fanout request)
      • Provider spans exist in Python (fanout request + per-model LiteLLM calls)
    • Run:
      • Python: set OTEL_ENABLED=1 (optional: OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318)
      • Go: set OTEL_ENABLED=1 (same exporter envs if using OTLP)
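
For the structured-logging acceptance criterion (request-id, model, latency, outcome on every log line), a standard-library-only sketch might look like the following. The field names and the `JsonFormatter` and `new_request_id` helpers are assumptions for illustration; the repo may use a different logging library or schema.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, copying known per-request fields."""

    FIELDS = ("request_id", "model", "latency_ms", "outcome")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,  # seconds since the epoch
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def new_request_id() -> str:
    """Generate an opaque id to attach to every log line of one request."""
    return uuid.uuid4().hex
```

A call site would then pass the fields through `extra`, for example `log.info("provider call finished", extra={"request_id": rid, "model": model, "latency_ms": 42, "outcome": "ok"})`.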

5) Reproducible evaluation harness

  • Create eval/ harness

    • Acceptance:
      • can run pytest -m eval (or similar) and produce a score report.
    • Current:
      • Deterministic smoke dataset at eval/dataset.jsonl
      • Test runner at tests/test_eval_harness.py
  • Regression suite from traces

    • Acceptance:
      • store a small curated dataset of prompts + expected properties.
    • Current:
      • Snapshot: eval/regression_traces.jsonl (generated from deterministic simulated traces)
      • Test: tests/test_regression_traces.py (runs under -m eval)
      • Generator: scripts/generate_regression_traces.py
  • Self-improvement loop should be test-gated

    • Acceptance:
      • a proposed improvement must raise the eval score; otherwise it is rejected.
    • Current:
      • CI runs eval tests on every change.
      • Local gate script: scripts/run_improvement_gated.py (eval-before/after wrapper)
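
The test-gated loop reduces to "evaluate, apply, evaluate again, keep only if better". The sketch below shows that control flow under stated assumptions: `run_gated` and its three callables are hypothetical, and this is not the contents of scripts/run_improvement_gated.py.

```python
def run_gated(evaluate, apply_change, revert_change, min_gain: float = 0.0) -> dict:
    """Apply a change only if it raises the eval score by more than min_gain."""
    before = evaluate()
    apply_change()
    after = evaluate()
    accepted = (after - before) > min_gain
    if not accepted:
        # Roll back so a rejected change leaves no residue.
        revert_change()
    return {"before": before, "after": after, "accepted": accepted}
```

Using a strict inequality means a change that leaves the score unchanged is rejected, which matches the "must improve score" wording.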

P1 — Strong differentiators (after P0)

6) Streaming + incremental synthesis

  • Add streaming responses (server-sent events or websockets) for partial results
  • Incrementally synthesize as responses arrive (not only after all complete)

7) Multi-tenant readiness (even if local)

  • Configurable quotas/rate limits per “workspace” (not auth, just API client identity)
  • Isolation in cache keys / metrics labels
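
Cache-key isolation can build on the P0 rule that the key includes model, prompt, and parameters. A minimal sketch, assuming a workspace id is available as a plain string; the `cache_key` function and the `rex:` prefix are hypothetical.

```python
import hashlib
import json


def cache_key(workspace: str, model: str, prompt: str, params: dict) -> str:
    """Derive a deterministic key; the workspace is hashed in so tenants never share entries."""
    payload = json.dumps(
        {"workspace": workspace, "model": model, "prompt": prompt, "params": params},
        sort_keys=True,           # parameter order must not change the key
        separators=(",", ":"),    # stable, whitespace-free encoding
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Keep the workspace readable in the key so entries can be listed or purged per tenant.
    return f"rex:{workspace}:{digest}"
```

The same workspace string could be reused as a metrics label, provided the set of workspaces stays small enough to keep label cardinality bounded.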

8) CI/CD + quality gates

  • GitHub Actions pipeline

    • Acceptance:
      • runs tests, builds frontend, builds Go service
  • Lint + type checks

    • Current:
      • Ruff bug-gate in CI (fails on syntax/undefined-name class issues)
      • Mypy baseline gate in CI (lenient config; tighten over time)
      • golangci-lint in CI (govet baseline)

P2 — Nice-to-have (only if time)

  • Docker Compose for full stack (backend + go service + redis + frontend)
  • Benchmark suite with saved baseline comparisons
  • Canary mode for new pipeline versions

Recruiter-facing deliverables (what to show)

  • Architecture diagram in README (components + data flow)
  • “How it scales” section (concurrency control, backpressure, rate-limits)
  • Performance report (before/after Go service) in results/
  • Reliability report: error rates + retry behavior + timeout behavior
  • Evaluation report: scorecards + regression history