Skip to content

drdgreed/pacca

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

PACCA — Prior Authorization & Care Coordination Agent Platform

Multi-agent prior authorization with observability-driven harness engineering

FeaturesArchitectureHarness EngineeringQuick StartAPI DocsDemo

CI codecov Python 3.12+ FastAPI React 18 Claude API Synthetic data Code style: ruff MIT License


Portfolio disclaimer

PACCA is a pre-production project. It is not HIPAA-validated, has no Business Associate Agreements in place with any subcontractor, and must not be used with real Protected Health Information. Every clinical case in this repository is synthetic (see tests/clinical/*_cases.py). The pre-commit PHI guard (.githooks/pacca_guard.py) actively blocks PHI-shaped strings from being committed.

The engineering practices shown here — multi-agent orchestration, RAG over clinical guidelines, audit-grade observability, the harness-engineering discipline, the SME case-authoring workflow — are production-grade. The deployment is not. Treat the repo as a reference architecture, not a turnkey product.

What would close the gap to actual HIPAA compliance is documented in docs/PACCA_PRD_v2.4_Consolidated.md § 16 (SaMD-grade validation) and docs/HIPAA_COMPLIANCE.md. The short version: signed BAAs with every subcontractor (AWS, Anthropic, etc.), encryption-at-rest column-level, role-based access controls, breach-notification procedures, named Privacy + Security Officers, and an ongoing risk-assessment program. The code is one part of a much larger compliance posture.


Overview

PACCA is a secure, multi-agent AI workflow that automates healthcare prior authorization reviews. It solves one of healthcare's most expensive bottlenecks ($50–100B annually in U.S. administrative overhead) by combining the reasoning capabilities of Large Language Models with strict deterministic grading rubrics, dual-collection vector retrieval, and a HIPAA-conscious audit infrastructure.

Unlike basic "LLM-wrapper" approaches, PACCA grounds every decision in factual medical guidelines via Retrieval-Augmented Generation, escalates to specialist tiers using a 7-branch deterministic decision tree, and applies observability-driven harness engineering to iterate the system itself.

Current state. PRD v2.4 is the active spec (introduces §16 Clinical Validation Strategy). The DecisionSupportAgent prompt registry is at v2.5 after iter-5's institutional-memory bump. Harness iter-6 is the open iteration.

A methodology, not just features. The harness discipline — introduced in v2.3 and active through every iteration since — requires every behavioral change to PACCA's agent harness to ship as a one-file diff with a falsifiable predicted-impact contract that the next evaluation round verifies. The methodology is adapted from Lin et al., Agentic Harness Engineering (arXiv:2604.25850, 2026). The repository's docs/ folder makes the discipline auditable from outside.

Governance context. PACCA is a Class 2/3 enterprise agent operating inside a CRISP-AG-style governance envelope. CRISP-AG is an artifact-centered framework for enterprise agentic AI governance that sits beneath ISO/IEC 42001 and NIST AI RMF — the standards establish what governance must achieve; CRISP-AG specifies what the producible artifacts look like. The harness-engineering discipline documented in this repo is a concrete instance of CRISP-AG's Orchestration Contract artifact; the seven-branch escalation tree and Medical Director gate instantiate the Delegation Authority Scoping artifact applied to a healthcare domain. See drdavidreed.com/portfolio for the full white paper.

The Problem

Prior authorization is one of healthcare's most measurable failures:

  • Providers spend 34+ hours/week per practice on prior authorization workflows
  • Patients face treatment delays averaging 2–3 days, with 29% of delays directly harming care
  • Payers process 200+ million requests annually, mostly manually
  • Reviewers use outdated guideline versions in 35% of cases, with decision quality varying 18–35% by individual

The Solution

PACCA automates the workflow using a five-agent hierarchical architecture with deterministic safety controls:

  1. Evidence Aggregation — synthesizes scattered clinical data into coherent narratives
  2. Clinical Classification — complexity scoring, specialty routing, urgency assessment
  3. Decision Support (Tier 1) — guideline-based recommendations with chain-of-thought reasoning
  4. Medical Director (Tier 2) — invoked for ambiguous cases (confidence 0.90–0.95)
  5. Policy Evolution (Governance) — proposes amendments based on human-override patterns; deploys only with Medical Director approval

Eight production-grade safety properties:

  • JWT-authenticated provider dashboard with bcrypt password hashing
  • Dual-collection ChromaDB: official guidelines vs. institutional-memory precedents
  • Chain-of-thought reasoning with anti-hallucination, uncertainty-flagging, and escalation-trigger guards on every agent
  • 7-branch escalation tree (4 pre-flight + 3 post-agent) — deterministic safety logic that overrides AI confidence on experimental treatments, rare conditions, conflicting guidelines, and prior denials
  • Pre-write HIPAA audit trail with correlation-ID linked event pairs
  • OpenTelemetry → Langfuse distributed tracing on every agent call
  • Runtime-adjustable operational parameters (confidence thresholds, retry budget, autonomy switch) without server restart
  • Three-stage governance pipeline for AI-proposed guideline amendments — meets FDA SaMD change-control intent

Architecture

PACCA system architecture: React frontend, JWT bouncer, FastAPI backend, multi-agent orchestrator with 7-branch escalation tree, dual-collection ChromaDB RAG, and Claude API integration

Mermaid source (click to expand)
graph TD
    classDef frontend fill:#61dafb,stroke:#333,stroke-width:2px,color:#000
    classDef backend fill:#009688,stroke:#333,stroke-width:2px,color:#fff
    classDef auth fill:#e91e63,stroke:#333,stroke-width:2px,color:#fff
    classDef agent fill:#ff9800,stroke:#333,stroke-width:2px,color:#fff
    classDef database fill:#607d8b,stroke:#333,stroke-width:2px,color:#fff

    React[React Frontend SPA]:::frontend
    Auth[JWT Auth Bouncer]:::auth
    API[FastAPI Backend]:::backend

    SQL[(PostgreSQL 16<br/>User Credentials & Audit)]:::database
    Chroma[(ChromaDB Dual-Collection<br/>nccn_guidelines + case_precedents)]:::database

    Orchestrator{Multi-Agent<br/>Orchestrator + 7-Branch<br/>Escalation Tree}:::agent
    Agent1[Tier 1: Frontline Nurse Agent]:::agent
    Agent2[Tier 2: Medical Director Agent]:::agent
    LLM((Claude API<br/>Sonnet 4)):::database

    React -- "POST /login" --> Auth
    Auth -- "Verify/Hash" --> SQL
    Auth -- "Returns JWT" --> React

    React -- "Submit Case + JWT" --> API
    API -- "Pre-flight checks" --> Orchestrator
    Orchestrator -- "Semantic Search" --> Chroma
    Chroma -- "Guidelines + Precedents" --> Orchestrator

    Orchestrator -- "Tier 1 Review" --> Agent1
    Agent1 -- "Evaluate" --> LLM
    Agent1 -- "Confidence 0.90-0.95" --> Orchestrator
    Orchestrator -- "Tier 2 Escalation" --> Agent2
    Agent2 -- "Evaluate Nuance" --> LLM

    Agent1 -. "Auto-Approve (>=0.95)" .-> API
    Agent2 -. "Approve / In Review" .-> API
    API -- "JSON Decision + Audit Trail" --> React
Loading

For the complete architecture, see docs/ARCHITECTURE.md. For the harness layer specifically, see docs/HARNESS.md.


Process Flow

Three concurrent workflows feed the same data store. Each generates audit records under a shared correlation_id so the full trace is queryable by one ID.

Workflow A: Provider submits → agent decides

sequenceDiagram
    participant P as Provider
    participant API as FastAPI
    participant O as Orchestrator
    participant RAG as ChromaDB
    participant L as Claude API
    participant A as Audit Log

    P->>API: POST /authorizations/ (case + JWT)
    API->>A: log("authorization_submitted", actor=NPI)
    API->>RAG: query(case → guidelines + precedents)
    RAG-->>API: retrieved chunks
    API->>O: process_decision(ctx)
    O->>L: Decision Agent prompt
    L-->>O: confidence + rationale
    alt confidence ≥ 0.95
        O-->>API: AUTO_APPROVED
    else 0.90 ≤ confidence < 0.95
        O->>L: Medical Director prompt
        L-->>O: nuanced decision
        O-->>API: IN_REVIEW or APPROVED
    else
        O-->>API: IN_REVIEW (Director queue)
    end
    API->>A: log("decision_made", outcome, latency_ms)
    API-->>P: JSON decision + rationale
Loading

Workflow B: Medical Director reviews escalation → teaches the system

sequenceDiagram
    participant D as Director
    participant API as FastAPI
    participant V as Precedents
    participant A as AuditLog

    D->>API: GET /director-queue
    D->>D: Reviews case and agent rationale
    alt Director overrides AI
        D->>API: POST /authorizations/feedback
        API->>V: embed case and rationale into precedents
        API->>A: log human_override actor=director_id
    else Director confirms agent
        D->>API: no-op, AI decision stands
        API->>A: log director_confirmed actor=director_id
    end
    Note over V: Future semantically-similar cases retrieve this precedent alongside guidelines
Loading

Workflow C: SME authors a new clinical test case

sequenceDiagram
    participant S as SME
    participant W as WebUI
    participant API as FastAPI
    participant Ag as AuthoringAgent
    participant V as Validators
    participant FS as CaseFiles
    participant T as IntegrityTests

    S->>W: Plain-English scenario plus mode sandbox or prod
    W->>API: POST /sessions
    API->>Ag: allocate next GC-NNN and draft via LLM
    Ag-->>W: typewriter stream over WebSocket
    S->>W: Edit fields if needed
    W->>API: POST /sessions/id/validate
    API->>V: run 6 deterministic checks
    V-->>API: pass or warn or fail per validator
    alt any FAIL
        API-->>W: blocked, SME revises
    else all PASS
        S->>W: Type attestation
        W->>API: POST /sessions/id/commit
        API->>FS: emit GoldenCase Python via AST
        API->>T: pytest TestGoldenDatasetIntegrity
        alt integrity FAIL
            T-->>API: rollback file mutation
            API-->>W: error surfaced to SME
        else integrity PASS
            API-->>W: PR template ready to copy
            S->>S: gh pr create with pasted body
        end
    end
Loading

Results

Numbers are measured locally (the unit and integration suites) or clearly labeled as benchmark/simulated where they reflect synthesized cases rather than production traffic. The repository ships with no real PHI, so all clinical numbers come from the 53-case synthesized demo dataset and the 20-case clinical golden set.

Metric Value Source
Unit tests 120 / 120 passing pytest tests/unit — 7.14s
Total tests across tiers 146 (unit + integration + clinical) pytest tests/ --collect-only
Clinical-accuracy CI gate ≥80% pass rate on 20-case golden set, LLM-as-judge (Claude Haiku, 1–5 rubric) tests/clinical/, fails the build below threshold
Hallucination tolerance Zero — sparse-notes traps GC-018, GC-019 fail the build on any score-1 hallucination tests/unit/test_clinical_accuracy.py
Lint posture ruff check src/ tests/ — clean CI lint job
Median decision latency (benchmark, single-process) ~2.1 s Synthesized 53-case run, Sonnet 4
95p decision latency (benchmark, single-process) ~4.3 s Same
Auto-approval rate (synthesized dataset) 28% (15 / 53 cases) Group A — complete documentation, explicit guideline alignment
Human-review rate (synthesized dataset) 19% (10 / 53 cases) Group B — missing documentation, hallucination traps
Pre-flight escalations triggered (synthesized dataset) 32% (17 / 53 cases) Groups D–G — experimental treatment, rare condition, conflicting guidelines, prior denial
Cost per decision (simulated, Sonnet 4 at current pricing) ~$0.04 Token-counted per case; pricing as of 2026-05
Harness iterations recorded 2 (harness-iter-0 baseline, harness-iter-1 first extraction) harness/manifests/iter-{0,1}.json
Methodology source Lin et al., Agentic Harness Engineering arXiv:2604.25850

What is not measured yet: sustained-load latency, aggregate cost-per-decision at production volume, and adversarial prompt-injection resistance. These land in Phase H5 (Evaluation Harness Expansion). See docs/EVALUATION.md for the methodology and the gap list.


Harness Engineering

Beginning with v2.3, PACCA is iterated using a structured, falsifiable methodology. Every behavioral change is a one-file diff with a recorded prediction. The next evaluation round verifies the prediction. Rejected changes are reverted at file granularity.

PACCA harness iteration cycle: observe failure pattern in trajectory logs, write change_manifest with predicted impact and rollback plan, ship a one-file diff at a constrained surface, CI validates schema and tests, tag the iteration and run the eval suite, then ratify or revert at file granularity

Mermaid source for the iteration cycle (click to expand)
flowchart TD
    classDef observe fill:#9c27b0,stroke:#444,stroke-width:2px,color:#fff
    classDef discipline fill:#ff9800,stroke:#444,stroke-width:2px,color:#fff
    classDef ship fill:#009688,stroke:#444,stroke-width:2px,color:#fff
    classDef verdict fill:#e91e63,stroke:#444,stroke-width:2px,color:#fff
    classDef gate fill:#607d8b,stroke:#444,stroke-width:2px,color:#fff

    A[Observe failure pattern<br/>in trajectory logs]:::observe
    B[Write change_manifest<br/>predicted impact + rollback]:::discipline
    C[One-file diff<br/>at constrained surface]:::discipline
    D[CI validates manifest schema<br/>+ unit tests + clinical-accuracy gate]:::ship
    E{Merge?}:::gate
    F[Tag harness-iter-N<br/>+ run eval suite, 100+ cases, k=2]:::ship
    H{Predicted impact<br/>verified?}:::gate
    I[Ratify<br/>append verdict to DECISIONS.md]:::verdict
    J[Revert<br/>file-granularity rollback]:::verdict

    A --> B --> C --> D --> E
    E -- yes --> F --> H
    H -- yes --> I
    H -- no --> J
    I -. observe again .-> A
    J -.-> A
Loading

The methodology adapts the AHE paper's three observability pillars to a healthcare domain:

Pillar PACCA Implementation
Component observability 11 editable harness surfaces (7 NexAU-standard + 4 PACCA-specific), each at a fixed file path with one-file-diff rollback
Experience observability OpenTelemetry spans → Langfuse + structured trajectory logs alongside the HIPAA audit trail
Decision observability Every change ships with a change_manifest entry; verdicts logged in DECISIONS.md

The Four Harness Engineering Documents

Document Purpose
📐 docs/HARNESS.md Architectural reference. The seven AHE component types plus PACCA's four healthcare-specific harness surfaces, with rules for editing each.
📋 docs/DECISIONS.md Append-only log of every behavioral change with predictions and verified outcomes. The audit trail of the iteration cycle itself.
📖 docs/ITERATIONS.md Narrative log per iteration tag. Format borrowed from the AHE paper's Appendix C — failure pattern → change → trajectory before/after → eval delta.
🔒 harness/manifests/change_manifest.schema.json JSON Schema 2020-12 specification for change manifests. Includes PACCA-specific fields (phi_impact, audit_relevant) tying the discipline to healthcare governance requirements.

v2.4 Cycle Phases

The v2.4 release commits PACCA to a six-phase cycle over 10–12 weeks. Each phase has explicit exit criteria verifiable from git history and the evaluation suite:

Phase Name Weeks Constraint Levels
H0 Baseline Crystallization 1–2 Instrumentation only
H1 Component Decoupling 3–4 system_prompt, tool_description, tool_implementation
H2 Institutional Memory Layer 5–6 long_term_memory
H3 Cross-Step Middleware Tier 7–8 middleware
H4 Change Manifest Discipline 3–10 (parallel) Process layer
H5 Evaluation Harness Expansion 10–12 Eval infrastructure

Full phase specifications, exit criteria, expected impact, and AHE paper citations are in the consolidated PRD §15.


Features

📋 Clinical Decision Support

  • RAG-powered guideline retrieval using ChromaDB dual-collection
  • Evidence-based recommendations with confidence scores
  • Transparent decision rationale, audit-logged
  • Step therapy and prior treatment requirement support

👥 Human Oversight

  • Configurable confidence thresholds for autonomous decisions
  • 7-branch escalation tree with 4 pre-flight deterministic checks (experimental treatment, rare condition, conflicting guidelines, prior denial)
  • Medical Director review interface with AI-generated case summaries
  • Complete audit trail for regulatory compliance

📚 RAG and Institutional Memory

  • nccn_guidelines — authoritative clinical guidelines (NCCN, CMS, AHA, ADA, ACR), quarterly updates, independent versioning and rollback
  • case_precedents — Medical Director override decisions with documented rationales, embedded immediately, surfaced in semantically similar future cases
  • v2.4+ adds per-agent long_term_memory.md files: human-readable, git-versioned cross-cutting clinical lessons that ride in the prompt context on every request (Phase H2)

🛡️ Production-Grade Safety

  • Anti-hallucination guards on every agent ("only reference clinical evidence explicitly present in the submission")
  • Hallucination zero-tolerance tests (GC-018, GC-019) — sparse-notes traps that fail the build on any score-1 hallucination
  • Tool-use API forced for structured output — eliminates the most common agentic failure mode
  • Pre-write audit trail — correlation-ID-linked event pairs flushed before any state change
  • JWT + bcrypt + fail-fast SECRET_KEY validation — server refuses to start with weak or missing keys
  • Append-only PolicyChangeLogEntry — immutable record of every guideline amendment, mapped to FDA SaMD Action Plan change-control requirements

🔧 Production-Ready Architecture

  • FastAPI backend with full async support
  • React 18 frontend with real-time updates
  • PostgreSQL 16 for persistence, SQLite for development (one env-var switch)
  • Dual-collection ChromaDB with metadata filtering
  • OpenTelemetry → Langfuse distributed tracing (Docker Compose included)
  • Comprehensive test coverage: 549+ unit tests (Python) + Playwright smoke tests (frontend)

✍️ SME Case Authoring Agent (2026-Q2)

The clinical-evaluation dataset must grow from 33 cases → 100 (production-pilot) → 300 (general-payer) → 500+ (SaMD-grade). Authoring each case used to take an engineer 60–90 minutes per case to translate clinical knowledge into Python, wire it into the test aggregator, update companion docs, and verify integrity tests. The SME Case Authoring Agent removes the engineer middleware entirely.

A clinician runs one command (CLI) or opens the browser to /sme-author (Web UI), describes a clinical scenario in plain English, reviews the agent's draft case, attests their professional review, and the agent handles everything else:

  • Allocates a monotonic GC-NNN case ID (file-locked across concurrent SMEs)
  • Runs six deterministic validators (PHI scan, guideline citation, schema completeness, outcome ↔ branch consistency, reasoning specificity, judge criteria specificity) — failures block the write
  • Routes to the correct thematic case file
  • Emits valid Python via AST manipulation (parses + idempotent)
  • Updates docs/CASE_PROVENANCE.md with one row including the SME attestation
  • Bumps docs/EVALUATION_COVERAGE.md cells
  • Runs pytest TestGoldenDatasetIntegrity and rolls back the file mutation on any failure
  • Generates a PR template with the SME attestation embedded

Two surfaces, one library. The CLI (pacca sme-author new) and the Web UI (/sme-author/new — 6-step wizard with WebSocket live-drafting) call the same underlying src/pacca/agents/sme_authoring/ Python modules. SMEs pick the interface they prefer; the audit trail is identical.

Architecture details: docs/SME_CASE_AGENT_DESIGN.md (engineering). Clinician walkthrough: docs/SME_CASE_AGENT_USER_MANUAL.md (Section 11 = Web UI, Sections 1–10 = CLI).

🎨 Editorial-Clinical Design System (2026-Q2)

Every PACCA surface (Login, Provider, Director Queue, Admin, SME Authoring) uses a single visual identity:

  • Typography: Source Serif 4 body, Spectral display, JetBrains Mono technical (case IDs, codes, timestamps)
  • Palette: warm cream paper (#faf8f3), ink text, navy emphasis, forest-green approve, oxblood deny, mustard review
  • Status color is ink, not filled badges<StatusInk outcome="approved"> is a colored text span, never a pill
  • Hairline rules + small-caps section labels for editorial rhythm
  • Restrained motion — 200ms fade + 4px translate-up on page-enter, no bounce
  • CSS bundle: ~3.5 KB gzipped (Tailwind retained for layout utilities only; colors + typography owned by frontend/src/styles/theme.css)

The single global stylesheet is 15 files / ~400 LOC and powers every surface. See docs/SME_WEB_UI_DEPLOYMENT.md for the production deployment topology + CSP allowlist + nginx config.


Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+ (for frontend)
  • Docker & Docker Compose (recommended)
  • Anthropic API key

Option 1: Docker (Recommended)

# Clone the repository
git clone https://github.com/drdgreed/pacca.git
cd pacca

# Set up environment
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

# Start all services (FastAPI, frontend, ChromaDB, PostgreSQL, Langfuse)
docker-compose up -d

# Access the application
# Frontend:    http://localhost:3000
# API:         http://localhost:8000
# API Docs:    http://localhost:8000/docs
# Langfuse:    http://localhost:3001

Option 2: Local Development

# Clone and set up
git clone https://github.com/drdgreed/pacca.git
cd pacca

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows

# Install Python + Node dependencies
pip install -e ".[dev]"
cd frontend && npm install && cd ..

# Set environment variables (your Anthropic key is required for agent calls)
export ANTHROPIC_API_KEY=sk-ant-your-key-here
export DATABASE_URL=sqlite+aiosqlite:///./pacca.db
export SECRET_KEY=$(python -c 'import secrets; print(secrets.token_urlsafe(48))')
export CORS_ORIGINS=http://localhost:3000

# Boot both servers (backend on :8000, frontend on :3000)
make sme-author-web

Browse http://localhost:3000/login. Register your first user via /admin after sign-in, or by hitting POST /api/v1/register/ directly. There is no default admin account by design — the portfolio context doesn't ship shared credentials.

Authoring clinical cases as an SME

# CLI workflow (for engineers + power users)
make sme-author          # interactive new-case session
make sme-author-status   # dataset state + milestone gaps
make sme-author-help     # CLI subcommand reference

# Web UI workflow (clinician-friendly)
make sme-author-web      # boots both servers, then browse /sme-author/new

# Playwright smoke tests (one-time browser install required)
cd frontend && npm run test:e2e:install
make sme-author-web-e2e

Running Tests

# Full unit test suite (120 tests, ~7 seconds)
pytest

# With coverage report
pytest --cov=pacca --cov-report=html

# Test categories
pytest tests/test_clinical_accuracy.py        # Clinical reasoning + LLM-as-judge
pytest tests/test_escalation_tree.py          # All 7 escalation branches
pytest tests/test_security_and_scalability.py # Auth, async, RAG

# v2.4+: harness benchmark suite (Phase H5 deliverable)
pytest tests/eval/                             # 100+ case benchmark with k=2 rollouts

Validating Change Manifests (v2.4+)

# Validate a manifest against the schema before committing
python -m pacca.harness.validate_manifest harness/manifests/iter-1.json

API Documentation

Submit Authorization Request

POST /api/v1/authorizations/
Authorization: Bearer <jwt-token>
Content-Type: application/json

{
  "patient": {
    "id": "P12345",
    "date_of_birth": "1966-05-15",
    "gender": "M"
  },
  "diagnosis": {
    "code": "C34.1",
    "description": "Malignant neoplasm of upper lobe, bronchus or lung"
  },
  "treatment": {
    "code": "J9271",
    "code_type": "HCPCS",
    "description": "Pembrolizumab injection",
    "category": "medication",
    "estimated_cost": 15000.00
  },
  "provider": {
    "provider_id": "1234567890",
    "provider_name": "Dr. Jane Smith"
  },
  "payer": {
    "payer_id": "BCBS001",
    "payer_name": "Blue Cross Blue Shield",
    "member_id": "MEM123456"
  },
  "clinical_notes": "Patient with stage IIIA NSCLC, PD-L1 TPS ≥50%...",
  "urgency": "expedited"
}

Response

{
  "request_id": "AUTH-01HQXYZ...",
  "status": "approved",
  "decision": "approve",
  "confidence_score": 0.92,
  "decision_summary": "Authorization approved based on NCCN guidelines...",
  "complexity": 3,
  "specialty": "oncology",
  "requires_human_review": false,
  "harness_iteration_tag": "harness-iter-0",
  "prompt_registry_versions": {
    "decision_support": "1.4.0",
    "medical_director": "1.2.0"
  }
}

Endpoints

Method Endpoint Description
POST /api/v1/register/ Create a new user account
POST /api/v1/login/ Exchange credentials for JWT
POST /api/v1/authorizations/ Submit authorization request
POST /api/v1/authorizations/feedback Medical Director override → vector-store precedent
GET /api/v1/admin/config Read operational configuration
PATCH /api/v1/admin/config Update config at runtime
GET /api/v1/admin/proposals Pending policy proposals
POST /api/v1/admin/proposals/{id}/approve Approve and deploy guideline amendment
GET /api/v1/admin/change-log Immutable policy change audit log
GET /api/v1/admin/harness/iterations List harness iteration tags
SME Authoring Agent (2026-Q2)
GET /api/v1/sme-authoring/status Dataset state: total cases, per-file counts, milestone gaps
GET /api/v1/sme-authoring/batches List planned authoring batches from the roadmap
GET /api/v1/sme-authoring/batches/{id} One batch's case-slot manifest
GET /api/v1/sme-authoring/gaps Prioritized coverage gaps
GET / POST /api/v1/sme-authoring/sessions List or create an authoring session
GET / DELETE /api/v1/sme-authoring/sessions/{id} Inspect or remove a session
POST /api/v1/sme-authoring/sessions/{id}/draft Generate LLM draft (buffered REST)
POST /api/v1/sme-authoring/sessions/{id}/validate Run six deterministic validators
POST /api/v1/sme-authoring/sessions/{id}/commit Commit with SME attestation
WebSocket /api/v1/sme-authoring/sessions/{id}/draft-stream Live token streaming with first-message JWT auth
GET /health Health check

Full API documentation at /docs when running the server (Swagger UI).


Demo Scenarios

PACCA includes 53 synthesized cases across 8 groups (A–H) covering all 7 escalation branches:

Group Cases Scenario
A 15 Auto-approved — complete documentation, explicit guideline alignment
B 10 Human review — missing documentation, hallucination traps
C 8 MD escalation — cost > $100K or borderline confidence
D 5 Experimental treatment pre-flight — CAR-T, gene therapy
E 4 Rare condition pre-flight — Gaucher, Huntington, ALS, Wilson disease
F 4 Conflicting guidelines pre-flight — NCCN vs. CMS vs. payer LCD
G 4 Prior denial pre-flight — resubmissions, fraud patterns
H 3 Precedent-based approvals — institutional memory in action

Plus a 20-case clinical golden dataset with LLM-as-judge scoring (Claude Haiku, 1–5 rubric) and a CI gate at ≥80% accuracy. Hallucinations score automatic 1 — there is no acceptable rate of inventing clinical data.

Phase H5 (delivered in v2.4) unifies these case sources into a single benchmark of 100+ cases with k=2 rollouts per case and pass@1 / tokens-per-case / Succ/Mtok metrics.


Configuration

Environment Variables

Variable Description Default Production
ANTHROPIC_API_KEY Claude API key Required Required + BAA
SECRET_KEY JWT signing key (≥32 chars) Required Rotate quarterly
DATABASE_URL Database connection SQLite PostgreSQL 16
TOKEN_EXPIRE_MINUTES JWT expiry 30 15–30
AUTO_APPROVE_CONFIDENCE_THRESHOLD Auto-approve threshold 0.95 0.95–0.98
ESCALATION_CONFIDENCE_THRESHOLD MD escalation threshold 0.90 0.90–0.95
HIGH_COST_THRESHOLD Cost escalation trigger (USD) 100000 Per payer contract
LLM_RETRY_MAX_ATTEMPTS Max LLM retry attempts 3 3–5
ENABLE_AUTONOMOUS_DECISIONS Master autonomy switch true true (false for audit)
HARNESS_ITERATION_TAG Active harness iteration (v2.4+) harness-iter-0 Latest tagged iteration

See .env.example for all configuration options.


Project Structure

pacca/
├── src/pacca/
│   ├── agents/              # Multi-agent framework
│   │   ├── decision_support/    # v2.3: per-agent component decoupling (Phase H1)
│   │   │   ├── system_prompt.md     # System prompt as standalone file
│   │   │   ├── long_term_memory.md  # v2.3: institutional memory (Phase H2)
│   │   │   ├── tool_descriptions/   # YAML schemas for tool interfaces
│   │   │   ├── tools/               # Tool implementations
│   │   │   ├── middleware/          # v2.3: cross-step hooks (Phase H3)
│   │   │   └── agent.yaml           # Component registry
│   │   ├── medical_director/    # Same layout per agent
│   │   ├── evidence_aggregation/
│   │   ├── classification/
│   │   ├── policy_evolution/
│   │   └── prompts/             # Shared PROMPT_REGISTRY
│   ├── api/                 # FastAPI application
│   ├── config/              # Settings and logging
│   ├── db/                  # Database, models, repository, migrations
│   ├── models/              # Pydantic domain models
│   ├── observability/       # v2.3: trajectory logging (Phase H0)
│   ├── orchestrator/        # 7-branch escalation tree
│   └── rag/                 # ChromaDB dual-collection pipeline
├── frontend/                # React 18 frontend
├── harness/                 # v2.3: harness engineering artifacts
│   └── manifests/               # Per-iteration change manifests + verdicts
│       ├── change_manifest.schema.json
│       ├── iter-0.json
│       └── iter-N-verdicts.json
├── tests/
│   ├── test_*.py            # 120 unit tests
│   └── eval/                # v2.3: harness benchmark (Phase H5)
├── demo/                    # 53-case synthesized demo dataset
├── docs/                    # Documentation
│   ├── ARCHITECTURE.md
│   ├── HARNESS.md           # v2.3: harness component reference
│   ├── DECISIONS.md         # v2.3: append-only change log with verdicts
│   ├── ITERATIONS.md        # v2.3: narrative log per iteration
│   ├── EVALUATION.md        # v2.3: benchmark methodology + scores
│   └── PACCA_PRD_v2.4_Consolidated.md  # Full PRD with phase specs
└── docker-compose.yml       # Full stack including Langfuse

Technology Stack

Layer Technology Notes
LLM Claude (Anthropic API), claude-sonnet-4 Tool-use forced for structured output
Backend Python 3.12, FastAPI, Pydantic v2 Fully async throughout
Production DB PostgreSQL 16, SQLAlchemy 2.0, Alembic JSONB compliance queries, async pool
Dev DB SQLite (same ORM layer) One env var to switch
Vector Store ChromaDB 0.5+, dual-collection Different trust levels per collection
Cache Redis (optional) 40–60% token reduction at scale (V2 release)
Frontend React 18, TypeScript, Tailwind CSS Vite build pipeline
Observability OpenTelemetry → Langfuse 1.27+ One span per agent call
Testing pytest, pytest-asyncio, pytest-cov 140 unit + benchmark suite
Security python-jose, bcrypt JWT + timing-safe passwords
Manifest validation jsonschema (Draft 2020-12) v2.4+: validates change manifests in CI
CI/CD GitHub Actions Includes manifest schema validation
Containerization Docker, Docker Compose 6 services in full stack

Documentation Map

PACCA's documentation is structured to serve four audiences: engineers, healthcare reviewers, recruiters and the agentic AI community evaluating the work, and future iterations of PACCA itself.

Core architecture and methodology

  • docs/ARCHITECTURE.md — system architecture, component responsibilities, request lifecycle
  • docs/HARNESS.md — harness layer reference: 11 editable surfaces, three rules of engagement, three observability pillars
  • docs/PACCA_PRD_v2.4_Consolidated.md — full Product Requirements Document, including the harness engineering cycle phases (H0–H5) and §16 Clinical Validation Strategy

Iteration record (v2.4+)

  • docs/DECISIONS.md — append-only log of every behavioral change with predictions and verdicts
  • docs/ITERATIONS.md — narrative log per iteration tag (paper Appendix C format)
  • docs/EVALUATION.md — benchmark methodology, scores, regression history
  • CHANGELOG.md — per-iteration changelog with eval delta and verified predictions

Machine-readable specifications

Governance framework (external)

  • CRISP-AG White Paper v2.3CRISP-AG: An Artifact-Centered Framework for Enterprise Agentic AI Governance. Specifies the four implementation artifacts (Delegation Authority Scoping, Contractor Access Governance, Orchestration Contract, Capability Frontier Classification) and nine-phase lifecycle that PACCA's harness engineering implements at the code layer. Sits beneath ISO/IEC 42001 and NIST AI RMF.

About the Author

David Reed, Ph.D. — Head of AI/ML & Agentic Delivery at Interview Kickstart. PhD in Computer Science, MBA, PMP, Wharton AI Fellow. Holder of US Patent 6,850,988 — the foundational recommendation-engine architecture developed at Oracle and later widely deployed in commerce. Formerly Master Technologist at Hewlett-Packard (Distinguished/Principal-IC track) and Principal TPM-AI at Microsoft. 35+ years across data warehousing, enterprise AI/ML, and edtech, including leading a $70M data-science curriculum portfolio across R1 universities.

I built PACCA to demonstrate end-to-end agentic AI engineering on a high-stakes, regulated domain — healthcare prior authorization — where correctness, explainability, human oversight, and observability all matter equally. Beginning with v2.3, the project commits to a falsifiable harness-engineering methodology adapted from Lin et al. (arXiv:2604.25850, 2026) — every behavioral change ships as a one-file diff with a recorded prediction, and the next evaluation round verifies or rejects it at file granularity. The discipline is a concrete instance of the CRISP-AG Orchestration Contract artifact applied to a regulated healthcare domain.

Portfolio · LinkedIn · drdgreed@gmail.com


Contributing

Contributions are welcome. PACCA's contribution model has two paths:

  • Standard PRs — refactors, documentation, infra, dependency bumps, non-behavioral fixes.
  • Behavioral PRs (harness-engineering discipline) — anything that changes how an agent reasons, what tools it can call, what middleware fires, or what memory context it sees. Requires a one-file diff plus a manifest entry under harness/manifests/.

Full details on local setup, the two-path workflow, the manifest schema, and the predicted-vs-observed verdict cycle are in CONTRIBUTING.md. Security-related findings should follow SECURITY.md instead — please do not open a public issue.

By contributing you agree to the Code of Conduct.


Citation

If you reference PACCA's harness engineering implementation in academic work or production case studies, please cite:

Reed, D. (2026). PACCA: Prior Authorization & Care Coordination Agent Platform —
v2.4 Consolidated PRD. github.com/drdgreed/pacca.

Methodology adapted from:
Lin, J., Liu, S., Pan, C., Lin, L., Dou, S., Huang, X., Yan, H., Han, Z., & Gui, T. (2026).
Agentic Harness Engineering: Observability-Driven Automatic Evolution of
Coding-Agent Harnesses. arXiv:2604.25850v3.

License

MIT — see LICENSE for details.


Acknowledgments

  • Built with Claude by Anthropic
  • Methodology informed by Lin et al., Agentic Harness Engineering (arXiv:2604.25850, 2026)
  • Clinical guidelines based on publicly available NCCN, ACR, AHA, ADA, and CMS guidance
  • Inspired by real-world healthcare prior authorization challenges affecting 200+ million patients annually

PACCA v2.4 — Healthcare Prior Authorization, Iterated Like Engineering github.com/drdgreed/pacca | David Reed, PhD | May 2026

About

Multi-agent AI platform that automates healthcare prior-authorization decisions. RAG-grounded, harness-engineered, observability-first.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors