Orchestrator for conducting vulnerability research with cloud LLMs (Codex x Claude). Primarily hunts memory corruption bugs in targets that fall within ZDI's scope.
On initial start, the agents will attempt to:
- Find applicable GitHub repos/OSS libraries used in products that ZDI considers in scope
- Conduct research on historical vulnerabilities/CVEs to understand where previous problems have lived, and whether those patterns could apply to new vulnerabilities.
Not perfect by any means, but it's done a decent job at replicating some of the manual work I've done.
Built for macOS Tahoe 26.3.
./setup.sh && ./restart.sh && python3 dashboard.py
queued → scoping → researching → poc → fuzzing → validating → approving → packaging → done
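As a rough mental model, the stage sequence above can be treated as a linear state machine. The sketch below is an illustration only: the `Stage` enum and `next_stage` helper are hypothetical names, and it assumes findings only ever move forward one stage at a time, which the real orchestrator may not guarantee.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages, in the order listed in this README."""
    QUEUED = "queued"
    SCOPING = "scoping"
    RESEARCHING = "researching"
    POC = "poc"
    FUZZING = "fuzzing"
    VALIDATING = "validating"
    APPROVING = "approving"
    PACKAGING = "packaging"
    DONE = "done"


# Enum members iterate in definition order, so this list is the pipeline order.
ORDER = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None once the pipeline is done.

    Assumption: strictly linear progression with no skips or rollbacks.
    """
    index = ORDER.index(stage)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None
```

For example, `next_stage(Stage.POC)` returns `Stage.FUZZING`, and `next_stage(Stage.DONE)` returns `None`.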
PoCs are written by Codex as Python proof artifacts. Fuzzing builds a harness bundle per target, triages crashes, and dedupes through finding_exclusions so the same bug doesn't bounce through the queue twice.
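To make the dedupe idea concrete, here is a minimal sketch of how crashes could be filtered against an exclusion set. It is not the project's actual implementation: the signature scheme (hashing the target name plus the top few stack frames), the crash dictionary shape, and the helper names are all assumptions, and `finding_exclusions` is modelled as a plain in-memory set.

```python
import hashlib


def crash_signature(target, frames, depth=3):
    """Hash the target name and the top `depth` stack frames.

    Assumption: two crashes with the same top frames are the same bug.
    """
    top_frames = "|".join(frames[:depth])
    return hashlib.sha256(f"{target}|{top_frames}".encode()).hexdigest()


def triage(crashes, finding_exclusions):
    """Return only crashes whose signature has not been seen before.

    `finding_exclusions` is mutated so later calls skip the same bug.
    """
    fresh = []
    for crash in crashes:
        signature = crash_signature(crash["target"], crash["frames"])
        if signature in finding_exclusions:
            continue  # already queued or dismissed, do not re-queue
        finding_exclusions.add(signature)
        fresh.append(crash)
    return fresh
```

Hashing only the top frames is a common trade-off: it collapses crashes that reach the same faulting code through different inputs, at the cost of occasionally merging distinct bugs that share a call path.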
- config.yaml
  - Persistent defaults for agents
  - Enable/disable agents
  - Per-role model defaults (claude_plan, claude_execute, codex) + effort tiers
  - Prioritize target & vulnerability types
  - Adjust minimum CVSS threshold
  - Fuzzing knobs (duration_seconds, triage_limit, seed_sample_limit)
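The fuzzing knobs could be validated after loading with something like the sketch below. Only the three key names come from this README. The `FuzzingKnobs` class, the `load_fuzzing_knobs` helper, the requirement that every value be a positive integer, and the assumption that the knobs sit together in one already-parsed mapping are all hypothetical.

```python
from dataclasses import dataclass

KNOB_NAMES = ("duration_seconds", "triage_limit", "seed_sample_limit")


@dataclass(frozen=True)
class FuzzingKnobs:
    duration_seconds: int
    triage_limit: int
    seed_sample_limit: int

    def __post_init__(self):
        # Assumption: every knob must be a positive integer.
        for name in KNOB_NAMES:
            value = getattr(self, name)
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


def load_fuzzing_knobs(section):
    """Build FuzzingKnobs from an already-parsed mapping of the config section.

    Raises KeyError if a knob is missing and ValueError if one is invalid.
    """
    return FuzzingKnobs(**{name: section[name] for name in KNOB_NAMES})
```

Failing fast here keeps a bad value from surfacing hours into a fuzzing run.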
- control.yaml
  - Used for adjusting agents mid-run
  - Adjust active focus areas
  - Disregard/kill findings
  - Override per-role models during runtime
  - debug_mode: periodic pipeline-state health audit
  - Emergency stop (stop: true)
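The relationship between the two files can be sketched as a simple overlay: control.yaml values win over config.yaml defaults while a run is in progress. The snippet below works on already-parsed dictionaries and is only an illustration. The `models` key, both helper names, and the placeholder model strings are assumptions; only the role names and the `stop` flag come from this README.

```python
def effective_models(config, control):
    """Overlay runtime model overrides on the persistent per-role defaults.

    Assumption: both files keep per-role models under a `models` mapping.
    Empty or missing overrides leave the default untouched.
    """
    models = dict(config.get("models", {}))
    overrides = control.get("models") or {}
    models.update({role: model for role, model in overrides.items() if model})
    return models


def should_stop(control):
    """True only when control.yaml explicitly sets `stop: true`."""
    return control.get("stop") is True


# Placeholder model names, for illustration only.
config = {"models": {"claude_plan": "model-a", "claude_execute": "model-b", "codex": "model-c"}}
control = {"models": {"claude_execute": "model-d", "codex": None}, "stop": False}

active = effective_models(config, control)
```

Here `active` keeps the defaults for `claude_plan` and `codex` and swaps only `claude_execute`, and `should_stop(control)` stays false until the flag is flipped.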
target_categories: iot, routers, network, security-libs, ai-ml
zdi_priority: remote_code_execution, enterprise_software, server_side, os, browser, scada_iiot, sandbox_escape, vm_escape, security_product
- Allow for Ollama/local usage
- Improve logging (it's still hard to see in detail what the agents are doing)
- Clean up all the weird bugs
