ret2c/llm-vr-beta

llm-vr-beta

Orchestrator for conducting vulnerability research with cloud LLMs (Codex x Claude). It primarily hunts memory corruption bugs in targets that fall within ZDI's scope.
On first start, the agents will attempt to:

  1. Find applicable GH repos/OSS libraries used in products that ZDI takes into consideration
  2. Research historical vulnerabilities/CVEs to understand where previous problems have lived, and whether those patterns could point to new vulnerabilities.

Not perfect by any means, but it has done a decent job of replicating some of the manual work I've done.
Built for macOS Tahoe 26.3.

Quickstart:

./setup.sh && ./restart.sh && python3 dashboard.py

Pipeline phases:

queued → scoping → researching → poc → fuzzing → validating → approving → packaging → done
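As a rough sketch of how the phase order above could be modeled in Python: the phase names are taken from the line above, but `PHASES` and `next_phase` are hypothetical names for illustration, not the orchestrator's actual code.

```python
from typing import Optional

# Phase names copied from the pipeline line above.
# The list and helper below are hypothetical, not the repo's real API.
PHASES = [
    "queued", "scoping", "researching", "poc", "fuzzing",
    "validating", "approving", "packaging", "done",
]


def next_phase(current: str) -> Optional[str]:
    """Return the phase that follows `current`, or None once the pipeline is done."""
    idx = PHASES.index(current)  # raises ValueError for an unknown phase name
    return PHASES[idx + 1] if idx + 1 < len(PHASES) else None
```

A strictly linear list keeps the happy path easy to follow. The real pipeline may well allow a finding to be killed or sent back to an earlier phase, which this sketch does not cover.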
PoCs are written by Codex as Python proof artifacts. Fuzzing builds a harness bundle per target, triages crashes, and dedupes through finding_exclusions so the same bug doesn't bounce through the queue twice.
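To illustrate the dedupe idea, here is a minimal sketch that assumes `finding_exclusions` behaves like a set of crash signatures. The signature scheme (target name plus the top stack frames, hashed) and both function names are assumptions made for this example. The actual format may differ.

```python
import hashlib
from typing import Iterable, Set


def crash_signature(target: str, frames: Iterable[str], depth: int = 3) -> str:
    """Hash the target name plus the top `depth` stack frames into a stable key."""
    top = list(frames)[:depth]
    raw = "\n".join([target, *top])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_new_finding(signature: str, finding_exclusions: Set[str]) -> bool:
    """Record the signature and report whether it was unseen before this call."""
    if signature in finding_exclusions:
        return False  # same bug already went through the queue
    finding_exclusions.add(signature)
    return True
```

Only the top few frames are hashed so that two crashes reaching the same faulting code through different callers collapse into one finding. The depth is a tunable guess, not a value from the repo.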

YAML Settings:

  • config.yaml
    • Persistent defaults for agents
    • Enable/disable agents
    • Per-role model defaults (claude_plan, claude_execute, codex) + effort tiers
    • Prioritize target & vulnerability types
    • Adjust minimum CVSS threshold
    • Fuzzing knobs (duration_seconds, triage_limit, seed_sample_limit)
  • control.yaml
    • Used for adjusting agents mid-run
    • Adjust active focus areas
    • Disregard/kill findings
    • Override per-role models during runtime
    • debug_mode - periodic pipeline-state health audit
    • Emergency stop (stop: true)
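
One way to picture how the two files interact is that control.yaml values are layered over the config.yaml defaults at runtime. The sketch below works on already-parsed dictionaries. The `models` key, the placeholder model names, and `effective_settings` are assumptions for illustration only. The role names, `debug_mode`, and `stop` come from the lists above.

```python
from typing import Any, Dict


def effective_settings(config: Dict[str, Any], control: Dict[str, Any]) -> Dict[str, Any]:
    """Layer runtime overrides from control over the persistent config defaults."""
    merged = dict(config)
    # Per-role model overrides win over defaults; untouched roles keep their default.
    merged["models"] = {**config.get("models", {}), **control.get("models", {})}
    # Runtime-only switches default to off when control does not set them.
    merged["debug_mode"] = bool(control.get("debug_mode", False))
    merged["stop"] = bool(control.get("stop", False))
    return merged


# Hypothetical parsed contents of config.yaml and control.yaml.
config = {
    "models": {"claude_plan": "model-a", "claude_execute": "model-b", "codex": "model-c"},
    "fuzzing": {"duration_seconds": 600, "triage_limit": 20, "seed_sample_limit": 50},
}
control = {"models": {"codex": "model-d"}, "stop": True}

settings = effective_settings(config, control)
```

Here `settings["models"]["codex"]` ends up as the override while the other roles keep their defaults, and `settings["stop"]` is set so a run loop could exit on its next check.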

Default Categories:

target_categories: iot, routers, network, security-libs, ai-ml

zdi_priority: remote_code_execution, enterprise_software, server_side, os, browser, scada_iiot, sandbox_escape, vm_escape, security_product


Todo:

  • Allow for Ollama/local usage
  • Improve logging (it's still hard to see in detail what the agents are doing)
  • Clean up all the weird bugs
