feat: LLM review stage for enhanced reachability detection #50

joshbouncesecurity wants to merge 4 commits into
Conversation
Adds an opt-in LLM review stage (off by default, enabled via the new `--llm-reachability` flag on `openant scan`) that uses a strong model (Opus by default) to surface additional reachability signals beyond what the structural pass catches:

- Likely entry points the structural analysis may miss (framework hooks, plugin/CLI registrations, message handlers).
- External-input sites (HTTP request bodies, file/network reads, env/argv, stdin, untrusted IPC).
- Cross-process / async data-flow indicators.

Signals are advisory and *promote-only*: high-confidence entry-point signals can set `is_entry_point=True` on a unit, but no signal ever demotes a unit that the structural analysis already kept. This matches the "complements, does not replace" intent in issue #17.

Output:

- `llm_reachability.json` written to the scan dir with the full signal list.
- Each unit gains an `llm_reachability_signals` array on the dataset.

Cost & rate-limit safety: opt-in only, prompts are batched, and the client integration goes through the existing `AnthropicClient` (which respects `GlobalRateLimiter`).

Refs #17.
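For illustration, a single entry in `llm_reachability.json` might look like the sketch below. The field names are hypothetical, inferred from the PR description (unit ids, signal kinds, confidence levels), not the actual schema:

```python
import json

# Hypothetical shape of one signal entry -- field names are
# illustrative, not the real openant schema.
signal = {
    "unit_id": "src/app/routes.py::handle_upload",
    "kind": "external_input",   # or "entry_point", "async_flow"
    "confidence": "high",       # only high-confidence entry_point signals promote
    "rationale": "Reads the HTTP request body via a framework handler.",
}
print(json.dumps(signal, indent=2))
```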
The Python CLI defines `--llm-reachability` for the LLM reachability stage (issue #17), but the Go CLI proxy did not expose it. The test `TestHelp::test_scan_help_advertises_llm_reachability` inspects `openant scan --help` (Go cobra output) and was failing on all 3 OS targets. Register `--llm-reachability` as a Bool flag on the Go scan command and pass it through to the Python invocation when set.
- scanner.py: forward-declare `app_context_path` before step 1.5 so the LLM reachability block doesn't hit a `NameError` when `--llm-reachability` is enabled (the block ran before the app-context step that defined it).
- `llm_reachability._chunk`: a non-positive `batch_size` used to reference an unbound loop variable; it now collapses to a single batch covering all items. Adds a regression test.
- Help text (Python CLI + Go CLI): note that `--llm-reachability` may incur additional LLM cost, per cost-safety review.
The LLM reachability stage threads app_context into its prompt to help the model reason about expected entry points (web_app vs cli_tool, etc). The previous ordering ran it before app-context generation, so the app_context_path was always None at the call site — the prompt threading silently no-op'd. Reordering the steps makes the threading actually work. This also retires the temporary forward-declaration introduced in the previous commit; app_context_path is now defined naturally by the preceding step before the LLM reachability block reads it.
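The corrected ordering can be sketched like this; `generate_app_context`, `analyze_reachability`, and `run_scan` are illustrative stand-ins, not the real scanner.py API:

```python
def generate_app_context(scan_dir):
    # Stand-in for the app-context step (web_app vs cli_tool, etc.).
    return f"{scan_dir}/app_context.json"

def analyze_reachability(units, app_context_path):
    # Stand-in for the LLM stage: with the old ordering this argument
    # was always None, so the prompt threading silently no-op'd.
    assert app_context_path is not None
    return []

def run_scan(units, scan_dir, llm_reachability=False):
    # App-context generation now runs before the LLM reachability
    # block, so app_context_path is defined when the block reads it.
    app_context_path = generate_app_context(scan_dir)
    signals = []
    if llm_reachability:
        signals = analyze_reachability(units, app_context_path)
    return signals
```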
Manual verification

Off by default. Requires API key. Cost note: enabling adds approximately one Opus call per 25 units.
Local test results

Reinstalled openant-core from this branch and ran …

Commands run:

Outcome (against the manual-verification checklist):
Total cost: $0.024 (reachability $0.012 + analyze 1 unit $0.012). Well under the budget.
🔴 High — Architecture · Issue: The LLM stage runs after …

What the stage does still deliver in current form:
What it cannot do:
Summary
Adds a new LLM review stage (off by default, enabled via `--llm-reachability` on `openant scan`) that uses a strong model (Opus by default) to surface additional reachability signals beyond what the structural analysis catches.

Signals are advisory and only promote a unit's reachability — they never demote one that the structural analysis already kept. This matches the issue's "complements, not replaces" intent.
Output:

- `llm_reachability.json` in the scan dir with the full signal list.
- `llm_reachability_signals: [...]` field on each unit in `dataset.json`.
- High-confidence `entry_point` signals set `is_entry_point: true` on the target unit.

Cost & rate-limit safety: reuses the existing `GlobalRateLimiter` via `AnthropicClient`, opt-in only, and prompts are batched (default 25 units/call).

Addresses #17 (does not close — let the maintainer review the prompt + heuristics first).
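The promote-only semantics described above can be sketched as follows; `apply_signals` and the unit/signal dict shapes are illustrative stand-ins, not the real openant implementation:

```python
def apply_signals(units, signals):
    """Attach advisory signals to units; promote, never demote."""
    by_id = {u["id"]: u for u in units}
    for sig in signals:
        unit = by_id.get(sig["unit_id"])
        if unit is None:
            continue  # unknown-id signals are rejected
        unit.setdefault("llm_reachability_signals", []).append(sig)
        if sig["kind"] == "entry_point" and sig["confidence"] == "high":
            unit["is_entry_point"] = True  # promote only; never set False
```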
Test plan
- `analyze_reachability` with mocked LLM: fixed JSON, malformed JSON, exception in client, app_context threading, batch chunking.
- `apply_signals`: promote-only semantics (high-confidence promotes, medium does not, never demotes), per-unit signal accumulation, unknown-id rejection.
- `--llm-reachability` appears in `openant scan --help`; the default does not pass the flag through to `scan_repository`; setting it threads `llm_reachability=True`.
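The malformed-JSON case from the test plan might look like the sketch below; the `analyze_reachability` here is a simplified stand-in showing the intended behavior (bad model output degrades to no signals rather than raising), and `FakeClient` plays the role of the mocked LLM client:

```python
import json

def analyze_reachability(units, client):
    # Stand-in: the real stage prompts the model and parses its reply.
    raw = client.complete(prompt="...")
    try:
        signals = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []  # malformed JSON: advisory stage yields no signals
    return signals if isinstance(signals, list) else []

class FakeClient:
    """Mocked LLM client returning a canned response."""
    def __init__(self, raw):
        self.raw = raw
    def complete(self, prompt):
        return self.raw
```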