WIP: Plan CodeQL sandbox recipe refactor by pruiz · Pull Request #45 · pruiz/CodeCome

pruiz · 2026-06-05T20:58:06Z

Summary

Adds a detailed WIP implementation plan for refactoring the CodeQL integration around sandbox-generated build recipes.

The plan proposes:

reordering Phase 1 to run sandbox bootstrap before CodeQL;
adding itemdb/notes/sandbox-recipe.yml as the machine-readable contract between sandbox and CodeQL;
keeping codeql-plan.yml as the CodeQL analysis-intent document;
supporting multi-unit CodeQL through sandbox build targets rather than multiple sandboxes;
adding CodeQL run health classification to avoid false successful runs;
moving to per-run CodeQL artifact directories to prevent stale SARIF/normalized output reuse;
updating prompts, gates, docs, tests, and subphase names consistently.

Scope

This PR is intentionally documentation-only for review. No implementation code is changed yet.

Review focus

Please review especially:

The new Phase 1 order.
The proposed sandbox-recipe.yml schema.
The relationship between codeql-plan.yml analysis units and sandbox build targets.
The CodeQL health/usability rules.
The implementation sequence and acceptance criteria.

Summary by CodeRabbit

New Features
- Sandbox recipe machine-readable contracts for Phase 1b automation
- CodeQL health classification system to determine run result usability
- CodeQL execution inside Docker containers for improved isolation
Refactoring
- Phase 1 workflow restructured into Phase 1a/1b/1c sub-stages
- CodeQL plan schema updated to version 2 with sandbox integration
- Removed automatic CodeQL plan repair subsystem
Improvements
- Conditional CodeQL signal handling based on health status in Phase 1c

coderabbitai · 2026-06-05T20:58:12Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: d5cf6285-5046-4ac5-a14c-90bb89da7d5d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch wip/codeql-sandbox-recipe-refactor

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-05T20:59:50Z

Coverage Report

Metric	Value
Line Coverage	76.5%
Lines Covered	0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-06T21:48:43.929Z

Add tools/sandbox/recipe.py with load_recipe, validate_recipe, dump_recipe. Add recipe-validate and recipe-print subcommands to sandbox-bootstrap.py. Add templates/sandbox-recipe.yml.example (schema v1). Add tests/test_sandbox_recipe.py (24 tests covering valid/invalid schemas). Update docs/sandbox.md and templates/sandboxes/README.md with recipe docs. Update .project/codeql-sandbox-recipe-refactor-plan.md with locked plan.

tools/codeql/packs.py: load_codeql_plan requires schema_version: 2, v1 raises PackResolverError with clear upgrade guidance. templates/codeql-plan.yml: updated to v2 schema with build_provider, sandbox_build_target, and new commentary. tools/phases/phase_1_gates.py: gate 1a now calls load_codeql_plan() so v1 plans are rejected at the gate with the upgrade error message. prompts/phase-1a-profile.md: updated to v2 rules, including sandbox_build_target, build_provider, and new language about sandbox being built in Phase 1b. Tests: all plan fixtures bumped to v2, new test_schema_v1_rejected_at_gate_1a.

Rename prompts/phase-1c-sandbox.md -> prompts/phase-1b-sandbox.md. Update content to reflect new position (second sub-stage instead of third). Add sandbox-recipe.yml as a first-class required output alongside sandbox-plan.md. Add recipe schema rules, per-target guidance, and validation instructions. Update test_phase_1c_reads_threat_model to use the renamed prompt file.

tools/codeql/health.py: compute_health() with classifications covering disabled, skipped, failed, extraction-failed, analysis-failed, completed-empty-valid, completed-with-signals, and more. tools/codeql/pipeline.py: generate run_id (UTC timestamp + hash), create per-run directory under itemdb/codeql/runs/<id>/, write last-run-manifest.yml and current-run.txt, compute health before importing risk signals. tools/codeql/runner.py: accept optional run_dir parameter; write SARIF, databases, and selected-query-packs.yml into per-run dir. tools/codeql/artifacts.py: prefer last-run-manifest.yml, fall back to run-manifest.yml; consume health block for usability gating. tests/test_codeql_health.py: 11 tests covering all health classifications. tests/test_codeql_pipeline.py: updated for per-run layout. tests/test_codeql_artifacts.py: updated for manifest fallback.

tools/codeql/platform.py: host_platform() and container_platform() for detecting OS/arch mismatch between host and sandbox container. tools/codeql/in_docker.py: check_platform() and exec_codeql() for running CodeQL inside a Docker Compose service. Platform guard classifies units as unavailable when host (e.g. Darwin) and container (e.g. Linux) platforms differ under mount-host-bundle strategy. templates/sandboxes/_shared/codeql.sh: wrapper script invoked by the harness, resolving compose file + service from env or defaults. templates/sandboxes/*/docker-compose.yml: added read-only bind mount of host .tools/codeql/current at /opt/codeql inside every container. tests/test_codeql_platform.py: 7 tests for platform detection. tests/test_codeql_in_docker.py: 6 tests for Docker executor.

… repair Phase 1 orchestra tor reordered: 1a: Target Profile → gate check_phase_1a 1b: Sandbox Bootstrap → gate check_phase_1c CodeQL runs after sandbox (post-sandbox) 1c: Detailed Reconnaissance → gate check_phase_1b Removed _run_codeql_repair_if_needed and all CodeQL repair helpers. Removed codeql-plan auto-repair from subphase retry loop. Removed build_codeql_plan_resume_prompt and build_codeql_build_failure_resume_prompt from completion.py. Stopped referencing prompts/phase-1-codeql-repair.md (file kept in tree until cleanup). Renamed prompts/phase-1b-recon.md → prompts/phase-1c-recon.md. Updated gate messages and test references. Skipped 17 tests pending deletion/update in commits 7-8.

prompts/phase-1c-recon.md: updated to Phase 1c, added health-aware CodeQL artifact reading rules (read last-run-manifest.yml, skip signals when health.usable is false, record health summary in threat-model.md). docs/workflow.md: rewritten Phase 1 section with the new 1a→1b→1c order. Phase 1a produces target profile + build model + codeql plan. Phase 1b produces sandbox + sandbox-recipe.yml. Phase 1c produces full recon notes enriched with CodeQL when usable. Makefile help text: Sandbox bootstrap (Phase 1c) → (Phase 1b). .opencode/skills/sandbox-bootstrap/SKILL.md and .opencode/agents/recon.md already reference Phase 1b as sandbox bootstrap — aligned with new order.

Delete prompts/phase-1-codeql-repair.md and tests/test_phase_1_codeql_plan_repair.py. Remove three skipped repair-related tests from test_codecome_check_codeql.py. Full test suite: 730 passed, 0 failures.

Copilot

Pull request overview

This PR advances the planned “CodeQL + sandbox recipe refactor” by introducing the sandbox recipe contract + validator, reordering Phase 1 (sandbox bootstrap before CodeQL, detailed recon after), and reshaping CodeQL execution artifacts around per-run directories with health classification.

Changes:

Add itemdb/notes/sandbox-recipe.yml schema example plus Python loader/validator and CLI helpers (recipe-validate, recipe-print).
Reorder Phase 1 to 1a → 1b (sandbox) → CodeQL → 1c (recon) and update gates/prompts/docs/tests accordingly.
Introduce per-run CodeQL output layout (runs/<run-id>/...), last-run-manifest.yml, current-run.txt, and a health model used by artifact gating and recon prompts.

Reviewed changes

Copilot reviewed 54 out of 55 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tools/sandbox/recipe.py	New loader/validator for `sandbox-recipe.yml`.
tools/sandbox/init.py	Establish `sandbox` as an importable package.
tools/sandbox-bootstrap.py	Add `recipe-validate` / `recipe-print` subcommands and constants for recipe path.
tools/phases/phase_1_gates.py	Switch CodeQL plan loading to shared loader; update Phase 1 messaging for new order.
tools/phases/completion.py	Remove CodeQL plan/build failure resume prompt helpers tied to the old repair flow.
tools/codeql/runner.py	Add optional `run_dir` support to write artifacts per-run (legacy mode preserved).
tools/codeql/platform.py	New host/container platform detection helpers.
tools/codeql/pipeline.py	Create per-run directories, write `last-run-manifest.yml`/`current-run.txt`, compute health, gate risk import.
tools/codeql/packs.py	Enforce CodeQL plan `schema_version: 2` with actionable upgrade error.
tools/codeql/in_docker.py	New helpers for CodeQL execution inside Docker Compose services + platform guard.
tools/codeql/health.py	New health classification model for “usable” vs “unusable” CodeQL runs.
tools/codeql/artifacts.py	Gate artifacts based on `health` when present; support last-run manifest.
tools/codecome/phase_1.py	Reorder Phase 1 execution; remove prior CodeQL repair loop and associated validation logic.
tests/test_sandbox_recipe.py	New unit tests for recipe load/validation rules.
tests/test_phase_failure_state_reset.py	Update prompt path usage consistent with new Phase 1 prompt naming.
tests/test_phase_1_prompts_threat_model.py	Update prompt existence/content checks for renamed Phase 1 prompts.
tests/test_phase_1_mid_turn_forgiveness.py	Update prompt path usage consistent with new Phase 1 prompt naming.
tests/test_phase_1_gates.py	Update CodeQL plan fixtures to schema v2; add test that v1 is rejected.
tests/test_phase_1_codeql_plan_repair.py	Remove tests for deleted CodeQL repair flow.
tests/test_codeql_runner.py	Update CodeQL plan fixtures to schema v2 where needed.
tests/test_codeql_platform.py	New tests for platform detection utilities.
tests/test_codeql_pipeline.py	Update pipeline tests for per-run layout + last-run manifest behavior.
tests/test_codeql_packs.py	Update pack/plan fixtures for CodeQL plan schema v2.
tests/test_codeql_in_docker.py	New tests for Docker execution helpers and platform guard behavior.
tests/test_codeql_health.py	New tests for health classification outcomes.
tests/test_codeql_artifacts.py	Update assertions to match new “manifest missing” wording.
tests/test_codecome_check_codeql.py	Update plan schema fixture (v2) and remove repair-related tests.
templates/sandboxes/web-static/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/rust/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/ruby/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/python/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/php/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/node/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/nested-virt/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/multi-service-compose/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/java-maven/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/iac-terraform/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/go/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/generic/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/erlang-otp/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/dotnet/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/c-cpp/docker-compose.yml	Mount host CodeQL bundle into sandbox container.
templates/sandboxes/_shared/codeql.sh	New shared wrapper for invoking CodeQL via `docker compose exec`.
templates/sandboxes/README.md	Document the new sandbox recipe artifact and CLI helpers.
templates/sandbox-recipe.yml.example	Add example schema for the sandbox recipe contract.
templates/codeql-plan.yml	Bump CodeQL plan template to schema v2 and document sandbox mapping fields.
prompts/phase-1c-recon.md	Update recon prompt to Phase 1c and use health-aware artifact consumption rules.
prompts/phase-1b-sandbox.md	Update sandbox bootstrap prompt to Phase 1b and require `sandbox-recipe.yml`.
prompts/phase-1a-profile.md	Update Phase 1a prompt rules for CodeQL plan schema v2.
prompts/phase-1-codeql-repair.md	Remove obsolete repair prompt (no longer used).
Makefile	Update help text to reflect sandbox bootstrap as Phase 1b.
docs/workflow.md	Update workflow docs for 3-substage Phase 1 and new outputs.
docs/sandbox.md	Document sandbox recipe artifact and how to validate/print it.
.project/codeql-sandbox-recipe-refactor-plan.md	Add/extend detailed implementation plan document.
.gitignore	Ignore `token-usage-output.txt`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+Branch purpose: define the target architecture and implementation plan before changing runtime code.
+
+## 1. Problem statement


+def _generate_run_id() -> str:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8]
+    return f"{ts}-{fingerprint}"


    out.separator(tone=T.SUCCESS)
-    out.success("Ready to run Phase 1c (Sandbox Bootstrap).")
+    out.success("Ready to run Phase 1c (Detailed Reconnaissance).")
    return 0


+def test_host_platform_known_os() -> None:
+    plat = host_platform()
+    assert any(kw in plat.lower() for kw in ("linux", "darwin", "windows"))
+


+            sandbox_path = Path(root) / sandbox_path_str
+            if not sandbox_path.exists():
+                errors.append(
+                    f"sandbox-recipe.yml: sandbox.path {sandbox_path_str!r} does not exist"
+                )


+    manifest.setdefault("health", health)
+


+        result = subprocess.run(
+            ["uname", "-sm"], capture_output=True, text=True, timeout=10
+        )
+        return result.stdout.strip()


+            ],
+            capture_output=True, text=True, timeout=timeout,
+        )
+        return result.stdout.strip() or result.stderr.strip()


+            f"CodeQL bundle is for {host_plat}; sandbox service "
+            f"{service!r} runs {container_plat}. "
+            "install_strategy=mount-host-bundle cannot cross platforms. "
+            "Use install_strategy=download-in-container or image-preinstalled "
+            "(not yet supported)."


tests/test_prompts.py: add test_no_stale_phase_1b_1c_references_in_prompts that greps all prompt files for contradictory Phase 1b/1c patterns, plus self-consistency tests for phase-1b-sandbox.md and phase-1c-recon.md. prompts/README.md: update filenames and descriptions to new 1b/1c order. prompts/sweep.md: add 'CodeQL signals (conditional)' section instructing the sweep to check health.usable before importing CodeQL signals. prompts/phase-6-report.md: add health-aware rule in 'Reporting rules' section instructing the reporter to check last-run-manifest.yml health.

coderabbitai

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

prompts/phase-1b-sandbox.md (1)
138-138: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix sub-stage reference in retry-exhaustion instruction.

This line says “stop Phase 1c” inside the Phase 1b prompt. It should refer to stopping Phase 1b (or stopping Phase 1 progression) to avoid contradictory operator guidance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@prompts/phase-1b-sandbox.md` at line 138, Update the incorrect sub-stage
reference in the retry-exhaustion instruction so it no longer tells the operator
to "stop Phase 1c"; change the phrase to "stop Phase 1b" (or "stop Phase 1
progression") in the sentence that mentions retry budget and logging to
sandbox-plan.md and references CODECOME_BOOTSTRAP_MAX_RETRIES, ensuring the
instruction consistently refers to Phase 1b.
prompts/phase-1c-recon.md (1)
178-180: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix stale CodeQL artifact path and stale Phase filename in Phase 1c prompt.

This prompt still contains pre-refactor references that conflict with the new Phase 1c/per-run model: Line 178 points to itemdb/codeql/normalized/file-signals.yml instead of the per-run location, and Line 274 still writes phase-1b-summary during Phase 1c. Both can cause incorrect file reads/writes by the agent.
🛠️ Proposed doc fix
-If CodeQL file signals exist (`itemdb/codeql/normalized/file-signals.yml`), incorporate them:
+If CodeQL file signals exist (`itemdb/codeql/runs/<run-id>/normalized/file-signals.yml`), incorporate them:
@@
-    runs/phase-1b-summary-YYYY-MM-DD-HHMMSS.md
+    runs/phase-1c-summary-YYYY-MM-DD-HHMMSS.md
Also applies to: 274-274
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@prompts/phase-1c-recon.md` around lines 178 - 180, Update the stale CodeQL
artifact path and the Phase filename in the prompt: replace references to the
legacy static path "itemdb/codeql/normalized/file-signals.yml" with the per-run
CodeQL artifact location used by the new Phase 1c model (the per-run path your
pipeline writes for CodeQL file signals), and change the summary output filename
from "phase-1b-summary" to "phase-1c-summary" so the agent reads/writes the
correct per-run artifacts; search for those exact strings in the prompt text and
update them accordingly.
tests/test_codeql_pipeline.py (1)
212-253: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Duplicate test name overrides the earlier test case.

Line 253 redefines the same function name from Line 212, so the first test is never executed.
Suggested fix
-def test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy(tmp_path: Path) -> None:
+def test_pipeline_normalize_failure_sets_expected_last_manifest_for_soft_policy(tmp_path: Path) -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_codeql_pipeline.py` around lines 212 - 253, The file defines
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy twice, causing
the first definition to be overridden; rename the duplicate test function to a
unique, descriptive name (for example change the second occurrence to
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy_duplicate or a
name reflecting its intent) so both tests are distinct, and update any
references or fixtures if needed to match the new function name (look for the
symbol test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy in this
diff to locate both definitions).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.project/codeql-sandbox-recipe-refactor-plan.md:
- Around line 857-859: The doc uses a non-canonical key `Sandbox_build_target`
(capital S) instead of the agreed canonical `sandbox_build_target`, causing
copy/paste drift; update the text at the occurrence of `Sandbox_build_target` to
use `sandbox_build_target` so it matches the schema and examples (search for the
token `Sandbox_build_target` and replace with `sandbox_build_target` in the
description of Multi-target invocation).

In `@Makefile`:
- Line 109: You changed the Makefile (the printf line that prints "Sandbox
bootstrap (Phase 1b)"), which is disallowed; revert that modification so the
Makefile remains exactly as in main (remove that hunk from your commit or reset
the file), then amend the PR to exclude any Makefile/orchestration/config edits;
reference the changed printf line to locate and undo the change and ensure only
allowed source files are included in the branch before pushing.

In `@prompts/phase-1a-profile.md`:
- Around line 68-73: The doc contradicts itself about requiring build_command:
clarify that when build_provider is "sandbox-recipe"
analysis_units[].sandbox_build_target should point to the sandbox-recipe.yml
build_targets[].id and build_command must be left empty even if build_mode is
"manual"; update the guidance so that build_command is only required when
build_provider is "none" or when build_mode is "manual" and build_provider is
not "sandbox-recipe" (i.e., for manual, non-recipe builds supply build_command),
and state that codeql-plan.yml should never include a concrete build_command for
units using sandbox-recipe (use sandbox_build_target instead).

In `@prompts/sweep.md`:
- Around line 24-27: Update the prompt to use per-run CodeQL paths: read the run
id from itemdb/codeql/last-run-manifest.yml (as before) and then reference
enrichment files under itemdb/codeql/runs/<run_id>/normalized/alerts.yml and
itemdb/codeql/runs/<run_id>/normalized/file-signals.yml instead of the old
top-level itemdb/codeql/normalized/... paths; ensure any mention of
"normalized/alerts.yml" or "file-signals.yml" in this prompt is replaced with
the per-run path pattern using the manifest's run_id.

In `@tests/test_phase_1_prompts_threat_model.py`:
- Around line 64-66: The test test_phase_1c_reads_threat_model is reading the
wrong fixture ("phase-1b-sandbox.md"); change the call to _read_prompt to open
the Phase 1c recon prompt (e.g., "phase-1c-recon.md") so the test actually
verifies the Phase 1c recon requirement, leaving the assert ("threat-model.md"
in content) as-is; update the string literal in the test function
test_phase_1c_reads_threat_model to the correct prompt filename.

In `@tests/test_prompts.py`:
- Around line 30-53: The regex checks in stale_patterns can miss matches
spanning multiple lines because re.search is called without DOTALL; update the
search to allow '.' to match newlines (e.g., use re.DOTALL / re.S) so patterns
like "Phase 1b.*Detailed Reconnaissance" will match across line breaks. Locate
the loop that iterates prompts_dir.glob("*.md") and the call to
re.search(pattern, content, re.IGNORECASE) and change the flags to re.IGNORECASE
| re.DOTALL (or compile the patterns with re.S) while preserving the existing
exceptions logic and pattern strings.

In `@tests/test_sandbox_recipe.py`:
- Around line 81-85: Replace the use of the bare `assert False, "Expected
ValueError"` in the exception tests that call load_recipe(path) with an explicit
exception raise to avoid removal under optimized execution; specifically, change
those failure branches to `raise AssertionError("Expected ValueError")` (and
apply the same replacement for the similar block around the second exception
test) so the test still fails clearly when no ValueError is raised.

In `@tools/codeql/health.py`:
- Around line 183-191: The current logic returns "skipped" whenever
has_languages is false even if status == "failed", hiding real failures; update
the conditional so that failures are preserved—either move the status ==
"failed" block above the has_languages check or change the has_languages branch
to only return "skipped" when status != "failed"; make sure to still use db_ok
and analyze_ok (the existing variables) to return the appropriate "failed"
message ("CodeQL database creation failed.", "CodeQL analysis failed.", or
"CodeQL pipeline failed.").

In `@tools/codeql/in_docker.py`:
- Around line 25-37: The guard only treats "mount-host-bundle" as a host-bundle
case but misses "copy-host-bundle", so update the conditional that checks
install_strategy to include "copy-host-bundle" (e.g., install_strategy not in
("mount-host-bundle", "copy-host-bundle")) and ensure the subsequent validation
uses host_platform(), container_platform(service, compose_file), and
platforms_compatible() to reject cross-platform combinations with an error
message that mentions both mount-host-bundle and copy-host-bundle as unsupported
for cross-platform installs.

In `@tools/codeql/pipeline.py`:
- Around line 16-19: The _generate_run_id function currently derives fingerprint
solely from ts, causing identical run IDs for starts within the same second;
update _generate_run_id to add non-deterministic entropy (e.g., include
microseconds, process id, or a random value such as uuid.uuid4().hex or
secrets.token_hex) into the hashed input or as part of the fingerprint so each
invocation is unique; keep the return format f"{ts}-{fingerprint}" and update
the fingerprint calculation in _generate_run_id to hash timestamp plus the
chosen random/entropy source (or replace hashing with a truncated uuid4) to
avoid collisions.

In `@tools/codeql/platform.py`:
- Around line 15-20: The probe currently returns result.stdout.strip() even when
the command fails; change the subprocess.run invocation handling in the probe
(the call that uses ["uname", "-sm"] and the similar probe at lines 31-40) to
treat non-zero exit codes as failures: after subprocess.run(...) check
result.returncode and if it's non-zero (or result.stdout is empty/unreliable)
return "unknown" instead of returning stdout/stderr; update the code paths
around the subprocess.run call and the existing return result.stdout.strip() so
both probes consistently return "unknown" on non-zero exits or empty output.

In `@tools/sandbox-bootstrap.py`:
- Around line 1562-1606: The cmd_recipe_validate and cmd_recipe_print functions
implement business logic in the top-level tools/sandbox-bootstrap.py; per policy
this file must be a thin delegating wrapper—move the implementation into a
package module (e.g., create tools.sandbox.recipe_commands or similar) and keep
these functions as one-line delegators that import and call the package
functions (e.g., recipe_commands.validate_recipe_cmd and
recipe_commands.print_recipe_cmd) passing through args; ensure all imports and
heavy logic (load_recipe, validate_recipe, dump_recipe, _emit, error handling
and printing) live in the new package module and update the wrapper functions
cmd_recipe_validate and cmd_recipe_print to simply import and return the package
function results.

In `@tools/sandbox/recipe.py`:
- Around line 175-192: The helper _validate_codeql_hints currently skips
validating the top-level codeql key default_execution_mode; add a validation
block (similar to the preferred_mode check) that reads
codeql.get("default_execution_mode"), ensures it's a string and one of
VALID_EXECUTION_MODES, and appends an error with the same format
(f"sandbox-recipe.yml: {prefix}.default_execution_mode {value!r} invalid
(allowed: {valid})") when invalid so validate_recipe() will catch bad
default_execution_mode values.

---

Outside diff comments:
In `@prompts/phase-1b-sandbox.md`:
- Line 138: Update the incorrect sub-stage reference in the retry-exhaustion
instruction so it no longer tells the operator to "stop Phase 1c"; change the
phrase to "stop Phase 1b" (or "stop Phase 1 progression") in the sentence that
mentions retry budget and logging to sandbox-plan.md and references
CODECOME_BOOTSTRAP_MAX_RETRIES, ensuring the instruction consistently refers to
Phase 1b.

In `@prompts/phase-1c-recon.md`:
- Around line 178-180: Update the stale CodeQL artifact path and the Phase
filename in the prompt: replace references to the legacy static path
"itemdb/codeql/normalized/file-signals.yml" with the per-run CodeQL artifact
location used by the new Phase 1c model (the per-run path your pipeline writes
for CodeQL file signals), and change the summary output filename from
"phase-1b-summary" to "phase-1c-summary" so the agent reads/writes the correct
per-run artifacts; search for those exact strings in the prompt text and update
them accordingly.

In `@tests/test_codeql_pipeline.py`:
- Around line 212-253: The file defines
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy twice, causing
the first definition to be overridden; rename the duplicate test function to a
unique, descriptive name (for example change the second occurrence to
test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy_duplicate or a
name reflecting its intent) so both tests are distinct, and update any
references or fixtures if needed to match the new function name (look for the
symbol test_pipeline_normalize_failure_marks_soft_failed_for_soft_policy in this
diff to locate both definitions).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1f7f88ac-4287-42f3-9df6-7c4db6983e79

📥 Commits

Reviewing files that changed from the base of the PR and between 3a10252 and b7505f4.

📒 Files selected for processing (59)

.gitignore
.project/codeql-sandbox-recipe-refactor-plan.md
Makefile
docs/sandbox.md
docs/workflow.md
prompts/README.md
prompts/phase-1-codeql-repair.md
prompts/phase-1a-profile.md
prompts/phase-1b-sandbox.md
prompts/phase-1c-recon.md
prompts/phase-6-report.md
prompts/sweep.md
templates/codeql-plan.yml
templates/sandbox-recipe.yml.example
templates/sandboxes/README.md
templates/sandboxes/_shared/codeql.sh
templates/sandboxes/c-cpp/docker-compose.yml
templates/sandboxes/dotnet/docker-compose.yml
templates/sandboxes/erlang-otp/docker-compose.yml
templates/sandboxes/generic/docker-compose.yml
templates/sandboxes/go/docker-compose.yml
templates/sandboxes/iac-terraform/docker-compose.yml
templates/sandboxes/java-maven/docker-compose.yml
templates/sandboxes/multi-service-compose/docker-compose.yml
templates/sandboxes/nested-virt/docker-compose.yml
templates/sandboxes/node/docker-compose.yml
templates/sandboxes/php/docker-compose.yml
templates/sandboxes/python/docker-compose.yml
templates/sandboxes/ruby/docker-compose.yml
templates/sandboxes/rust/docker-compose.yml
templates/sandboxes/web-static/docker-compose.yml
tests/test_codecome_check_codeql.py
tests/test_codeql_artifacts.py
tests/test_codeql_health.py
tests/test_codeql_in_docker.py
tests/test_codeql_packs.py
tests/test_codeql_pipeline.py
tests/test_codeql_platform.py
tests/test_codeql_runner.py
tests/test_phase_1_codeql_plan_repair.py
tests/test_phase_1_gates.py
tests/test_phase_1_mid_turn_forgiveness.py
tests/test_phase_1_prompts_threat_model.py
tests/test_phase_failure_state_reset.py
tests/test_prompts.py
tests/test_sandbox_recipe.py
tools/codecome/phase_1.py
tools/codeql/artifacts.py
tools/codeql/health.py
tools/codeql/in_docker.py
tools/codeql/packs.py
tools/codeql/pipeline.py
tools/codeql/platform.py
tools/codeql/runner.py
tools/phases/completion.py
tools/phases/phase_1_gates.py
tools/sandbox-bootstrap.py
tools/sandbox/__init__.py
tools/sandbox/recipe.py

💤 Files with no reviewable changes (3)

prompts/phase-1-codeql-repair.md
tests/test_phase_1_codeql_plan_repair.py
tools/phases/completion.py

coderabbitai · 2026-06-06T21:22:45Z

+7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from
+   the recipe per (analysis unit, language) pair and invokes CodeQL with the
+   resolved `build_command`. Identical commands across targets are allowed;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the canonical key name sandbox_build_target consistently.

Line 857 uses Sandbox_build_target (capital S), but the schema examples and surrounding contract use sandbox_build_target. This can cause implementation drift from copy/paste.

✏️ Proposed doc fix

-7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from +7. **Multi-target invocation**: the runner resolves `sandbox_build_target` from

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

7. **Multi-target invocation**: the runner resolves `Sandbox_build_target` from

the recipe per (analysis unit, language) pair and invokes CodeQL with the

resolved `build_command`. Identical commands across targets are allowed;

7. **Multi-target invocation**: the runner resolves `sandbox_build_target` from

the recipe per (analysis unit, language) pair and invokes CodeQL with the

resolved `build_command`. Identical commands across targets are allowed;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.project/codeql-sandbox-recipe-refactor-plan.md around lines 857 - 859, The doc uses a non-canonical key `Sandbox_build_target` (capital S) instead of the agreed canonical `sandbox_build_target`, causing copy/paste drift; update the text at the occurrence of `Sandbox_build_target` to use `sandbox_build_target` so it matches the schema and examples (search for the token `Sandbox_build_target` and replace with `sandbox_build_target` in the description of Multi-target invocation).

coderabbitai · 2026-06-06T21:22:45Z

 	@printf "    $(BOLD)make sandbox-test$(RESET)   Test the target inside the sandbox\n"
 	@printf "\n"
-	@printf "  $(BOLD)$(CYAN)Sandbox bootstrap (Phase 1c):$(RESET)\n"
+	@printf "  $(BOLD)$(CYAN)Sandbox bootstrap (Phase 1b):$(RESET)\n"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not modify Makefile without explicit instruction.

This PR changes Makefile, which violates the repository rule forbidding orchestration/config-file edits unless explicitly requested.

As per coding guidelines, "{codecome.yml,AGENTS.md,Makefile,.opencode/**}: Never modify project orchestration or configuration files ... unless explicitly instructed".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 109, You changed the Makefile (the printf line that prints "Sandbox bootstrap (Phase 1b)"), which is disallowed; revert that modification so the Makefile remains exactly as in main (remove that hunk from your commit or reset the file), then amend the PR to exclude any Makefile/orchestration/config edits; reference the changed printf line to locate and undo the change and ensure only allowed source files are included in the branch before pushing.

Source: Coding guidelines

coderabbitai · 2026-06-06T21:22:45Z

+- For compiled languages (c-cpp, go, csharp, java-kotlin, swift) set `analysis_units[].sandbox_build_target` to the `build_targets[].id` from `sandbox-recipe.yml` that provides the build command for this unit. If the recipe has not been generated yet (this is Phase 1a), pick a sensible id such as `root` — Phase 1b will flesh out the recipe and the id can be updated if needed.
+- For each language, set `build_provider`:
+  - `"sandbox-recipe"` — for compiled languages whose build command should be resolved from `sandbox-recipe.yml` after Phase 1b. Leave `build_command` empty (the runner resolves it from the recipe).
+  - `"none"` — for no-build languages (python, javascript-typescript, ruby).
+- Avoid concrete build shell snippets in `build_command` unless the build is obvious and stable and no recipe is available. Prefer `build_provider: sandbox-recipe` for everything that needs a build.
 - For each language in each analysis unit, select the appropriate pack profiles:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve contradictory build_command guidance for manual compiled units.

Line 70 says to leave build_command empty for build_provider: sandbox-recipe, but Line 85 still requires build_command when build_mode is manual. This ambiguity can produce inconsistent codeql-plan.yml output.

Proposed wording tweak

- For each language, set `build_provider`: - - `"sandbox-recipe"` — for compiled languages whose build command should be resolved from `sandbox-recipe.yml` after Phase 1b. Leave `build_command` empty (the runner resolves it from the recipe). + - `"sandbox-recipe"` — for compiled languages whose build command should be resolved from `sandbox-recipe.yml` after Phase 1b. Leave `build_command` empty even when `build_mode` is `manual` (the runner resolves it from the recipe). - `"none"` — for no-build languages (python, javascript-typescript, ruby). -- Avoid concrete build shell snippets in `build_command` unless the build is obvious and stable and no recipe is available. Prefer `build_provider: sandbox-recipe` for everything that needs a build. +- Avoid concrete build shell snippets in `build_command` unless the build is obvious/stable and `build_provider` is not `"sandbox-recipe"`.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- For compiled languages (c-cpp, go, csharp, java-kotlin, swift) set `analysis_units[].sandbox_build_target` to the `build_targets[].id` from `sandbox-recipe.yml` that provides the build command for this unit. If the recipe has not been generated yet (this is Phase 1a), pick a sensible id such as `root` — Phase 1b will flesh out the recipe and the id can be updated if needed.

- For each language, set `build_provider`:

- `"sandbox-recipe"` — for compiled languages whose build command should be resolved from `sandbox-recipe.yml` after Phase 1b. Leave `build_command` empty (the runner resolves it from the recipe).

- `"none"` — for no-build languages (python, javascript-typescript, ruby).

- Avoid concrete build shell snippets in `build_command` unless the build is obvious and stable and no recipe is available. Prefer `build_provider: sandbox-recipe` for everything that needs a build.

- For each language in each analysis unit, select the appropriate pack profiles:

- For compiled languages (c-cpp, go, csharp, java-kotlin, swift) set `analysis_units[].sandbox_build_target` to the `build_targets[].id` from `sandbox-recipe.yml` that provides the build command for this unit. If the recipe has not been generated yet (this is Phase 1a), pick a sensible id such as `root` — Phase 1b will flesh out the recipe and the id can be updated if needed.

- For each language, set `build_provider`:

- `"sandbox-recipe"` — for compiled languages whose build command should be resolved from `sandbox-recipe.yml` after Phase 1b. Leave `build_command` empty even when `build_mode` is `manual` (the runner resolves it from the recipe).

- `"none"` — for no-build languages (python, javascript-typescript, ruby).

- Avoid concrete build shell snippets in `build_command` unless the build is obvious/stable and `build_provider` is not `"sandbox-recipe"`.

- For each language in each analysis unit, select the appropriate pack profiles:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@prompts/phase-1a-profile.md` around lines 68 - 73, The doc contradicts itself about requiring build_command: clarify that when build_provider is "sandbox-recipe" analysis_units[].sandbox_build_target should point to the sandbox-recipe.yml build_targets[].id and build_command must be left empty even if build_mode is "manual"; update the guidance so that build_command is only required when build_provider is "none" or when build_mode is "manual" and build_provider is not "sandbox-recipe" (i.e., for manual, non-recipe builds supply build_command), and state that codeql-plan.yml should never include a concrete build_command for units using sandbox-recipe (use sandbox_build_target instead).

coderabbitai · 2026-06-06T21:22:45Z

+If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block.
+When `health.usable` is `true`, use `itemdb/codeql/normalized/alerts.yml`
+and `file-signals.yml` as enrichment for the target file. When
+`health.usable` is `false`, the CodeQL run did not produce trustworthy


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

CodeQL signal paths are outdated for per-run layout.

After this refactor, normalized outputs are per-run (itemdb/codeql/runs/<run_id>/normalized/...), not top-level itemdb/codeql/normalized/.... This prompt can point sweep to missing/stale files.

Suggested fix

-If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block. -When `health.usable` is `true`, use `itemdb/codeql/normalized/alerts.yml` -and `file-signals.yml` as enrichment for the target file. When +If `itemdb/codeql/last-run-manifest.yml` exists, read its `health` block and `run_id`. +When `health.usable` is `true`, read: +- `itemdb/codeql/runs/<run_id>/normalized/alerts.yml` +- `itemdb/codeql/runs/<run_id>/normalized/file-signals.yml` +as enrichment for the target file. When `health.usable` is `false`, the CodeQL run did not produce trustworthy output — do not import its signals.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@prompts/sweep.md` around lines 24 - 27, Update the prompt to use per-run CodeQL paths: read the run id from itemdb/codeql/last-run-manifest.yml (as before) and then reference enrichment files under itemdb/codeql/runs/<run_id>/normalized/alerts.yml and itemdb/codeql/runs/<run_id>/normalized/file-signals.yml instead of the old top-level itemdb/codeql/normalized/... paths; ensure any mention of "normalized/alerts.yml" or "file-signals.yml" in this prompt is replaced with the per-run path pattern using the manifest's run_id.

coderabbitai · 2026-06-06T21:22:45Z

 def test_phase_1c_reads_threat_model() -> None:
-    content = _read_prompt("phase-1c-sandbox.md")
+    content = _read_prompt("phase-1b-sandbox.md")
    assert "threat-model.md" in content


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test target file mismatches the test intent.

test_phase_1c_reads_threat_model currently reads phase-1b-sandbox.md, so it no longer verifies the Phase 1c recon prompt requirement it is named for.

✅ Proposed fix

def test_phase_1c_reads_threat_model() -> None: - content = _read_prompt("phase-1b-sandbox.md") + content = _read_prompt("phase-1c-recon.md") assert "threat-model.md" in content

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_phase_1_prompts_threat_model.py` around lines 64 - 66, The test test_phase_1c_reads_threat_model is reading the wrong fixture ("phase-1b-sandbox.md"); change the call to _read_prompt to open the Phase 1c recon prompt (e.g., "phase-1c-recon.md") so the test actually verifies the Phase 1c recon requirement, leaving the assert ("threat-model.md" in content) as-is; update the string literal in the test function test_phase_1c_reads_threat_model to the correct prompt filename.

coderabbitai · 2026-06-06T21:22:45Z

+    if install_strategy not in ("mount-host-bundle",):
+        return True, ""
+
+    host_plat = host_platform()
+    container_plat = container_platform(service, compose_file)
+
+    if not platforms_compatible(host_plat, container_plat):
+        return False, (
+            f"CodeQL bundle is for {host_plat}; sandbox service "
+            f"{service!r} runs {container_plat}. "
+            "install_strategy=mount-host-bundle cannot cross platforms. "
+            "Use install_strategy=download-in-container or image-preinstalled "
+            "(not yet supported)."


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

copy-host-bundle bypasses platform compatibility checks.

Line 25 only guards mount-host-bundle, but copy-host-bundle also carries a host-built CodeQL bundle and can fail the same way on cross-platform host/container pairs. This should be validated identically.

🔧 Proposed fix

- if install_strategy not in ("mount-host-bundle",): + if install_strategy not in ("mount-host-bundle", "copy-host-bundle"): return True, "" @@ - "install_strategy=mount-host-bundle cannot cross platforms. " - "Use install_strategy=download-in-container or image-preinstalled " - "(not yet supported)." + f"install_strategy={install_strategy} cannot cross platforms. " + "Use install_strategy=image-preinstalled."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/in_docker.py` around lines 25 - 37, The guard only treats "mount-host-bundle" as a host-bundle case but misses "copy-host-bundle", so update the conditional that checks install_strategy to include "copy-host-bundle" (e.g., install_strategy not in ("mount-host-bundle", "copy-host-bundle")) and ensure the subsequent validation uses host_platform(), container_platform(service, compose_file), and platforms_compatible() to reject cross-platform combinations with an error message that mentions both mount-host-bundle and copy-host-bundle as unsupported for cross-platform installs.

coderabbitai · 2026-06-06T21:22:45Z

+def _generate_run_id() -> str:
+    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8]
+    return f"{ts}-{fingerprint}"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run ID generation can collide for runs started in the same second.

Line 18 hashes only ts, so it’s deterministic per second. Two starts within one second produce identical run_id, breaking per-run isolation.

Suggested fix

import hashlib +import uuid @@ def _generate_run_id() -> str: ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") - fingerprint = hashlib.sha256(ts.encode()).hexdigest()[:8] + fingerprint = uuid.uuid4().hex[:8] return f"{ts}-{fingerprint}"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/pipeline.py` around lines 16 - 19, The _generate_run_id function currently derives fingerprint solely from ts, causing identical run IDs for starts within the same second; update _generate_run_id to add non-deterministic entropy (e.g., include microseconds, process id, or a random value such as uuid.uuid4().hex or secrets.token_hex) into the hashed input or as part of the fingerprint so each invocation is unique; keep the return format f"{ts}-{fingerprint}" and update the fingerprint calculation in _generate_run_id to hash timestamp plus the chosen random/entropy source (or replace hashing with a truncated uuid4) to avoid collisions.

coderabbitai · 2026-06-06T21:22:45Z

+        result = subprocess.run(
+            ["uname", "-sm"], capture_output=True, text=True, timeout=10
+        )
+        return result.stdout.strip()
+    except Exception:
+        return "unknown"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle non-zero probe exits as unknown instead of returning arbitrary stderr text.

Both probes only fall back on exceptions, but subprocess.run(...) with default check=False does not raise on command failure. A failed probe can therefore return empty/diagnostic text and be treated as a real platform string, causing false incompatibility in downstream gating.

Suggested fix

def host_platform() -> str: @@ - result = subprocess.run( + result = subprocess.run( ["uname", "-sm"], capture_output=True, text=True, timeout=10 ) - return result.stdout.strip() + if result.returncode != 0: + return "unknown" + value = result.stdout.strip() + return value or "unknown" except Exception: return "unknown" @@ - result = subprocess.run( + result = subprocess.run( [ "docker", "compose", "-f", str(compose_file), "exec", "-T", service, "uname", "-sm", ], capture_output=True, text=True, timeout=timeout, ) - return result.stdout.strip() or result.stderr.strip() + if result.returncode != 0: + return "unknown" + value = result.stdout.strip() + return value or "unknown" except Exception: return "unknown"

Also applies to: 31-40

🧰 Tools

🪛 Ruff (0.15.15)

[error] 16-16: Starting a process with a partial executable path

(S607)

[warning] 19-19: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/codeql/platform.py` around lines 15 - 20, The probe currently returns result.stdout.strip() even when the command fails; change the subprocess.run invocation handling in the probe (the call that uses ["uname", "-sm"] and the similar probe at lines 31-40) to treat non-zero exit codes as failures: after subprocess.run(...) check result.returncode and if it's non-zero (or result.stdout is empty/unreliable) return "unknown" instead of returning stdout/stderr; update the code paths around the subprocess.run call and the existing return result.stdout.strip() so both probes consistently return "unknown" on non-zero exits or empty output.

coderabbitai · 2026-06-06T21:22:45Z

+def cmd_recipe_validate(args: argparse.Namespace) -> int:
+    path = Path(args.path) if hasattr(args, "path") and args.path else SANDBOX_RECIPE_PATH
+    if not path.is_file():
+        print(C.fail(f"Sandbox recipe not found at {path}"), file=sys.stderr)
+        return 1
+
+    try:
+        from sandbox.recipe import load_recipe, validate_recipe
+        recipe = load_recipe(path)
+    except Exception as exc:
+        print(C.fail(f"Failed to load recipe: {exc}"), file=sys.stderr)
+        return 1
+
+    errors = validate_recipe(recipe, root=str(ROOT))
+    if errors:
+        print(C.fail(f"Sandbox recipe at {path} has {len(errors)} validation error(s):"), file=sys.stderr)
+        for err in errors:
+            print(f"  {C.SYM_BULLET} {err}")
+        return 1
+
+    print(C.ok(f"Sandbox recipe at {path} is valid."))
+    return 0
+
+
+def cmd_recipe_print(args: argparse.Namespace) -> int:
+    path = Path(args.path) if hasattr(args, "path") and args.path else SANDBOX_RECIPE_PATH
+    if not path.is_file():
+        print(C.fail(f"Sandbox recipe not found at {path}"), file=sys.stderr)
+        return 1
+
+    try:
+        from sandbox.recipe import load_recipe
+        recipe = load_recipe(path)
+    except Exception as exc:
+        print(C.fail(f"Failed to load recipe: {exc}"), file=sys.stderr)
+        return 1
+
+    if args.format == "json":
+        _emit(recipe, "json")
+    else:
+        from sandbox.recipe import dump_recipe
+        print(dump_recipe(recipe).rstrip())
+
+    return 0
+


🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Keep tools/sandbox-bootstrap.py as delegation-only; move recipe command logic into a package module.

These new command bodies add implementation in a root tools/*.py script. Per repo policy, this file should stay a thin wrapper and delegate to implementation under a package (e.g., tools/sandbox/...).

As per coding guidelines: “Standalone scripts at the tools/ root … should be thin wrappers that delegate to their respective packages. Implementation must live in the package, not the script.”

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 1571-1571: Do not catch blind exception: Exception

(BLE001)

[warning] 1595-1595: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/sandbox-bootstrap.py` around lines 1562 - 1606, The cmd_recipe_validate and cmd_recipe_print functions implement business logic in the top-level tools/sandbox-bootstrap.py; per policy this file must be a thin delegating wrapper—move the implementation into a package module (e.g., create tools.sandbox.recipe_commands or similar) and keep these functions as one-line delegators that import and call the package functions (e.g., recipe_commands.validate_recipe_cmd and recipe_commands.print_recipe_cmd) passing through args; ensure all imports and heavy logic (load_recipe, validate_recipe, dump_recipe, _emit, error handling and printing) live in the new package module and update the wrapper functions cmd_recipe_validate and cmd_recipe_print to simply import and return the package function results.

Source: Coding guidelines

coderabbitai · 2026-06-06T21:22:45Z

+def _validate_codeql_hints(codeql: dict[str, Any], prefix: str) -> list[str]:
+    errors: list[str] = []
+
+    install_strategy = codeql.get("install_strategy")
+    if install_strategy is not None:
+        if not isinstance(install_strategy, str) or install_strategy not in VALID_INSTALL_STRATEGIES:
+            valid = ", ".join(sorted(VALID_INSTALL_STRATEGIES))
+            errors.append(
+                f"sandbox-recipe.yml: {prefix}.install_strategy {install_strategy!r} invalid (allowed: {valid})"
+            )
+
+    preferred_mode = codeql.get("preferred_execution_mode")
+    if preferred_mode is not None:
+        if not isinstance(preferred_mode, str) or preferred_mode not in VALID_EXECUTION_MODES:
+            valid = ", ".join(sorted(VALID_EXECUTION_MODES))
+            errors.append(
+                f"sandbox-recipe.yml: {prefix}.preferred_execution_mode {preferred_mode!r} invalid (allowed: {valid})"
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Top-level codeql.default_execution_mode is currently unvalidated.

validate_recipe() passes top-level codeql into _validate_codeql_hints(), but this helper only validates preferred_execution_mode. That leaves invalid codeql.default_execution_mode values undetected even though the example schema uses that key.

🔧 Proposed fix

def _validate_codeql_hints(codeql: dict[str, Any], prefix: str) -> list[str]: errors: list[str] = [] @@ - preferred_mode = codeql.get("preferred_execution_mode") - if preferred_mode is not None: - if not isinstance(preferred_mode, str) or preferred_mode not in VALID_EXECUTION_MODES: + # Build-target hints use preferred_execution_mode; top-level codeql uses default_execution_mode. + mode_key = "preferred_execution_mode" if "preferred_execution_mode" in codeql else "default_execution_mode" + mode_value = codeql.get(mode_key) + if mode_value is not None: + if not isinstance(mode_value, str) or mode_value not in VALID_EXECUTION_MODES: valid = ", ".join(sorted(VALID_EXECUTION_MODES)) errors.append( - f"sandbox-recipe.yml: {prefix}.preferred_execution_mode {preferred_mode!r} invalid (allowed: {valid})" + f"sandbox-recipe.yml: {prefix}.{mode_key} {mode_value!r} invalid (allowed: {valid})" )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/sandbox/recipe.py` around lines 175 - 192, The helper _validate_codeql_hints currently skips validating the top-level codeql key default_execution_mode; add a validation block (similar to the preferred_mode check) that reads codeql.get("default_execution_mode"), ensures it's a string and one of VALID_EXECUTION_MODES, and appends an error with the same format (f"sandbox-recipe.yml: {prefix}.default_execution_mode {value!r} invalid (allowed: {valid})") when invalid so validate_recipe() will catch bad default_execution_mode values.

pruiz closed this Jun 5, 2026

pruiz force-pushed the wip/codeql-sandbox-recipe-refactor branch from 17d5d25 to f140f4b Compare June 5, 2026 21:01

pruiz reopened this Jun 5, 2026

pruiz added 9 commits June 6, 2026 16:27

docs: plan CodeQL sandbox recipe refactor

c145c88

chore: delete obsolete repair files, remove skipped tests

7058e5c

Delete prompts/phase-1-codeql-repair.md and tests/test_phase_1_codeql_plan_repair.py. Remove three skipped repair-related tests from test_codecome_check_codeql.py. Full test suite: 730 passed, 0 failures.

pruiz force-pushed the wip/codeql-sandbox-recipe-refactor branch from 9445216 to 7058e5c Compare June 6, 2026 20:54

pruiz requested a review from Copilot June 6, 2026 20:55

Copilot started reviewing on behalf of pruiz June 6, 2026 20:55 View session

Copilot AI reviewed Jun 6, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

fix(codeql): allow recipe-backed manual builds

4de38ff

		Branch purpose: define the target architecture and implementation plan before changing runtime code.

		## 1. Problem statement

Conversation

pruiz commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope

Review focus

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pruiz commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

github-actions Bot commented Jun 5, 2026 •

edited

Loading