Skip to content

refactor(test/e2e): consolidate shell-probe with scenario spawn infra#5033

Open
laitingsheng wants to merge 3 commits into
mainfrom
refactor/e2e-shell-spawn-consolidation
Open

refactor(test/e2e): consolidate shell-probe with scenario spawn infra#5033
laitingsheng wants to merge 3 commits into
mainfrom
refactor/e2e-shell-spawn-consolidation

Conversation

@laitingsheng

@laitingsheng laitingsheng commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Consolidate the TypeScript spawn boundaries shared between the Vitest fixture layer and the scenario orchestrator into a single framework module. Every TS spawn site now reaches the same detached process-group cleanup, SIGTERM -> SIGKILL escalation, and NUL-byte argv guard, and the validation_suites/sandbox-exec.sh transport stays centralised for in-sandbox commands.

Related Issue

Resolves #4988.

Changes

  • New shared modules under test/e2e-scenario/framework/shell/: a lifecycle-only superviseChild (timeout, abort, SIGTERM -> SIGKILL on a detached process group, chunk callbacks) and trustedShellCommand / validateShellToken (NUL-byte argv guard, brand symbol).
  • framework/shell-probe.ts, scenarios/orchestrators/phase.ts (runAction + runShellStep), and the probe helpers (probes/util.ts, probes/docs-validation.ts, probes/diagnostics.ts) now spawn at their own site (preserving each site's argv contract and CodeQL marker) and hand the child to the shared supervisor.
  • Probe helpers (spawnBash, runHostCmd, docs / diagnostics probes) gain detached process-group cleanup and SIGKILL escalation; runHostCmd gains the NUL-byte argv guard.
  • Adds framework-tests/e2e-shell-supervisor.test.ts to cover the headline guarantees at the leaf level; existing e2e-fixture-context.test.ts and e2e-phase-orchestrators.test.ts continue to cover end-to-end behaviour.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • `npx prek run --all-files` passes
  • `npm test` passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • `npm run docs` builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • New Features

    • Centralized shell supervision for e2e flows: robust timeouts, escalation, cancellation, and streamed stdout/stderr.
    • Shared trusted-command validation to reject unsafe argv tokens (e.g., NUL bytes) and require a reason.
  • Tests

    • Added and updated e2e tests covering supervision, timeout/escalation, spawn failures, stdin/stdout forwarding, and validation.
  • Bug Fixes

    • More consistent timeout/spawn-error classification and improved redaction in failure reporting.

…wn infra

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 281105e9-4518-4598-ab63-c6422d096515

📥 Commits

Reviewing files that changed from the base of the PR and between a30ddf3 and c213718.

📒 Files selected for processing (1)
  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts

📝 Walkthrough

Walkthrough

Extracts trusted-shell token validation and a superviseChild helper; migrates shell-probe, phase orchestrator, probe utilities, and probes to validate argv tokens and use the shared supervisor for detached process-group management, timeout escalation, abort handling, and stdout/stderr streaming.

Changes

Unified Shell Infrastructure Consolidation

Layer / File(s) Summary
Trusted shell command brand and validation
test/e2e-scenario/framework/shell/trusted-command.ts, test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts, test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
Branded TrustedShellCommand and TrustedShellCommandInput, validateShellToken rejects NUL bytes, and trustedShellCommand factory enforces trimmed/non-empty command and reason; tests cover NUL rejection, clean tokens, required reason, and validate-hook propagation.
Child process supervision lifecycle
test/e2e-scenario/framework/shell/supervisor.ts, test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
superviseChild implements SuperviseOptions/SuperviseResult: wall-clock timeout, SIGTERM→SIGKILL escalation after grace, AbortSignal-aware cancellation (keeps timedOut false), stdin forwarding, stdout/stderr chunk callbacks, and spawnError reporting; tests validate exit codes, stream capture, timeout escalation timing, abort behavior, spawn errors, and stdin piping.
Shell probe migration to shared infrastructure
test/e2e-scenario/framework/shell-probe.ts
Re-exports trusted-command types and factory; replaces manual process-group tracking, stdout/stderr listeners, and local timers with a single superviseChild call and uses supervised outcomes for artifact/error construction.
Phase orchestrator execution via supervisor
test/e2e-scenario/scenarios/orchestrators/phase.ts
runAction and runShellStep validate all bash/script tokens via validateShellToken, spawn validated argv, delegate lifecycle and stdio to superviseChild, redact explicit secret values verbatim, await log flush, and map superviseChild outcomes to action/step results.
Probe utility helpers refactored
test/e2e-scenario/scenarios/probes/util.ts
spawnBash and runHostCmd now validate argv with validateShellToken, spawn detached children, delegate timeout/lifecycle/stdio to superviseChild, add timedOut to CmdResult, and remove the local NUL helper.
Diagnostics probe integration
test/e2e-scenario/scenarios/probes/diagnostics.ts
Validates nemoclaw argv, uses superviseChild for supervision and stderr capture, extends evidence with timedOut and spawnError, and returns early on spawn errors/timeouts.
Docs validation probe integration
test/e2e-scenario/scenarios/probes/docs-validation.ts
runCheck now validates script/args with validateShellToken, uses superviseChild, extends results with timedOut/spawnError, and classifies timeouts/spawn errors using the new flags.
Probe timeout classifier updates
test/e2e-scenario/scenarios/probes/*
Multiple probes updated to classify transient failures using timedOut instead of checking signal === "SIGTERM".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4965: Fixture tests validate the same trustedShellCommand NUL-byte/reason guardrails and superviseChild lifecycle behaviors that this PR extracts into shared framework modules.

Suggested labels

v0.0.62

Suggested reviewers

  • jyaunches
  • cv

Poem

🐇 I dug a tunnel, found a shell,

Trimmed its args and guarded well,
SIGTERM tolled and SIGKILL came after,
Streams fed back, and secrets went softer,
Hopping home with logs all swell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: consolidating shell-probe infrastructure with scenario spawn infrastructure into shared framework modules.
Linked Issues check ✅ Passed The PR fully implements all acceptance criteria from #4988: extracts shared shell/spawn infrastructure (superviseChild, trustedShellCommand, validateShellToken) into framework/shell modules [#4988], preserves sandbox-exec transport centralization [#4988], maintains detached process-group lifecycle handling with timeout/abort/SIGTERM→SIGKILL [#4988], keeps trusted command contract with NUL-byte rejection and branded descriptors [#4988], and preserves single artifact/evidence path with canonical redaction [#4988].
Out of Scope Changes check ✅ Passed All changes are directly scoped to the objectives: new shell framework modules (supervisor.ts, trusted-command.ts), refactored spawn sites (shell-probe.ts, phase.ts, util.ts, probe helpers), and updated tests. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/e2e-shell-spawn-consolidation

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread test/e2e-scenario/framework/shell-probe.ts Fixed
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-vitest-scenarios:ubuntu-repo-cli-smoke
Optional E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw-custom-policies, e2e-scenarios:ubuntu-rebuild-openclaw

Dispatch hint: scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): This canonical typed scenario exercises install/onboarding, sandbox lifecycle, smoke, inference, credentials, and supplemental security/probe suites, including security-shields, security-policy, security-injection, diagnostics, docs-validation, sandbox lifecycle, and sandbox operations that are directly affected by the shared supervisor and trusted-command refactor.
  • e2e-vitest-scenarios:ubuntu-repo-cli-smoke (low; builds CLI and runs repo-local CLI smoke): Covers the live Vitest fixture path that uses ShellProbe/host.command after the refactor to shared supervision and trusted command validation, without requiring a full sandbox scenario.

Optional E2E

  • e2e-scenarios:ubuntu-repo-cloud-openclaw-custom-policies (high; live scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): Useful broader confidence for policy/config-heavy runtime assertions and supplemental sandbox lifecycle behavior, but largely overlaps the affected framework/probe paths covered by ubuntu-repo-cloud-openclaw.
  • e2e-scenarios:ubuntu-rebuild-openclaw (high; live rebuild scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): Optional confidence for lifecycle phase actions after runAction switched to the shared supervisor and explicit secret-value redaction, especially timeout/log-flush behavior around long-running lifecycle work.

New E2E recommendations

  • Process supervision in live sandbox commands (medium): The new unit tests cover SIGTERM-ignoring children, but there is no dedicated live scenario assertion that intentionally times out a sandbox command with a stubborn child and verifies the scenario runner continues without orphaned processes.
    • Suggested test: Add a sandbox lifecycle probe that launches a SIGTERM-ignoring in-sandbox child under a short timeout, asserts timedOut=true classification, and verifies cleanup/no orphan before continuing the scenario.

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: scenarios=ubuntu-repo-cloud-openclaw

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Changes touch shared scenario framework/runner infrastructure: shell supervision, trusted command validation, shell probe plumbing, phase orchestrator spawning/redaction, and shared probe utilities used by multiple scenario suites. Per policy, runner/runtime changes require the full scenario fan-out.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
  • test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/framework/shell/supervisor.ts
  • test/e2e-scenario/framework/shell/trusted-command.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/scenarios/probes/diagnostics.ts
  • test/e2e-scenario/scenarios/probes/docs-validation.ts
  • test/e2e-scenario/scenarios/probes/injection-blocked.ts
  • test/e2e-scenario/scenarios/probes/network-policy.ts
  • test/e2e-scenario/scenarios/probes/shields-config.ts
  • test/e2e-scenario/scenarios/probes/util.ts

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e-scenario/scenarios/orchestrators/phase.ts (1)

174-181: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass declared secret values into redactString.

Line 203/206 and Line 453/456 only run the regex/context redactors. Since these children still receive action.secretEnv / step.secretEnv, a probe that echoes one of those values verbatim can now leak it into the evidence log and stderrTail unchanged.

🔧 Minimal fix
+    const explicitRedactions = (action.secretEnv ?? [])
+      .map((key) => env[key])
+      .filter((value): value is string => typeof value === "string" && value.length > 0);
+
     const supervised = await superviseChild(child, {
       timeoutMs: timeoutSeconds * 1_000,
       onStdout: (chunk) => {
-        logStream.write(redactString(chunk));
+        logStream.write(redactString(chunk, explicitRedactions));
       },
       onStderr: (chunk) => {
-        const redacted = redactString(chunk);
+        const redacted = redactString(chunk, explicitRedactions);
         logStream.write(redacted);
         stderrTail = (stderrTail + redacted).slice(-4096);
       },
     });

Mirror the same change in runShellStep using step.secretEnv.

Also applies to: 202-208, 390-397, 452-458

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts` around lines 174 - 181,
The regex/context-only redaction calls need to also consider declared secret
values so verbatim secrets from child processes aren't leaked; update
runShellStep to pass step.secretEnv into redactString (just like the change
already applied for action.secretEnv when building env) wherever you produce
evidence logs or stderrTail using redactString, ensuring both action.secretEnv
and step.secretEnv are forwarded to redactString in the code paths that create
evidence/log entries; search for runShellStep, redactString, step.secretEnv and
update the redaction calls to include the secret list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell/supervisor.ts`:
- Around line 101-114: The abort path can be misreported as a timeout because
the wall-clock timeout remains active after opts.signal triggers; update the
abort handling so that onAbort clears the timeout (clearTimeout(timeout)) before
calling terminate and ensure that terminate (or the surrounding logic that sets
timedOut) is not setting timedOut when termination was caused by opts.signal;
specifically modify the onAbort handler and/or terminate to clear the timer and
propagate an explicit abort reason (or a boolean) so timedOut remains false for
external cancels (references: timeout, timedOut, terminate, onAbort,
opts.signal).

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 183-184: validateShellToken can throw on NUL bytes and those
exceptions currently escape runAction / runShellStep (seen where bashArgs are
mapped and where runShellStep is called), aborting the entire phase; wrap the
validateShellToken call(s) inside runShellStep/runAction so any thrown error is
caught and converted into a structured failure object with type "runner-infra"
(include the original error message in the failure details) and return that
result instead of letting the exception bubble to run()/runStep(); update both
call sites (the bashArgs mapping and the other occurrence around runShellStep)
to follow this pattern and ensure callers handle the structured failure return.

In `@test/e2e-scenario/scenarios/probes/diagnostics.ts`:
- Around line 89-95: When supervised.spawnError is present, don't collapse it
into a normal child exit path; instead record the spawn failure separately (e.g.
set a boolean like infraFailure/runnerInfra or a distinct sentinel variable)
while still appending the message to stderrTail, and leave exitCode
unset/nullable (or set to a distinct sentinel) so later logic that inspects
exitCode !== 0 can still differentiate a real child exit from a runner-infra
spawn failure; update the code around supervised.spawnError, exitCode and any
later classification logic that checks exitCode to use the new
infraFailure/runnerInfra signal when classifying failures.

In `@test/e2e-scenario/scenarios/probes/docs-validation.ts`:
- Around line 78-88: The runCheck() path in docs-validation.ts currently masks
supervised.spawnError by converting it into exitCode: 127; update runCheck() and
the DocsCheckResult shape so it returns the original spawn error instead of an
artificial exit code: add/return a spawnError (or error) field on
DocsCheckResult when supervised.spawnError is present (preserve
stderrTail/stdoutTail and elapsedMs), and ensure any callers of
runCheck()/DocsCheckResult (the docs check runner/runner-infra classifier)
inspect this new spawnError field rather than treating exitCode 127 as a normal
script failure; update type/interface for DocsCheckResult and usages that
construct/consume it accordingly (look for runCheck and DocsCheckResult in
docs-validation.ts and consumer code).

In `@test/e2e-scenario/scenarios/probes/util.ts`:
- Around line 126-141: The returned command result currently omits the
supervised.timedOut flag, so update both return paths (the spawnError branch and
the normal branch around supervised.spawnError, supervised.exitCode,
supervised.signal) to include timedOut: supervised.timedOut (or false if absent)
so callers can detect timeouts reliably; search for superviseChild /
SuperviseResult and ensure the returned object mirrors timedOut from the
supervise result (and then update downstream callers to branch on
result.timedOut instead of inferring from signal).

---

Outside diff comments:
In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 174-181: The regex/context-only redaction calls need to also
consider declared secret values so verbatim secrets from child processes aren't
leaked; update runShellStep to pass step.secretEnv into redactString (just like
the change already applied for action.secretEnv when building env) wherever you
produce evidence logs or stderrTail using redactString, ensuring both
action.secretEnv and step.secretEnv are forwarded to redactString in the code
paths that create evidence/log entries; search for runShellStep, redactString,
step.secretEnv and update the redaction calls to include the secret list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 73dd647b-2149-4482-834c-320b399ff340

📥 Commits

Reviewing files that changed from the base of the PR and between 38c03b1 and da50821.

📒 Files selected for processing (8)
  • test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
  • test/e2e-scenario/framework/shell-probe.ts
  • test/e2e-scenario/framework/shell/supervisor.ts
  • test/e2e-scenario/framework/shell/trusted-command.ts
  • test/e2e-scenario/scenarios/orchestrators/phase.ts
  • test/e2e-scenario/scenarios/probes/diagnostics.ts
  • test/e2e-scenario/scenarios/probes/docs-validation.ts
  • test/e2e-scenario/scenarios/probes/util.ts

Comment thread test/e2e-scenario/framework/shell/supervisor.ts
Comment thread test/e2e-scenario/scenarios/orchestrators/phase.ts Outdated
Comment thread test/e2e-scenario/scenarios/probes/diagnostics.ts Outdated
Comment thread test/e2e-scenario/scenarios/probes/docs-validation.ts
Comment thread test/e2e-scenario/scenarios/probes/util.ts
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the refactor PR restructures code without intended behavior change label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure refactor PR restructures code without intended behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consolidate E2E fixture shell probe with scenario spawn infra

2 participants