refactor(test/e2e): consolidate shell-probe with scenario spawn infra by laitingsheng · Pull Request #5033 · NVIDIA/NemoClaw

laitingsheng · 2026-06-09T07:15:12Z

Summary

Consolidate the TypeScript spawn boundaries shared between the Vitest fixture layer and the scenario orchestrator into a single framework module. Every TS spawn site now reaches the same detached process-group cleanup, SIGTERM -> SIGKILL escalation, and NUL-byte argv guard, and the validation_suites/sandbox-exec.sh transport stays centralised for in-sandbox commands.

Related Issue

Resolves #4988.

Changes

New shared modules under test/e2e-scenario/framework/shell/: a lifecycle-only superviseChild (timeout, abort, SIGTERM -> SIGKILL on a detached process group, chunk callbacks) and trustedShellCommand / validateShellToken (NUL-byte argv guard, brand symbol).
framework/shell-probe.ts, scenarios/orchestrators/phase.ts (runAction + runShellStep), and the probe helpers (probes/util.ts, probes/docs-validation.ts, probes/diagnostics.ts) now spawn at their own site (preserving each site's argv contract and CodeQL marker) and hand the child to the shared supervisor.
Probe helpers (spawnBash, runHostCmd, docs / diagnostics probes) gain detached process-group cleanup and SIGKILL escalation; runHostCmd gains the NUL-byte argv guard.
Adds framework-tests/e2e-shell-supervisor.test.ts to cover the headline guarantees at the leaf level; existing e2e-fixture-context.test.ts and e2e-phase-orchestrators.test.ts continue to cover end-to-end behaviour.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

`npx prek run --all-files` passes
`npm test` passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
`npm run docs` builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

New Features
- Centralized shell supervision for e2e flows: robust timeouts, escalation, cancellation, and streamed stdout/stderr.
- Shared trusted-command validation to reject unsafe argv tokens (e.g., NUL bytes) and require a reason.
Tests
- Added and updated e2e tests covering supervision, timeout/escalation, spawn failures, stdin/stdout forwarding, and validation.
Bug Fixes
- More consistent timeout/spawn-error classification and improved redaction in failure reporting.

…wn infra Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-09T07:15:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 281105e9-4518-4598-ab63-c6422d096515

📥 Commits

Reviewing files that changed from the base of the PR and between a30ddf3 and c213718.

📒 Files selected for processing (1)

test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts

📝 Walkthrough

Walkthrough

Extracts trusted-shell token validation and a superviseChild helper; migrates shell-probe, phase orchestrator, probe utilities, and probes to validate argv tokens and use the shared supervisor for detached process-group management, timeout escalation, abort handling, and stdout/stderr streaming.

Changes

Unified Shell Infrastructure Consolidation

Layer / File(s)	Summary
Trusted shell command brand and validation `test/e2e-scenario/framework/shell/trusted-command.ts`, `test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts`, `test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts`	Branded `TrustedShellCommand` and `TrustedShellCommandInput`, `validateShellToken` rejects NUL bytes, and `trustedShellCommand` factory enforces trimmed/non-empty command and reason; tests cover NUL rejection, clean tokens, required reason, and validate-hook propagation.
Child process supervision lifecycle `test/e2e-scenario/framework/shell/supervisor.ts`, `test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts`	`superviseChild` implements SuperviseOptions/SuperviseResult: wall-clock timeout, SIGTERM→SIGKILL escalation after grace, AbortSignal-aware cancellation (keeps timedOut false), stdin forwarding, stdout/stderr chunk callbacks, and spawnError reporting; tests validate exit codes, stream capture, timeout escalation timing, abort behavior, spawn errors, and stdin piping.
Shell probe migration to shared infrastructure `test/e2e-scenario/framework/shell-probe.ts`	Re-exports trusted-command types and factory; replaces manual process-group tracking, stdout/stderr listeners, and local timers with a single `superviseChild` call and uses supervised outcomes for artifact/error construction.
Phase orchestrator execution via supervisor `test/e2e-scenario/scenarios/orchestrators/phase.ts`	`runAction` and `runShellStep` validate all bash/script tokens via `validateShellToken`, spawn validated argv, delegate lifecycle and stdio to `superviseChild`, redact explicit secret values verbatim, await log flush, and map superviseChild outcomes to action/step results.
Probe utility helpers refactored `test/e2e-scenario/scenarios/probes/util.ts`	`spawnBash` and `runHostCmd` now validate argv with `validateShellToken`, spawn detached children, delegate timeout/lifecycle/stdio to `superviseChild`, add `timedOut` to `CmdResult`, and remove the local NUL helper.
Diagnostics probe integration `test/e2e-scenario/scenarios/probes/diagnostics.ts`	Validates `nemoclaw` argv, uses `superviseChild` for supervision and stderr capture, extends evidence with `timedOut` and `spawnError`, and returns early on spawn errors/timeouts.
Docs validation probe integration `test/e2e-scenario/scenarios/probes/docs-validation.ts`	`runCheck` now validates script/args with `validateShellToken`, uses `superviseChild`, extends results with `timedOut`/`spawnError`, and classifies timeouts/spawn errors using the new flags.
Probe timeout classifier updates `test/e2e-scenario/scenarios/probes/*`	Multiple probes updated to classify transient failures using `timedOut` instead of checking `signal === "SIGTERM"`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4965: Fixture tests validate the same trustedShellCommand NUL-byte/reason guardrails and superviseChild lifecycle behaviors that this PR extracts into shared framework modules.

Suggested labels

v0.0.62

Suggested reviewers

jyaunches
cv

Poem

🐇 I dug a tunnel, found a shell,

Trimmed its args and guarded well,
SIGTERM tolled and SIGKILL came after,
Streams fed back, and secrets went softer,
Hopping home with logs all swell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: consolidating shell-probe infrastructure with scenario spawn infrastructure into shared framework modules.
Linked Issues check	✅ Passed	The PR fully implements all acceptance criteria from `#4988`: extracts shared shell/spawn infrastructure (superviseChild, trustedShellCommand, validateShellToken) into framework/shell modules [`#4988`], preserves sandbox-exec transport centralization [`#4988`], maintains detached process-group lifecycle handling with timeout/abort/SIGTERM→SIGKILL [`#4988`], keeps trusted command contract with NUL-byte rejection and branded descriptors [`#4988`], and preserves single artifact/evidence path with canonical redaction [`#4988`].
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the objectives: new shell framework modules (supervisor.ts, trusted-command.ts), refactored spawn sites (shell-probe.ts, phase.ts, util.ts, probe helpers), and updated tests. No unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/e2e-shell-spawn-consolidation

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T07:19:21Z

E2E Advisor Recommendation

Required E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw, e2e-vitest-scenarios:ubuntu-repo-cli-smoke
Optional E2E: e2e-scenarios:ubuntu-repo-cloud-openclaw-custom-policies, e2e-scenarios:ubuntu-rebuild-openclaw

Dispatch hint: scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): This canonical typed scenario exercises install/onboarding, sandbox lifecycle, smoke, inference, credentials, and supplemental security/probe suites, including security-shields, security-policy, security-injection, diagnostics, docs-validation, sandbox lifecycle, and sandbox operations that are directly affected by the shared supervisor and trusted-command refactor.
e2e-vitest-scenarios:ubuntu-repo-cli-smoke (low; builds CLI and runs repo-local CLI smoke): Covers the live Vitest fixture path that uses ShellProbe/host.command after the refactor to shared supervision and trusted command validation, without requiring a full sandbox scenario.

Optional E2E

e2e-scenarios:ubuntu-repo-cloud-openclaw-custom-policies (high; live scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): Useful broader confidence for policy/config-heavy runtime assertions and supplemental sandbox lifecycle behavior, but largely overlaps the affected framework/probe paths covered by ubuntu-repo-cloud-openclaw.
e2e-scenarios:ubuntu-rebuild-openclaw (high; live rebuild scenario with Docker/OpenShell sandbox and NVIDIA_API_KEY): Optional confidence for lifecycle phase actions after runAction switched to the shared supervisor and explicit secret-value redaction, especially timeout/log-flush behavior around long-running lifecycle work.

New E2E recommendations

Process supervision in live sandbox commands (medium): The new unit tests cover SIGTERM-ignoring children, but there is no dedicated live scenario assertion that intentionally times out a sandbox command with a stubborn child and verifies the scenario runner continues without orphaned processes.
- Suggested test: Add a sandbox lifecycle probe that launches a SIGTERM-ignoring in-sandbox child under a short timeout, asserts timedOut=true classification, and verifies cleanup/no orphan before continuing the scenario.

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: scenarios=ubuntu-repo-cloud-openclaw

github-actions · 2026-06-09T07:19:22Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Changes touch shared scenario framework/runner infrastructure: shell supervision, trusted command validation, shell probe plumbing, phase orchestrator spawning/redaction, and shared probe utilities used by multiple scenario suites. Per policy, runner/runtime changes require the full scenario fan-out.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/framework-tests/e2e-phase-orchestrators.test.ts
test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/framework/shell/supervisor.ts
test/e2e-scenario/framework/shell/trusted-command.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/scenarios/probes/diagnostics.ts
test/e2e-scenario/scenarios/probes/docs-validation.ts
test/e2e-scenario/scenarios/probes/injection-blocked.ts
test/e2e-scenario/scenarios/probes/network-policy.ts
test/e2e-scenario/scenarios/probes/shields-config.ts
test/e2e-scenario/scenarios/probes/util.ts

github-actions · 2026-06-09T07:19:51Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e-scenario/scenarios/orchestrators/phase.ts (1)

174-181: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass declared secret values into redactString.

Line 203/206 and Line 453/456 only run the regex/context redactors. Since these children still receive action.secretEnv / step.secretEnv, a probe that echoes one of those values verbatim can now leak it into the evidence log and stderrTail unchanged.

🔧 Minimal fix

+    const explicitRedactions = (action.secretEnv ?? [])
+      .map((key) => env[key])
+      .filter((value): value is string => typeof value === "string" && value.length > 0);
+
     const supervised = await superviseChild(child, {
       timeoutMs: timeoutSeconds * 1_000,
       onStdout: (chunk) => {
-        logStream.write(redactString(chunk));
+        logStream.write(redactString(chunk, explicitRedactions));
       },
       onStderr: (chunk) => {
-        const redacted = redactString(chunk);
+        const redacted = redactString(chunk, explicitRedactions);
         logStream.write(redacted);
         stderrTail = (stderrTail + redacted).slice(-4096);
       },
     });

Mirror the same change in runShellStep using step.secretEnv.

Also applies to: 202-208, 390-397, 452-458

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts` around lines 174 - 181,
The regex/context-only redaction calls need to also consider declared secret
values so verbatim secrets from child processes aren't leaked; update
runShellStep to pass step.secretEnv into redactString (just like the change
already applied for action.secretEnv when building env) wherever you produce
evidence logs or stderrTail using redactString, ensuring both action.secretEnv
and step.secretEnv are forwarded to redactString in the code paths that create
evidence/log entries; search for runShellStep, redactString, step.secretEnv and
update the redaction calls to include the secret list.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/shell/supervisor.ts`:
- Around line 101-114: The abort path can be misreported as a timeout because
the wall-clock timeout remains active after opts.signal triggers; update the
abort handling so that onAbort clears the timeout (clearTimeout(timeout)) before
calling terminate and ensure that terminate (or the surrounding logic that sets
timedOut) is not setting timedOut when termination was caused by opts.signal;
specifically modify the onAbort handler and/or terminate to clear the timer and
propagate an explicit abort reason (or a boolean) so timedOut remains false for
external cancels (references: timeout, timedOut, terminate, onAbort,
opts.signal).

In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 183-184: validateShellToken can throw on NUL bytes and those
exceptions currently escape runAction / runShellStep (seen where bashArgs are
mapped and where runShellStep is called), aborting the entire phase; wrap the
validateShellToken call(s) inside runShellStep/runAction so any thrown error is
caught and converted into a structured failure object with type "runner-infra"
(include the original error message in the failure details) and return that
result instead of letting the exception bubble to run()/runStep(); update both
call sites (the bashArgs mapping and the other occurrence around runShellStep)
to follow this pattern and ensure callers handle the structured failure return.

In `@test/e2e-scenario/scenarios/probes/diagnostics.ts`:
- Around line 89-95: When supervised.spawnError is present, don't collapse it
into a normal child exit path; instead record the spawn failure separately (e.g.
set a boolean like infraFailure/runnerInfra or a distinct sentinel variable)
while still appending the message to stderrTail, and leave exitCode
unset/nullable (or set to a distinct sentinel) so later logic that inspects
exitCode !== 0 can still differentiate a real child exit from a runner-infra
spawn failure; update the code around supervised.spawnError, exitCode and any
later classification logic that checks exitCode to use the new
infraFailure/runnerInfra signal when classifying failures.

In `@test/e2e-scenario/scenarios/probes/docs-validation.ts`:
- Around line 78-88: The runCheck() path in docs-validation.ts currently masks
supervised.spawnError by converting it into exitCode: 127; update runCheck() and
the DocsCheckResult shape so it returns the original spawn error instead of an
artificial exit code: add/return a spawnError (or error) field on
DocsCheckResult when supervised.spawnError is present (preserve
stderrTail/stdoutTail and elapsedMs), and ensure any callers of
runCheck()/DocsCheckResult (the docs check runner/runner-infra classifier)
inspect this new spawnError field rather than treating exitCode 127 as a normal
script failure; update type/interface for DocsCheckResult and usages that
construct/consume it accordingly (look for runCheck and DocsCheckResult in
docs-validation.ts and consumer code).

In `@test/e2e-scenario/scenarios/probes/util.ts`:
- Around line 126-141: The returned command result currently omits the
supervised.timedOut flag, so update both return paths (the spawnError branch and
the normal branch around supervised.spawnError, supervised.exitCode,
supervised.signal) to include timedOut: supervised.timedOut (or false if absent)
so callers can detect timeouts reliably; search for superviseChild /
SuperviseResult and ensure the returned object mirrors timedOut from the
supervise result (and then update downstream callers to branch on
result.timedOut instead of inferring from signal).

---

Outside diff comments:
In `@test/e2e-scenario/scenarios/orchestrators/phase.ts`:
- Around line 174-181: The regex/context-only redaction calls need to also
consider declared secret values so verbatim secrets from child processes aren't
leaked; update runShellStep to pass step.secretEnv into redactString (just like
the change already applied for action.secretEnv when building env) wherever you
produce evidence logs or stderrTail using redactString, ensuring both
action.secretEnv and step.secretEnv are forwarded to redactString in the code
paths that create evidence/log entries; search for runShellStep, redactString,
step.secretEnv and update the redaction calls to include the secret list.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 73dd647b-2149-4482-834c-320b399ff340

📥 Commits

Reviewing files that changed from the base of the PR and between 38c03b1 and da50821.

📒 Files selected for processing (8)

test/e2e-scenario/framework-tests/e2e-shell-supervisor.test.ts
test/e2e-scenario/framework/shell-probe.ts
test/e2e-scenario/framework/shell/supervisor.ts
test/e2e-scenario/framework/shell/trusted-command.ts
test/e2e-scenario/scenarios/orchestrators/phase.ts
test/e2e-scenario/scenarios/probes/diagnostics.ts
test/e2e-scenario/scenarios/probes/docs-validation.ts
test/e2e-scenario/scenarios/probes/util.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

refactor(test/e2e): consolidate fixture shell probe with scenario spa…

da50821

…wn infra Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the area: e2e End-to-end tests, nightly failures, or validation infrastructure label Jun 9, 2026

github-advanced-security AI found potential problems Jun 9, 2026

View reviewed changes

Comment thread test/e2e-scenario/framework/shell-probe.ts Fixed

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

fix(test/e2e): tighten review-flagged spawn-boundary contracts

a30ddf3

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the refactor PR restructures code without intended behavior change label Jun 9, 2026

Merge branch 'main' into refactor/e2e-shell-spawn-consolidation

c213718

Conversation

laitingsheng commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

laitingsheng commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading