fix(server): contain native watcher aborts#6987
Conversation
Co-authored-by: Orca <help@stably.ai>
eeb8b7f to
f7982c8
Compare
|
Ready to review this PR? Stage has broken it down into 6 individual chapters for you: Chapters generated by Stage for commit f7982c8 on Jul 1, 2026 4:42am UTC. |
📝 WalkthroughWalkthroughChangesThis PR migrates the recursive file explorer watcher from a Sequence Diagram(s)See diagrams embedded in the hidden review stack artifact above for the file watcher IPC lifecycle, headless serve daemon PTY startup, and reproduction harness pass/fail flow. Related PRs: None identified. Related issues: Suggested labels: main-process, testing, packaging, documentation Suggested reviewers: (none specified) Poem A worker thread once feared the crash, 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (7)
repro/issue-6635/worker-thread-crash.mjs (1)
15-15: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer
import.meta.dirnameover manualdirname(fileURLToPath(...)).Static analysis flags this pattern (
unicorn/prefer-import-meta-properties). Since the Dockerfile pinsnode:24-bookworm,import.meta.dirname(stable since Node 21.2/20.11) should be safe here.♻️ Proposed simplification
-import { fileURLToPath } from 'node:url' -import { dirname, join } from 'node:path' - -const here = dirname(fileURLToPath(import.meta.url)) +import { join } from 'node:path' + +const here = import.meta.dirnameSource: Linters/SAST tools
repro/issue-6635/child-process-survives.mjs (1)
11-11: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer
import.meta.dirnameover manualdirname(fileURLToPath(...)).Same static analysis hint as
worker-thread-crash.mjs:15.Source: Linters/SAST tools
repro/issue-6635/verify-fix.mjs (1)
18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valuePrefer
import.meta.dirnameover manualdirname(fileURLToPath(...)).Same static analysis hint as the other repro scripts.
Source: Linters/SAST tools
repro/issue-6635/Dockerfile (2)
1-11: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueContainer runs as root; Trivy flags missing non-root
USER.Low risk for a throwaway repro image, but a quick hardening win.
🛡️ Proposed fix
WORKDIR /repro +RUN groupadd -r repro && useradd -r -g repro repro + # Install the real `@parcel/watcher` so worker-real-watcher.mjs can exercise the # actual native module on Linux (the prime suspect from `#5377/`#4851). RUN npm init -y >/dev/null 2>&1 && npm install `@parcel/watcher`@^2.5.6 >/dev/null 2>&1 COPY native-abort.js worker-thread-crash.mjs child-process-survives.mjs run.sh ./ COPY worker-real-watcher.mjs ./ +RUN chown -R repro:repro /repro +USER reproSource: Linters/SAST tools
11-11: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low valueSuppressing
npm installstderr hides failures in a diagnostic image.If the install fails silently, the container proceeds without
@parcel/watcher, andworker-real-watcher.mjsfails later with a confusing error unrelated to the actual root cause.repro/issue-6635/README.md (2)
10-17: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winClarify tense: this describes the pre-fix architecture, not the current one.
After this PR merges, the watcher will run in a forked child process rather than a
worker_thread, so phrasing like "Today the watcher runs..." will read as inaccurate to future readers. Consider rephrasing to explicitly past-tense ("Before this fix, the watcher ran...").
75-84: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd a language hint to the fenced code block.
Flagged by markdownlint (MD040).
📝 Proposed fix
-``` +```text >> [CURRENT] native abort inside a worker_thread (file-watcher-worker.ts) run.sh: line 17: 14 Aborted ...Source: Linters/SAST tools
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 605ab879-8180-4944-a9af-382e8604f2ba
📒 Files selected for processing (20)
config/electron-builder.config.cjsconfig/scripts/electron-builder-config.test.mjselectron.vite.config.tsrepro/issue-6635/Dockerfilerepro/issue-6635/README.mdrepro/issue-6635/child-process-survives.mjsrepro/issue-6635/native-abort.jsrepro/issue-6635/run.shrepro/issue-6635/verify-fix.mjsrepro/issue-6635/worker-real-watcher.mjsrepro/issue-6635/worker-thread-crash.mjssrc/main/daemon/daemon-entry.test.tssrc/main/daemon/daemon-entry.tssrc/main/index.tssrc/main/runtime/file-watcher-host.test.tssrc/main/runtime/file-watcher-host.tssrc/main/runtime/file-watcher-process.tssrc/main/runtime/orca-runtime-files.tssrc/main/startup/configure-process-serve-observability.test.tssrc/main/startup/configure-process.ts
| process.on('uncaughtExceptionMonitor', (error) => { | ||
| const err = error as NodeJS.ErrnoException | undefined | ||
| if (err && (err.code === 'EIO' || err.code === 'EPIPE')) { | ||
| return | ||
| } | ||
| console.error( | ||
| `[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${ | ||
| err?.message || '(no message)' | ||
| }\n${err?.stack ?? ''}` | ||
| ) | ||
| }) | ||
|
|
||
| process.on('unhandledRejection', (reason) => { | ||
| const err = reason instanceof Error ? reason : undefined | ||
| console.error( | ||
| `[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${ | ||
| err?.message ?? String(reason) | ||
| }\n${err?.stack ?? ''}` | ||
| ) | ||
| // Why: registering an unhandledRejection listener would otherwise suppress | ||
| // Node's default fatal behavior, silently leaving serve running in a | ||
| // possibly-inconsistent state. This installer is observability only — it | ||
| // must not change fatality — so re-raise on the next tick to preserve the | ||
| // crash (mirrors installUncaughtPipeErrorGuard's re-throw). | ||
| setImmediate(() => { | ||
| throw reason | ||
| }) | ||
| }) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Apply the pipe-error filter to unhandled rejections too.
EIO/EPIPE filtering only covers uncaughtExceptionMonitor; a promise rejection with the same expected teardown error still logs as a [serve] Unhandled promise rejection and gets rethrown.
Proposed fix
export function installServeObservability(): void {
+ const isExpectedPipeError = (error: unknown): error is NodeJS.ErrnoException =>
+ Boolean(
+ error &&
+ typeof error === 'object' &&
+ 'code' in error &&
+ ((error as NodeJS.ErrnoException).code === 'EIO' ||
+ (error as NodeJS.ErrnoException).code === 'EPIPE')
+ )
+
process.on('uncaughtExceptionMonitor', (error) => {
- const err = error as NodeJS.ErrnoException | undefined
- if (err && (err.code === 'EIO' || err.code === 'EPIPE')) {
+ if (isExpectedPipeError(error)) {
return
}
+ const err = error as NodeJS.ErrnoException | undefined
console.error(
`[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${
err?.message || '(no message)'
}\n${err?.stack ?? ''}`
)
})
process.on('unhandledRejection', (reason) => {
+ if (isExpectedPipeError(reason)) {
+ return
+ }
const err = reason instanceof Error ? reason : undefined📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| process.on('uncaughtExceptionMonitor', (error) => { | |
| const err = error as NodeJS.ErrnoException | undefined | |
| if (err && (err.code === 'EIO' || err.code === 'EPIPE')) { | |
| return | |
| } | |
| console.error( | |
| `[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${ | |
| err?.message || '(no message)' | |
| }\n${err?.stack ?? ''}` | |
| ) | |
| }) | |
| process.on('unhandledRejection', (reason) => { | |
| const err = reason instanceof Error ? reason : undefined | |
| console.error( | |
| `[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${ | |
| err?.message ?? String(reason) | |
| }\n${err?.stack ?? ''}` | |
| ) | |
| // Why: registering an unhandledRejection listener would otherwise suppress | |
| // Node's default fatal behavior, silently leaving serve running in a | |
| // possibly-inconsistent state. This installer is observability only — it | |
| // must not change fatality — so re-raise on the next tick to preserve the | |
| // crash (mirrors installUncaughtPipeErrorGuard's re-throw). | |
| setImmediate(() => { | |
| throw reason | |
| }) | |
| }) | |
| export function installServeObservability(): void { | |
| const isExpectedPipeError = (error: unknown): error is NodeJS.ErrnoException => | |
| Boolean( | |
| error && | |
| typeof error === 'object' && | |
| 'code' in error && | |
| ((error as NodeJS.ErrnoException).code === 'EIO' || | |
| (error as NodeJS.ErrnoException).code === 'EPIPE') | |
| ) | |
| process.on('uncaughtExceptionMonitor', (error) => { | |
| if (isExpectedPipeError(error)) { | |
| return | |
| } | |
| const err = error as NodeJS.ErrnoException | undefined | |
| console.error( | |
| `[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${ | |
| err?.message || '(no message)' | |
| }\n${err?.stack ?? ''}` | |
| ) | |
| }) | |
| process.on('unhandledRejection', (reason) => { | |
| if (isExpectedPipeError(reason)) { | |
| return | |
| } | |
| const err = reason instanceof Error ? reason : undefined | |
| console.error( | |
| `[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${ | |
| err?.message ?? String(reason) | |
| }\n${err?.stack ?? ''}` | |
| ) | |
| // Why: registering an unhandledRejection listener would otherwise suppress | |
| // Node's default fatal behavior, silently leaving serve running in a | |
| // possibly-inconsistent state. This installer is observability only — it | |
| // must not change fatality — so re-raise on the next tick to preserve the | |
| // crash (mirrors installUncaughtPipeErrorGuard's re-throw). | |
| setImmediate(() => { | |
| throw reason | |
| }) | |
| }) |
There was a problem hiding this comment.
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5454e808-7d9d-4d22-909f-7ae964c43876
📒 Files selected for processing (20)
config/electron-builder.config.cjsconfig/scripts/electron-builder-config.test.mjselectron.vite.config.tsrepro/issue-6635/Dockerfilerepro/issue-6635/README.mdrepro/issue-6635/child-process-survives.mjsrepro/issue-6635/native-abort.jsrepro/issue-6635/run.shrepro/issue-6635/verify-fix.mjsrepro/issue-6635/worker-real-watcher.mjsrepro/issue-6635/worker-thread-crash.mjssrc/main/daemon/daemon-entry.test.tssrc/main/daemon/daemon-entry.tssrc/main/index.tssrc/main/runtime/file-watcher-host.test.tssrc/main/runtime/file-watcher-host.tssrc/main/runtime/file-watcher-process.tssrc/main/runtime/orca-runtime-files.tssrc/main/startup/configure-process-serve-observability.test.tssrc/main/startup/configure-process.ts
✅ Files skipped from review due to trivial changes (1)
- src/main/runtime/orca-runtime-files.ts
🚧 Files skipped from review as they are similar to previous changes (16)
- repro/issue-6635/native-abort.js
- src/main/daemon/daemon-entry.test.ts
- config/scripts/electron-builder-config.test.mjs
- src/main/startup/configure-process-serve-observability.test.ts
- config/electron-builder.config.cjs
- repro/issue-6635/worker-thread-crash.mjs
- src/main/startup/configure-process.ts
- electron.vite.config.ts
- repro/issue-6635/run.sh
- repro/issue-6635/verify-fix.mjs
- repro/issue-6635/worker-real-watcher.mjs
- src/main/daemon/daemon-entry.ts
- repro/issue-6635/child-process-survives.mjs
- src/main/runtime/file-watcher-process.ts
- src/main/runtime/file-watcher-host.test.ts
- src/main/index.ts
| # Install the real @parcel/watcher so worker-real-watcher.mjs can exercise the | ||
| # actual native module on Linux (the prime suspect from #5377/#4851). | ||
| RUN npm init -y >/dev/null 2>&1 && npm install @parcel/watcher@^2.5.6 >/dev/null 2>&1 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
git ls-files repro/issue-6635/Dockerfile repro/issue-6635 package.json package-lock.json npm-shrinkwrap.json pnpm-lock.yaml yarn.lock README.md 2>/dev/null || true
echo "---"
sed -n '1,120p' repro/issue-6635/Dockerfile
echo "---"
rg -n "deterministic|parcel/watcher|issue-6635|worker-real-watcher" -S repro/issue-6635 README.md . 2>/dev/null || trueRepository: stablyai/orca
Length of output: 33470
Pin @parcel/watcher exactly in repro/issue-6635/Dockerfile:11.
The repro image is meant to stay deterministic, but ^2.5.6 will drift to newer 2.x releases and can change the native behavior this harness depends on.
| 1. **The crash is process-fatal and uncatchable (root cause).** | ||
| `@parcel/watcher`'s native code raises a C++ `Napi::Error` during worker | ||
| teardown (`FreeEnvironment` / `CleanupHandles`). Today the watcher runs in a | ||
| **Node `worker_thread`** (`src/main/runtime/file-watcher-worker.ts`). A native | ||
| `abort()` / `std::terminate()` raised on a worker thread is **not** delivered | ||
| to the host as a catchable `worker.on('error')` event, and **not** catchable | ||
| by `process.on('uncaughtException')` — it raises `SIGABRT` (signal 6) for the | ||
| **entire process**. So no JS handler anywhere can save the `serve` process. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Rephrase this as pre-fix behavior.
Today plus file-watcher-worker.ts is stale in this PR’s final state, which already moved the runtime watcher to file-watcher-process.ts. Framing this paragraph as “before this fix” will keep the repro docs accurate.
| const child = fork(getFileWatcherChildPath(), [], { | ||
| // Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron | ||
| // GPU/display init that can interfere with native modules). The watch | ||
| // target rides env, not argv, so spaces/quotes in the path survive. | ||
| env: { | ||
| ...process.env, | ||
| ELECTRON_RUN_AS_NODE: '1', | ||
| ORCA_FILE_WATCH_ROOT: rootPath, | ||
| ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n') | ||
| }, | ||
| stdio: ['ignore', 'ignore', 'ignore', 'ipc'], | ||
| ...(process.platform === 'win32' ? { windowsHide: true } : {}) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Don't drop the watcher child's stderr.
With stdio: ['ignore', 'ignore', 'ignore', 'ipc'], a native abort only leaves the parent with { code, signal }. That loses the Napi::Error text / JS stack the linked issue explicitly asked us to surface.
Suggested change
- stdio: ['ignore', 'ignore', 'ignore', 'ipc'],
+ stdio: ['ignore', 'ignore', 'pipe', 'ipc'],
...(process.platform === 'win32' ? { windowsHide: true } : {})
})
+
+ child.stderr?.setEncoding('utf8')
+ child.stderr?.on('data', (chunk) => {
+ console.error('[runtime-files.watch:child]', chunk.trimEnd())
+ })📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const child = fork(getFileWatcherChildPath(), [], { | |
| // Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron | |
| // GPU/display init that can interfere with native modules). The watch | |
| // target rides env, not argv, so spaces/quotes in the path survive. | |
| env: { | |
| ...process.env, | |
| ELECTRON_RUN_AS_NODE: '1', | |
| ORCA_FILE_WATCH_ROOT: rootPath, | |
| ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n') | |
| }, | |
| stdio: ['ignore', 'ignore', 'ignore', 'ipc'], | |
| ...(process.platform === 'win32' ? { windowsHide: true } : {}) | |
| const child = fork(getFileWatcherChildPath(), [], { | |
| // Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron | |
| // GPU/display init that can interfere with native modules). The watch | |
| // target rides env, not argv, so spaces/quotes in the path survive. | |
| env: { | |
| ...process.env, | |
| ELECTRON_RUN_AS_NODE: '1', | |
| ORCA_FILE_WATCH_ROOT: rootPath, | |
| ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n') | |
| }, | |
| stdio: ['ignore', 'ignore', 'pipe', 'ipc'], | |
| ...(process.platform === 'win32' ? { windowsHide: true } : {}) | |
| }) | |
| child.stderr?.setEncoding('utf8') | |
| child.stderr?.on('data', (chunk) => { | |
| console.error('[runtime-files.watch:child]', chunk.trimEnd()) | |
| }) |
Summary
orca serveFixes #6635
Validation
pnpm exec oxlint --format githubnode config/scripts/ensure-native-runtime.mjs --runtime=node && pnpm exec vitest run --config config/vitest.config.ts src/main/daemon/daemon-entry.test.ts src/main/runtime/file-watcher-host.test.ts src/main/startup/configure-process-serve-observability.test.ts config/scripts/electron-builder-config.test.mjspnpm run typecheck:nodeNote: local validation was run after refreshing dependencies from the lockfile so oxlint matched CI (
1.71.0). The local shell is Node v26, sopnpm run typecheck:nodeprints the repo engine warning for Node 24 but exits successfully.Rebase / PR #6844 note
origin/mainat8a805e256src/main/runtime/file-watcher-host.ts;src/main/index.tsauto-merges