Skip to content

fix(server): contain native watcher aborts#6987

Open
Jinwoo-H wants to merge 1 commit into
mainfrom
Jinwoo-H/bug-headless-orca-serve-sigabrt-uncaught-napi-er
Open

fix(server): contain native watcher aborts#6987
Jinwoo-H wants to merge 1 commit into
mainfrom
Jinwoo-H/bug-headless-orca-serve-sigabrt-uncaught-napi-er

Conversation

@Jinwoo-H

@Jinwoo-H Jinwoo-H commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #6635

Validation

  • pnpm exec oxlint --format github
  • node config/scripts/ensure-native-runtime.mjs --runtime=node && pnpm exec vitest run --config config/vitest.config.ts src/main/daemon/daemon-entry.test.ts src/main/runtime/file-watcher-host.test.ts src/main/startup/configure-process-serve-observability.test.ts config/scripts/electron-builder-config.test.mjs
  • pnpm run typecheck:node
  • pre-commit hook passed on the amended commit

Note: local validation was run after refreshing dependencies from the lockfile so oxlint matched CI (1.71.0). The local shell is Node v26, so pnpm run typecheck:node prints the repo engine warning for Node 24 but exits successfully.

Rebase / PR #6844 note

Co-authored-by: Orca <help@stably.ai>
@Jinwoo-H Jinwoo-H force-pushed the Jinwoo-H/bug-headless-orca-serve-sigabrt-uncaught-napi-er branch from eeb8b7f to f7982c8 Compare July 1, 2026 04:42
@stage-review

stage-review Bot commented Jul 1, 2026

Copy link
Copy Markdown

Ready to review this PR? Stage has broken it down into 6 individual chapters for you:

Title
1 Add reproduction harness for issue #6635
2 Move file watcher to forked child process
3 Configure build and packaging for watcher process
4 Route headless PTYs through persistent daemon
5 Improve serve crash observability
6 Refine daemon native error suppression
Open in Stage

Chapters generated by Stage for commit f7982c8 on Jul 1, 2026 4:42am UTC.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Changes

This PR migrates the recursive file explorer watcher from a worker_threads-based implementation to a child_process.fork-based one, updating the host, the forked process entry, tests, and packaging/build configuration to unpack and reference file-watcher-process.js. It also adds a shared helper for suppressing specific native node-pty errors in the daemon process, introduces serve-mode process observability (installServeObservability), and wires headless orca serve PTY terminals through the persistent daemon with a timeout-based fallback to the local provider. Additionally, a standalone reproduction/verification harness for issue #6635 is added under repro/issue-6635, including scripts, a Dockerfile, and documentation demonstrating SIGABRT process-fatality differences between worker threads and forked child processes.

Sequence Diagram(s)

See diagrams embedded in the hidden review stack artifact above for the file watcher IPC lifecycle, headless serve daemon PTY startup, and reproduction harness pass/fail flow.

Related PRs: None identified.

Related issues: #6635 (native SIGABRT crash isolation for file watcher and headless serve terminals).

Suggested labels: main-process, testing, packaging, documentation

Suggested reviewers: (none specified)

Poem

A worker thread once feared the crash,
So we forked a child to weather the gash.
SIGABRT roars, the child takes the hit,
The host survives — not one bit quit.
With daemons and serve now singing in tune,
This rabbit hops on, fixed none too soon. 🐇

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers summary and validation, but it omits the required Screenshots, AI Review Report, Security Audit, and Notes sections. Add the missing sections and include the requested AI review, security audit, screenshots/no visual change note, and platform-specific notes.
Docstring Coverage ⚠️ Warning Docstring coverage is 35.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and matches the main change: containing native watcher aborts in the server.
Linked Issues check ✅ Passed The PR addresses #6635 by moving the watcher to a forked child process, routing PTYs through the persistent daemon, and adding tests and repro coverage.
Out of Scope Changes check ✅ Passed The added repro scripts, tests, packaging updates, and observability changes all support the linked watcher-abort fix.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
repro/issue-6635/worker-thread-crash.mjs (1)

15-15: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer import.meta.dirname over manual dirname(fileURLToPath(...)).

Static analysis flags this pattern (unicorn/prefer-import-meta-properties). Since the Dockerfile pins node:24-bookworm, import.meta.dirname (stable since Node 21.2/20.11) should be safe here.

♻️ Proposed simplification
-import { fileURLToPath } from 'node:url'
-import { dirname, join } from 'node:path'
-
-const here = dirname(fileURLToPath(import.meta.url))
+import { join } from 'node:path'
+
+const here = import.meta.dirname

Source: Linters/SAST tools

repro/issue-6635/child-process-survives.mjs (1)

11-11: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer import.meta.dirname over manual dirname(fileURLToPath(...)).

Same static analysis hint as worker-thread-crash.mjs:15.

Source: Linters/SAST tools

repro/issue-6635/verify-fix.mjs (1)

18-18: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer import.meta.dirname over manual dirname(fileURLToPath(...)).

Same static analysis hint as the other repro scripts.

Source: Linters/SAST tools

repro/issue-6635/Dockerfile (2)

1-11: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Container runs as root; Trivy flags missing non-root USER.

Low risk for a throwaway repro image, but a quick hardening win.

🛡️ Proposed fix
 WORKDIR /repro
 
+RUN groupadd -r repro && useradd -r -g repro repro
+
 # Install the real `@parcel/watcher` so worker-real-watcher.mjs can exercise the
 # actual native module on Linux (the prime suspect from `#5377/`#4851).
 RUN npm init -y >/dev/null 2>&1 && npm install `@parcel/watcher`@^2.5.6 >/dev/null 2>&1
 
 COPY native-abort.js worker-thread-crash.mjs child-process-survives.mjs run.sh ./
 COPY worker-real-watcher.mjs ./
+RUN chown -R repro:repro /repro
+USER repro

Source: Linters/SAST tools


11-11: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Suppressing npm install stderr hides failures in a diagnostic image.

If the install fails silently, the container proceeds without @parcel/watcher, and worker-real-watcher.mjs fails later with a confusing error unrelated to the actual root cause.

repro/issue-6635/README.md (2)

10-17: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Clarify tense: this describes the pre-fix architecture, not the current one.

After this PR merges, the watcher will run in a forked child process rather than a worker_thread, so phrasing like "Today the watcher runs..." will read as inaccurate to future readers. Consider rephrasing to explicitly past-tense ("Before this fix, the watcher ran...").


75-84: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a language hint to the fenced code block.

Flagged by markdownlint (MD040).

📝 Proposed fix
-```
+```text
 >> [CURRENT] native abort inside a worker_thread (file-watcher-worker.ts)
 run.sh: line 17:    14 Aborted                 ...

Source: Linters/SAST tools


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 605ab879-8180-4944-a9af-382e8604f2ba

📥 Commits

Reviewing files that changed from the base of the PR and between c53a730 and eeb8b7f.

📒 Files selected for processing (20)
  • config/electron-builder.config.cjs
  • config/scripts/electron-builder-config.test.mjs
  • electron.vite.config.ts
  • repro/issue-6635/Dockerfile
  • repro/issue-6635/README.md
  • repro/issue-6635/child-process-survives.mjs
  • repro/issue-6635/native-abort.js
  • repro/issue-6635/run.sh
  • repro/issue-6635/verify-fix.mjs
  • repro/issue-6635/worker-real-watcher.mjs
  • repro/issue-6635/worker-thread-crash.mjs
  • src/main/daemon/daemon-entry.test.ts
  • src/main/daemon/daemon-entry.ts
  • src/main/index.ts
  • src/main/runtime/file-watcher-host.test.ts
  • src/main/runtime/file-watcher-host.ts
  • src/main/runtime/file-watcher-process.ts
  • src/main/runtime/orca-runtime-files.ts
  • src/main/startup/configure-process-serve-observability.test.ts
  • src/main/startup/configure-process.ts

Comment thread src/main/runtime/file-watcher-process.ts
Comment on lines +130 to +157
process.on('uncaughtExceptionMonitor', (error) => {
const err = error as NodeJS.ErrnoException | undefined
if (err && (err.code === 'EIO' || err.code === 'EPIPE')) {
return
}
console.error(
`[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${
err?.message || '(no message)'
}\n${err?.stack ?? ''}`
)
})

process.on('unhandledRejection', (reason) => {
const err = reason instanceof Error ? reason : undefined
console.error(
`[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${
err?.message ?? String(reason)
}\n${err?.stack ?? ''}`
)
// Why: registering an unhandledRejection listener would otherwise suppress
// Node's default fatal behavior, silently leaving serve running in a
// possibly-inconsistent state. This installer is observability only — it
// must not change fatality — so re-raise on the next tick to preserve the
// crash (mirrors installUncaughtPipeErrorGuard's re-throw).
setImmediate(() => {
throw reason
})
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Apply the pipe-error filter to unhandled rejections too.

EIO/EPIPE filtering only covers uncaughtExceptionMonitor; a promise rejection with the same expected teardown error still logs as a [serve] Unhandled promise rejection and gets rethrown.

Proposed fix
 export function installServeObservability(): void {
+  const isExpectedPipeError = (error: unknown): error is NodeJS.ErrnoException =>
+    Boolean(
+      error &&
+        typeof error === 'object' &&
+        'code' in error &&
+        ((error as NodeJS.ErrnoException).code === 'EIO' ||
+          (error as NodeJS.ErrnoException).code === 'EPIPE')
+    )
+
   process.on('uncaughtExceptionMonitor', (error) => {
-    const err = error as NodeJS.ErrnoException | undefined
-    if (err && (err.code === 'EIO' || err.code === 'EPIPE')) {
+    if (isExpectedPipeError(error)) {
       return
     }
+    const err = error as NodeJS.ErrnoException | undefined
     console.error(
       `[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${
         err?.message || '(no message)'
       }\n${err?.stack ?? ''}`
     )
   })
 
   process.on('unhandledRejection', (reason) => {
+    if (isExpectedPipeError(reason)) {
+      return
+    }
     const err = reason instanceof Error ? reason : undefined
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
process.on('uncaughtExceptionMonitor', (error) => {
const err = error as NodeJS.ErrnoException | undefined
if (err && (err.code === 'EIO' || err.code === 'EPIPE')) {
return
}
console.error(
`[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${
err?.message || '(no message)'
}\n${err?.stack ?? ''}`
)
})
process.on('unhandledRejection', (reason) => {
const err = reason instanceof Error ? reason : undefined
console.error(
`[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${
err?.message ?? String(reason)
}\n${err?.stack ?? ''}`
)
// Why: registering an unhandledRejection listener would otherwise suppress
// Node's default fatal behavior, silently leaving serve running in a
// possibly-inconsistent state. This installer is observability only — it
// must not change fatality — so re-raise on the next tick to preserve the
// crash (mirrors installUncaughtPipeErrorGuard's re-throw).
setImmediate(() => {
throw reason
})
})
export function installServeObservability(): void {
const isExpectedPipeError = (error: unknown): error is NodeJS.ErrnoException =>
Boolean(
error &&
typeof error === 'object' &&
'code' in error &&
((error as NodeJS.ErrnoException).code === 'EIO' ||
(error as NodeJS.ErrnoException).code === 'EPIPE')
)
process.on('uncaughtExceptionMonitor', (error) => {
if (isExpectedPipeError(error)) {
return
}
const err = error as NodeJS.ErrnoException | undefined
console.error(
`[serve] Uncaught exception (process will exit): ${err?.name ?? 'Error'}: ${
err?.message || '(no message)'
}\n${err?.stack ?? ''}`
)
})
process.on('unhandledRejection', (reason) => {
if (isExpectedPipeError(reason)) {
return
}
const err = reason instanceof Error ? reason : undefined
console.error(
`[serve] Unhandled promise rejection: ${err?.name ?? typeof reason}: ${
err?.message ?? String(reason)
}\n${err?.stack ?? ''}`
)
// Why: registering an unhandledRejection listener would otherwise suppress
// Node's default fatal behavior, silently leaving serve running in a
// possibly-inconsistent state. This installer is observability only — it
// must not change fatality — so re-raise on the next tick to preserve the
// crash (mirrors installUncaughtPipeErrorGuard's re-throw).
setImmediate(() => {
throw reason
})
})

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5454e808-7d9d-4d22-909f-7ae964c43876

📥 Commits

Reviewing files that changed from the base of the PR and between eeb8b7f and f7982c8.

📒 Files selected for processing (20)
  • config/electron-builder.config.cjs
  • config/scripts/electron-builder-config.test.mjs
  • electron.vite.config.ts
  • repro/issue-6635/Dockerfile
  • repro/issue-6635/README.md
  • repro/issue-6635/child-process-survives.mjs
  • repro/issue-6635/native-abort.js
  • repro/issue-6635/run.sh
  • repro/issue-6635/verify-fix.mjs
  • repro/issue-6635/worker-real-watcher.mjs
  • repro/issue-6635/worker-thread-crash.mjs
  • src/main/daemon/daemon-entry.test.ts
  • src/main/daemon/daemon-entry.ts
  • src/main/index.ts
  • src/main/runtime/file-watcher-host.test.ts
  • src/main/runtime/file-watcher-host.ts
  • src/main/runtime/file-watcher-process.ts
  • src/main/runtime/orca-runtime-files.ts
  • src/main/startup/configure-process-serve-observability.test.ts
  • src/main/startup/configure-process.ts
✅ Files skipped from review due to trivial changes (1)
  • src/main/runtime/orca-runtime-files.ts
🚧 Files skipped from review as they are similar to previous changes (16)
  • repro/issue-6635/native-abort.js
  • src/main/daemon/daemon-entry.test.ts
  • config/scripts/electron-builder-config.test.mjs
  • src/main/startup/configure-process-serve-observability.test.ts
  • config/electron-builder.config.cjs
  • repro/issue-6635/worker-thread-crash.mjs
  • src/main/startup/configure-process.ts
  • electron.vite.config.ts
  • repro/issue-6635/run.sh
  • repro/issue-6635/verify-fix.mjs
  • repro/issue-6635/worker-real-watcher.mjs
  • src/main/daemon/daemon-entry.ts
  • repro/issue-6635/child-process-survives.mjs
  • src/main/runtime/file-watcher-process.ts
  • src/main/runtime/file-watcher-host.test.ts
  • src/main/index.ts

Comment on lines +9 to +11
# Install the real @parcel/watcher so worker-real-watcher.mjs can exercise the
# actual native module on Linux (the prime suspect from #5377/#4851).
RUN npm init -y >/dev/null 2>&1 && npm install @parcel/watcher@^2.5.6 >/dev/null 2>&1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

git ls-files repro/issue-6635/Dockerfile repro/issue-6635 package.json package-lock.json npm-shrinkwrap.json pnpm-lock.yaml yarn.lock README.md 2>/dev/null || true
echo "---"
sed -n '1,120p' repro/issue-6635/Dockerfile
echo "---"
rg -n "deterministic|parcel/watcher|issue-6635|worker-real-watcher" -S repro/issue-6635 README.md . 2>/dev/null || true

Repository: stablyai/orca

Length of output: 33470


Pin @parcel/watcher exactly in repro/issue-6635/Dockerfile:11.
The repro image is meant to stay deterministic, but ^2.5.6 will drift to newer 2.x releases and can change the native behavior this harness depends on.

Comment on lines +10 to +17
1. **The crash is process-fatal and uncatchable (root cause).**
`@parcel/watcher`'s native code raises a C++ `Napi::Error` during worker
teardown (`FreeEnvironment` / `CleanupHandles`). Today the watcher runs in a
**Node `worker_thread`** (`src/main/runtime/file-watcher-worker.ts`). A native
`abort()` / `std::terminate()` raised on a worker thread is **not** delivered
to the host as a catchable `worker.on('error')` event, and **not** catchable
by `process.on('uncaughtException')` — it raises `SIGABRT` (signal 6) for the
**entire process**. So no JS handler anywhere can save the `serve` process.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Rephrase this as pre-fix behavior.

Today plus file-watcher-worker.ts is stale in this PR’s final state, which already moved the runtime watcher to file-watcher-process.ts. Framing this paragraph as “before this fix” will keep the repro docs accurate.

Comment on lines +86 to +97
const child = fork(getFileWatcherChildPath(), [], {
// Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron
// GPU/display init that can interfere with native modules). The watch
// target rides env, not argv, so spaces/quotes in the path survive.
env: {
...process.env,
ELECTRON_RUN_AS_NODE: '1',
ORCA_FILE_WATCH_ROOT: rootPath,
ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n')
},
stdio: ['ignore', 'ignore', 'ignore', 'ipc'],
...(process.platform === 'win32' ? { windowsHide: true } : {})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Don't drop the watcher child's stderr.

With stdio: ['ignore', 'ignore', 'ignore', 'ipc'], a native abort only leaves the parent with { code, signal }. That loses the Napi::Error text / JS stack the linked issue explicitly asked us to surface.

Suggested change
-      stdio: ['ignore', 'ignore', 'ignore', 'ipc'],
+      stdio: ['ignore', 'ignore', 'pipe', 'ipc'],
       ...(process.platform === 'win32' ? { windowsHide: true } : {})
     })
+
+    child.stderr?.setEncoding('utf8')
+    child.stderr?.on('data', (chunk) => {
+      console.error('[runtime-files.watch:child]', chunk.trimEnd())
+    })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const child = fork(getFileWatcherChildPath(), [], {
// Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron
// GPU/display init that can interfere with native modules). The watch
// target rides env, not argv, so spaces/quotes in the path survive.
env: {
...process.env,
ELECTRON_RUN_AS_NODE: '1',
ORCA_FILE_WATCH_ROOT: rootPath,
ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n')
},
stdio: ['ignore', 'ignore', 'ignore', 'ipc'],
...(process.platform === 'win32' ? { windowsHide: true } : {})
const child = fork(getFileWatcherChildPath(), [], {
// Why: ELECTRON_RUN_AS_NODE runs the child as plain Node (no Electron
// GPU/display init that can interfere with native modules). The watch
// target rides env, not argv, so spaces/quotes in the path survive.
env: {
...process.env,
ELECTRON_RUN_AS_NODE: '1',
ORCA_FILE_WATCH_ROOT: rootPath,
ORCA_FILE_WATCH_IGNORE: RUNTIME_FILE_WATCH_IGNORE.join('\n')
},
stdio: ['ignore', 'ignore', 'pipe', 'ipc'],
...(process.platform === 'win32' ? { windowsHide: true } : {})
})
child.stderr?.setEncoding('utf8')
child.stderr?.on('data', (chunk) => {
console.error('[runtime-files.watch:child]', chunk.trimEnd())
})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Headless 'orca serve' SIGABRT — uncaught Napi::Error (empty what()) tears down all agent terminals [Linux/WSL2, 1.4.97]

1 participant