Skip to content

fix(acp): replace turn_interrupted boolean with monotonic turn counter#424

Open
theahura wants to merge 7 commits intomainfrom
fix/stale-completed-turn-counter
Open

fix(acp): replace turn_interrupted boolean with monotonic turn counter#424
theahura wants to merge 7 commits intomainfrom
fix/stale-completed-turn-counter

Conversation

@theahura
Copy link
Copy Markdown
Contributor

@theahura theahura commented Apr 6, 2026

Summary

🤖 Generated with Nori

  • Replace turn_interrupted: Arc<AtomicBool> with turn_id: Arc<AtomicU64> monotonic turn counter to eliminate TOCTOU race that caused agent responses to be hidden after interrupt
  • Guard ALL stale-task tail events (ErrorEvent + Completed + idle timer) with a single turn_id match check, not just Completed
  • Remove TUI's pending_stale_completes counter — no longer needed since the ACP backend now reliably suppresses all stale events

This is the third attempt at fixing this interrupt race condition (after PRs #416 and #418). The previous boolean+counter architecture was fundamentally flawed — each fix introduced a new bug. The monotonic counter eliminates the race entirely because it only increments and is never reset.

Test Plan

  • Unit tests verify TUI handles interrupt-then-new-turn correctly (part7.rs)
  • All 1193 TUI tests pass (1192 + 1 ignored)
  • All 439 ACP tests pass
  • Binary builds and starts cleanly
  • just fix and just fmt clean
  • Self-review addressed idle timer bug (moved inside turn_id guard)

Share Nori with your team: https://www.npmjs.com/package/nori-skillsets

theahura and others added 4 commits April 5, 2026 21:30
The previous approach used a shared AtomicBool flag to suppress stale
TurnLifecycle::Completed events from cancelled tasks. This had an
unavoidable TOCTOU race: when a user interrupted and quickly sent a new
message, the flag was reset for the new turn before the old task could
check it, allowing stale events to interfere with subsequent turns.

Two prior attempts (PRs #416, #418) tried to fix this with variations
of the same boolean+counter architecture. Each fix introduced a new bug:
#416 caused hangs, #418 brought back hidden responses.

This commit replaces the architecture entirely:
- AtomicBool → AtomicU64 monotonic turn counter in ACP backend
- Each spawned task captures its own turn ID at spawn time
- ALL tail events (ErrorEvent + Completed + idle timer) are guarded by
  a single turn_id match check
- TUI's pending_stale_completes counter is removed (no longer needed)

The monotonic counter eliminates the race because it only increments—
there is no "reset" that a stale task can observe.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <[email protected]>
…ate channel

When a cancelled prompt returns AFTER a new prompt has installed its
update sender in the shared active_update_tx slot, the old prompt's
unconditional uninstall (setting to None) wipes the new prompt's
channel. This causes the new prompt's response events to be routed
to the persistent channel instead of the update handler, making the
response invisible in the TUI.

Fix: add a generation counter to the active_update_tx slot. Each
prompt() and load_session() call gets a unique generation and only
uninstalls if its generation still matches the slot. This prevents
stale prompts from interfering with newer ones.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <[email protected]>
@theahura theahura force-pushed the fix/stale-completed-turn-counter branch from 0858208 to 9056f1c Compare April 6, 2026 02:27
theahura and others added 3 commits April 6, 2026 00:20
Add integration test verifying sequential prompt responses work after
the generation counter fix. Remove temporary TURN_DEBUG tracing.
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <[email protected]>
…te notifications

block_task() can return before all SessionNotification events are
delivered. The previous approach of calling close_update_channel()
immediately after prompt() would kill the channel too early, causing
late notifications to be lost. This was especially problematic in the
interrupt+re-prompt flow where the first prompt's close could wipe the
second prompt's channel.

Replace close_update_channel() calls from prompt callers with a
done_tx/done_rx oneshot signal. After prompt() returns, the done
signal fires and the update_handler drains remaining events with a
500ms timeout before exiting. The active_update_tx slot is no longer
cleared between turns — the next prompt() overwrites it.

Also fix the notification handler to fall through to persistent_tx
when try_send fails (closed or full channel), preventing inter-turn
notifications from being silently dropped.

🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <[email protected]>
🤖 Generated with [Nori](https://noriagentic.com)

Co-Authored-By: Nori <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant