Skip to content

perf: remove recv_timeout and use explicit Stop#168

Open
lakshya-sky wants to merge 3 commits into
lambdaclass:mainfrom
lakshya-sky:perf/polling-loop-with-poison-pill
Open

perf: remove recv_timeout and use explicit Stop#168
lakshya-sky wants to merge 3 commits into
lambdaclass:mainfrom
lakshya-sky:perf/polling-loop-with-poison-pill

Conversation

@lakshya-sky
Copy link
Copy Markdown
Contributor

Close: #157

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR replaces the 100 ms recv_timeout polling loop in the threads actor with a blocking recv() call paired with an explicit MailboxItem::Shutdown sentinel, eliminating the stop-latency regression tracked in #157. ChildHandle::from_tasks and from_threads are also refactored to accept a pre-built CancelFn closure instead of a raw CancellationToken, allowing each backend to compose the right wakeup logic at the call site.

  • Threads actor gains a MailboxItem<A> enum (Message / Shutdown); Context::stop() and the ChildHandle cancel closure now both cancel the token and enqueue a Shutdown to unblock the waiting rx.recv().
  • Tasks actor is left unchanged in its main loop (its future::select over cancellation_token.cancelled() already wakes in O(1)); only the ChildHandle conversion is updated to wrap the cancel call in a closure.

Confidence Score: 4/5

Safe to merge — the behavioral changes are confined to the threads actor stop path and the core loop invariants are preserved.

The logic is sound: cancellation token + Shutdown message correctly unblocks the blocking recv() call in every relevant scenario. No data races or message-ordering regressions introduced.

concurrency/src/threads/actor.rs deserves a second read around the interaction between the cancellation-token check after each handler and the queued Shutdown sentinel, but it holds up on close inspection.

Important Files Changed

Filename Overview
concurrency/src/threads/actor.rs Core change: recv_timeout polling loop replaced with blocking recv() + MailboxItem::Shutdown. Logic is sound; one minor unnecessary clone in Context::stop and the stop-from-within-a-handler path leaves an unconsumed Shutdown in the channel (harmless but worth documenting).
concurrency/src/tasks/actor.rs Only the From<ActorRef> for ChildHandle impl changed: cancel token call is now wrapped in a closure. Tasks main loop is untouched; no correctness concerns.
concurrency/src/child_handle.rs Signature change: from_tasks and from_threads now accept a CancelFn instead of a bare CancellationToken. Test helpers correctly build the Arc<dyn Fn()> closure. Clean refactor.

Comments Outside Diff (1)

  1. concurrency/src/threads/actor.rs, line 583-609 (link)

    P2 Shutdown left unconsumed when stop is called from within a handler

    When ctx.stop() is called inside a message handler, Shutdown is enqueued but never consumed: the loop exits via the if cancellation_token.is_cancelled() check that fires after the handler returns, before rx.recv() can see the sentinel. The Shutdown is silently dropped when rx goes out of scope, so there is no correctness problem. However, if stop() is called N times (e.g., idempotent shutdown logic or supervisor retries), N Shutdown items accumulate in the channel for the duration of the last handler. A brief doc-comment on stop() noting this is worth adding for future maintainers.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: concurrency/src/threads/actor.rs
    Line: 583-609
    
    Comment:
    **`Shutdown` left unconsumed when stop is called from within a handler**
    
    When `ctx.stop()` is called inside a message handler, `Shutdown` is enqueued but never consumed: the loop exits via the `if cancellation_token.is_cancelled()` check that fires after the handler returns, before `rx.recv()` can see the sentinel. The `Shutdown` is silently dropped when `rx` goes out of scope, so there is no correctness problem. However, if `stop()` is called N times (e.g., idempotent shutdown logic or supervisor retries), N `Shutdown` items accumulate in the channel for the duration of the last handler. A brief doc-comment on `stop()` noting this is worth adding for future maintainers.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
concurrency/src/threads/actor.rs:138-141
`Context::stop` clones the cancellation token via `cancellation_token()` only to immediately call `cancel()` on the clone and drop it. The clone is unnecessary — `cancel()` propagates through any clone because they share the same underlying atomic state, but the direct field is already in scope.

```suggestion
    pub fn stop(&self) {
        self.cancellation_token.cancel();
        let _ = self.sender.send(MailboxItem::Shutdown);
    }
```

### Issue 2 of 2
concurrency/src/threads/actor.rs:583-609
**`Shutdown` left unconsumed when stop is called from within a handler**

When `ctx.stop()` is called inside a message handler, `Shutdown` is enqueued but never consumed: the loop exits via the `if cancellation_token.is_cancelled()` check that fires after the handler returns, before `rx.recv()` can see the sentinel. The `Shutdown` is silently dropped when `rx` goes out of scope, so there is no correctness problem. However, if `stop()` is called N times (e.g., idempotent shutdown logic or supervisor retries), N `Shutdown` items accumulate in the channel for the duration of the last handler. A brief doc-comment on `stop()` noting this is worth adding for future maintainers.

Reviews (1): Last reviewed commit: "update child handles" | Re-trigger Greptile

Comment on lines 138 to 141
pub fn stop(&self) {
self.cancellation_token.cancel();
self.cancellation_token().cancel();
let _ = self.sender.send(MailboxItem::Shutdown);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Context::stop clones the cancellation token via cancellation_token() only to immediately call cancel() on the clone and drop it. The clone is unnecessary — cancel() propagates through any clone because they share the same underlying atomic state, but the direct field is already in scope.

Suggested change
pub fn stop(&self) {
self.cancellation_token.cancel();
self.cancellation_token().cancel();
let _ = self.sender.send(MailboxItem::Shutdown);
}
pub fn stop(&self) {
self.cancellation_token.cancel();
let _ = self.sender.send(MailboxItem::Shutdown);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: concurrency/src/threads/actor.rs
Line: 138-141

Comment:
`Context::stop` clones the cancellation token via `cancellation_token()` only to immediately call `cancel()` on the clone and drop it. The clone is unnecessary — `cancel()` propagates through any clone because they share the same underlying atomic state, but the direct field is already in scope.

```suggestion
    pub fn stop(&self) {
        self.cancellation_token.cancel();
        let _ = self.sender.send(MailboxItem::Shutdown);
    }
```

How can I resolve this? If you propose a fix, please make it concise.

@ElFantasma
Copy link
Copy Markdown
Collaborator

ElFantasma commented May 22, 2026

Nice fix — eliminates the 100ms polling overhead and shutdown latency cleanly. The poison-pill pattern is the right call given std::sync::mpsc::Receiver doesn't have a multi-source wait primitive.

Suggestion: use on_cancel callback to remove the hidden coupling

The current design requires every cancel path to also enqueue Shutdownctx.stop() does it, and the From<ActorRef> cancel closure does it. But anyone who calls cancellation_token.cancel() directly (it's pub(crate)) will leave the actor stuck on recv(). Future contributors won't know about this invariant.

The threads-mode CancellationToken already supports on_cancel(callback) — we can register the poison-pill once at spawn time and make it automatic:

// In ActorRef::spawn()
let sender_clone = tx.clone();
cancellation_token.on_cancel(Box::new(move || {
    let _ = sender_clone.send(MailboxItem::Shutdown);
}));

With this:

  • ctx.stop() stays a one-liner (self.cancellation_token.cancel()) — matches tasks mode
  • From<ActorRef> cancel closure also stays simple
  • Any path that cancels the token automatically wakes the actor — no footgun

Other notes

  • Tests: nothing currently asserts the fast shutdown response. Worth a regression test that verifies shutdown latency is small (e.g., <10ms) so this doesn't silently regress if someone later adds a cancel path that bypasses the wake-up.
  • ctx.stop() during started() enqueues a Shutdown that's never consumed (the pre-loop if cancellation_token.is_cancelled() branch exits before reading the mailbox). Harmless — the message is dropped when the receiver is dropped on actor exit — but worth a comment.
  • Rebase against feat: bidirectional links, trap_exit, and start_linked for actors #166 (links + trap_exit) will be non-trivial once it lands: ChildHandle::from_tasks/from_threads gain several arguments (trap_exit, links, linked_reason, send_exit), and From<ActorRef<A>> for ChildHandle needs to build a SendExitFn. Easier to rebase after feat: bidirectional links, trap_exit, and start_linked for actors #166 settles than to do both in parallel.

Follow-up

Looking at this PR alongside #166, we have two ad-hoc patterns for non-Handler<M> mailbox traffic:

Filed as #169 — proposal to promote MailboxItem<A> into a formal internal abstraction used in both modes, with explicit variants for Message/Exit/Shutdown. That would:

  • Unify mode internals (same loop structure, same mailbox shape)
  • Replace ExitEnvelope with an explicit MailboxItem::Exit(Exit) variant
  • Make adding future system signals (timers, info messages) a structured addition rather than another ad-hoc mechanism

That refactor should land after this PR and #166 — credited you in #169 since this PR is what surfaced the unification opportunity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: replace threads 100ms polling loop with poison-pill shutdown

2 participants