Commit c53bfe6
Re-arm shell ZMQStream read after out-of-band reply send

On Windows, ipykernel 7 intermittently drops an execute_request on the
shell channel: the kernel goes idle and never replies, and the client
times out waiting for execute_reply. In our measurements this affected
about 30% of headless notebook runs, and the cell that hangs varies from
run to run.

Root cause: the shell ROUTER socket is dual-use on the shell-channel
thread. A ZMQStream reads execute_requests off it, while replies are
sent back over the same socket out-of-band, via a raw send_multipart in
SubshellManager._send_on_shell_channel. That out-of-band send can
consume the pending read edge on the socket's edge-triggered ZMQ_FD
(documented libzmq behavior: after zmq_send the socket may become
readable without triggering a read event on the file descriptor).
Because the send does not go through the ZMQStream, the stream is never
re-armed, and a request that arrived concurrently is stranded unread on
an fd that is registered but never signals readable. The strand is
terminal: no later arrival produces a new edge.

Fix: after each out-of-band reply send, schedule the shell ZMQStream's
read handler on the shell-channel loop, so that a concurrently arrived
request cannot strand. This is the same edge-trap reschedule that
ZMQStream._update_handler already runs internally
(add_callback(lambda: stream._handle_events(stream.socket, 0))). The
shell_stream (built in kernelapp.init_kernel) is threaded through
ShellChannelThread into SubshellManager so the reply path can reach it.
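
The re-arm described above can be sketched as follows. This is a minimal
illustration of the scheduling pattern, not the patch itself: the helper
name rearm_after_send, the None/closed guard, and the RecordingLoop and
RecordingStream stand-ins are hypothetical and exist only so the example
runs without pyzmq. Only add_callback, _handle_events, socket, io_loop
and closed() correspond to names on a real ZMQStream or its loop.

```python
class RecordingLoop:
    """Hypothetical stand-in for the shell-channel IOLoop (add_callback only)."""

    def __init__(self):
        self.pending = []

    def add_callback(self, callback):
        # Queue the callback for the next loop iteration.
        self.pending.append(callback)

    def run_pending(self):
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            callback()


class RecordingStream:
    """Hypothetical stand-in exposing the ZMQStream attributes the sketch uses."""

    def __init__(self, io_loop):
        self.io_loop = io_loop
        self.socket = object()
        self.handled = []

    def closed(self):
        return False

    def _handle_events(self, fd, events):
        # Record the call instead of reading from a real socket.
        self.handled.append((fd, events))


def rearm_after_send(stream):
    """Schedule the stream's read handler on its own loop.

    Returns True if a callback was scheduled. The guard is an assumption
    of this sketch, not something the commit message specifies.
    """
    if stream is None or stream.closed():
        return False
    stream.io_loop.add_callback(
        lambda: stream._handle_events(stream.socket, 0)
    )
    return True


loop = RecordingLoop()
stream = RecordingStream(loop)
scheduled = rearm_after_send(stream)  # call this right after the raw send
loop.run_pending()                    # the loop later runs the read handler
```

Scheduling through add_callback, rather than calling the handler inline,
keeps the read on the loop that owns the stream and mirrors what
ZMQStream._update_handler does.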

Validated on Windows (Python 3.13/3.14, pyzmq 27.1.0 / libzmq 4.3.5):
the wedge rate went from 6/20 (control) to 0/20 with this patch applied,
on the same machine and in the same session; P(0/20 | p=0.30) ~ 8e-4.
The threaded shell_stream reference was live on every send (551
re-arms, 0 None/mismatch). A sham arm with the same scheduling overhead
but no re-arm stayed at the control rate.
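
The quoted P(0/20 | p=0.30) can be reproduced with a short binomial
calculation. The sketch assumes the 20 runs are independent and that
each wedges with probability 0.30 (the control rate); binom_pmf is a
helper name introduced here for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k events in n independent trials of rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance of seeing 0 wedges in 20 runs if the true rate were still 30%.
p_zero = binom_pmf(0, 20, 0.30)
print(f"{p_zero:.2e}")  # 7.98e-04
```

Under those assumptions, a clean 0/20 would be very unlikely if the
patch had no effect.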

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

1 parent 821f6c0 commit c53bfe6
3 files changed
Lines changed: 21 additions & 0 deletions