0.5.2: dedup stale + dual-dial sessions in addPeer / inbound-connection #21
Merged
When two peers Bonjour-discover each other near-simultaneously, both
processes can dial outbound. The previous logic in `inbound-connection`
and `_createPeer` short-circuited the moment a same-source transport
key was present in the peer's transports map, regardless of whether
that prior was actually alive or what direction it was in. Three real
failure modes:
(1) Stale prior — the previous transport's `_closed` flag is set
but its close handler hasn't fired yet (Apple Network framework
doesn't always deliver FIN promptly when a peer process exits
abruptly, particularly for same-host LAN connections). Every
reconnect attempt was rejected until the OS reaped
the dead entry.
(2) Same-direction duplicate — listener fires `newConnectionHandler`
twice for the same advertised service (TCP retry, multipath race).
Replacing the established healthy inbound with the duplicate
tears down its wire pair on the remote side and triggers
peer-left storms.
(3) Dual-dial collision — both peers dialed and both inbounds /
outbounds completed. The unconditional reject killed one side's
view of the connection, leaving asymmetric peer state.
Observed in the field on macOS: Mac MeloMove (Catalyst, sym-swift)
and claude-code-mac (Node, this SDK) on the same Mac would never
maintain a peer relationship — claude-code-mac silently rejected
Mac MeloMove's inbound dial because of a stale entry in `_peers`.
iPhone-to-claude-code-mac across the LAN worked fine because the
timing windows differ.
Fix: in both `inbound-connection` handler and `_createPeer`:
- Detect stale prior (`_closed=true`) and treat as no prior.
- For same-direction duplicate: keep prior, drop new (no wire-pair
teardown on the remote).
- For dual-dial: nodeId-based deterministic tie-break (lower nodeId
acts as client, keeps outbound; higher keeps inbound). Both peers
independently compute the same physical-socket winner without
exchanging coordination frames. Mirrors @sym-bot/sym-swift v0.3.80.
150/150 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Three failure modes collapsed into the same bug: the prior dedup logic in `inbound-connection` and `_createPeer` short-circuited the moment a same-source transport key was present in `peer.transports`, regardless of whether that prior was actually alive or what direction it was in.

1. Stale prior: the previous transport's `_closed` flag is set but its close handler hasn't fired yet. Apple's Network framework doesn't always deliver FIN promptly when a peer process exits abruptly (especially on same-host LAN connections). Every reconnect attempt was rejected until the OS reaped the dead entry.
2. Same-direction duplicate: the listener fires `newConnectionHandler` twice for the same advertised service (TCP retry, multipath race, repeated Bonjour resolution). Silently replacing the established healthy inbound with the duplicate tears down the wire pair on the remote side and triggers a peer-left storm.
3. Dual-dial collision: both peers Bonjour-discover each other within ~50ms and both initiate outbound TCP. The unconditional reject killed one side's view of the connection, leaving asymmetric peer state.
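To make the stale-prior case concrete, here is a minimal sketch of a presence-only dedup check of the kind described above. It is not the SDK's pre-fix code: `preFixAccept`, `makeTransport`, the `'tcp'` key, and the `Map`-based `transports` shape are hypothetical names chosen for illustration. Only the `_closed` flag and the close-then-return short-circuit come from the description.

```javascript
// Hypothetical sketch of a presence-only dedup check (illustrative names,
// not the SDK's actual implementation).
function preFixAccept(peer, key, incoming) {
  if (peer.transports.has(key)) {
    // Short-circuit: any existing entry wins, even one that is already closed.
    incoming.close();
    return false;
  }
  peer.transports.set(key, incoming);
  return true;
}

// Hypothetical stand-in for a transport object.
function makeTransport(direction) {
  return {
    direction,
    _closed: false,
    close() { this._closed = true; },
  };
}

const peer = { transports: new Map() };

// First connection is accepted and stored.
const stale = makeTransport('inbound');
preFixAccept(peer, 'tcp', stale);

// The transport is flagged closed, but its close handler has not yet
// removed it from the map.
stale._closed = true;

// The reconnect is rejected even though the prior is dead.
const retry = makeTransport('inbound');
const accepted = preFixAccept(peer, 'tcp', retry);
console.log(accepted); // false
```

Because the check looks only at key presence, the dead entry blocks every reconnect until something removes it from the map.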
Field evidence
Observed on macOS: Mac MeloMove (Catalyst, sym-swift v0.3.80) and claude-code-mac (Node, this SDK) on the same Mac would never maintain a peer relationship. claude-code-mac silently rejected Mac MeloMove's inbound dial via the `transport.close(); return;` short-circuit at `lib/node.js:512` (pre-fix). iPhone↔claude-code-mac across the LAN worked because the timing windows differ: claude-code-mac's outbound to iPhone landed first as iPhone's inbound, so there was no dedup conflict.

Mac MeloMove log:
Fix
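In both the `inbound-connection` handler and `_createPeer`:

- Detect a stale prior (`_closed=true`) and treat it as no prior.
- For a same-direction duplicate: keep the prior and drop the new connection (no wire-pair teardown on the remote).
- For dual-dial: apply a deterministic nodeId-based tie-break (the lower nodeId acts as client and keeps its outbound; the higher keeps its inbound). Both peers independently compute the same physical-socket winner without exchanging coordination frames.

This mirrors the @sym-bot/sym-swift v0.3.80 fix that landed earlier today, so cross-runtime peers now agree on the same dedup convention.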
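The three rules can be sketched as a single decision function. This is an illustration under stated assumptions, not the code in `lib/node.js`: `resolveDuplicate`, the `direction` field, and the `'accept-new'` / `'keep-prior'` / `'replace-prior'` return values are hypothetical, and nodeIds are assumed to be distinct strings compared with `<`. Only `_closed` and the lower-nodeId-keeps-outbound rule come from the description.

```javascript
// Hypothetical helper: decide what to do with a new transport when a prior
// one with the same source key may already exist. Not the SDK's actual API.
function resolveDuplicate({ prior, incoming, localNodeId, remoteNodeId }) {
  // No prior, or a prior that is already closed: treat as no prior.
  if (!prior || prior._closed) return 'accept-new';

  // Same-direction duplicate: keep the healthy prior, drop the new one.
  if (prior.direction === incoming.direction) return 'keep-prior';

  // Dual-dial: the lower nodeId acts as client and keeps its outbound;
  // the higher nodeId keeps its inbound. (Assumes distinct string nodeIds.)
  const keepDirection = localNodeId < remoteNodeId ? 'outbound' : 'inbound';
  return incoming.direction === keepDirection ? 'replace-prior' : 'keep-prior';
}

// Dual-dial between two peers. Socket S1 is A's dial to B; S2 is B's dial to A.
const a = 'node-a';
const b = 'node-b';

// On A: S2 (inbound) was registered first, then A's own outbound S1 completes.
const onA = resolveDuplicate({
  prior: { direction: 'inbound', _closed: false },
  incoming: { direction: 'outbound' },
  localNodeId: a,
  remoteNodeId: b,
});

// On B: B's outbound S2 was registered first, then S1 arrives as inbound.
const onB = resolveDuplicate({
  prior: { direction: 'outbound', _closed: false },
  incoming: { direction: 'inbound' },
  localNodeId: b,
  remoteNodeId: a,
});

console.log(onA, onB); // replace-prior replace-prior
```

In this run A keeps its outbound and B keeps its inbound, which are the two ends of the same socket (S1), so both sides converge without exchanging any coordination frames. Equal nodeIds are not handled here; the sketch assumes they cannot occur.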
Tests
150/150 existing unit tests pass — no regressions in transport priority, peer lifecycle, multi-transport, or any other path.
Test plan
- Unit tests pass (`npm test`)

🤖 Generated with Claude Code