
0.5.2: dedup stale + dual-dial sessions in addPeer / inbound-connection #21

Merged
sym-bot merged 4 commits into main from fix/dedup-stale-and-dual-dial on Apr 29, 2026


sym-bot commented Apr 29, 2026

Summary

Three failure modes traced back to the same bug: the previous dedup logic in the inbound-connection handler and _createPeer short-circuited as soon as a same-source transport key was present in peer.transports, regardless of whether that prior transport was actually alive or which direction it was in.

  1. Stale prior: the previous transport's _closed flag is set, but its close handler hasn't fired yet. Apple's Network framework doesn't always deliver FIN promptly when a peer process exits abruptly (especially on same-host LAN connections). Every reconnect attempt was rejected until the OS reaped the dead entry.

  2. Same-direction duplicate: the listener fires newConnectionHandler twice for the same advertised service (TCP retry, multipath race, repeated Bonjour resolution). Silently replacing the established, healthy inbound with the duplicate tears down the wire pair on the remote side and triggers a peer-left storm.

  3. Dual-dial collision: both peers Bonjour-discover each other within ~50ms and both open outbound TCP connections. The unconditional reject killed one side's view of the connection, leaving asymmetric peer state.
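To see why the stale-prior and dual-dial cases hit the same wall, the pre-fix check can be reduced to a presence test on the transport key. The sketch below is a hypothetical reconstruction, not the code at lib/node.js:512: `legacyAccept`, `fakeTransport`, and the transport shape are illustrative assumptions. What it shows is that neither `_closed` nor direction is ever consulted, so a dead prior blocks a reconnect exactly as a live one would.

```javascript
// Hypothetical reconstruction of the pre-fix dedup check (not the actual
// lib/node.js code). A transport is modeled as { direction, _closed, close() }.
function legacyAccept(peer, key, incoming) {
  if (peer.transports.has(key)) {
    // Short-circuit: any prior under this key rejects the new transport.
    // The prior's _closed flag and both directions are never looked at.
    incoming.close();
    return false;
  }
  peer.transports.set(key, incoming);
  return true;
}

// Illustrative stand-in for a transport; records whether close() was called.
function fakeTransport(direction, closed = false) {
  return {
    direction,
    _closed: closed,
    closedByUs: false,
    close() { this.closedByUs = true; },
  };
}

// Stale prior: the old inbound is already dead, yet the reconnect is rejected.
const peer = { transports: new Map([["lan", fakeTransport("inbound", true)]]) };
const reconnect = fakeTransport("inbound");
console.log(legacyAccept(peer, "lan", reconnect)); // false
console.log(reconnect.closedByUs);                 // true
```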

Field evidence

Observed on macOS: Mac MeloMove (Catalyst, sym-swift v0.3.80) and claude-code-mac (Node, this SDK) running on the same Mac could never maintain a peer relationship. claude-code-mac silently rejected Mac MeloMove's inbound dial via the transport.close(); return; short-circuit at lib/node.js:512 (pre-fix). iPhone↔claude-code-mac across the LAN worked because the timing windows differ: claude-code-mac's outbound to iPhone landed first as iPhone's inbound, so there was no dedup conflict.

Mac MeloMove log:

[SYM] session: connection ready (outbound=true)
[SYM] session: disconnected: Connection closed     ← claude-code-mac closed it
[C1 019d59a1-...] is already cancelled, ignoring cancel

Fix

In both the inbound-connection handler and _createPeer:

  • Detect stale prior (_closed=true) and treat as no prior.
  • For same-direction duplicate: keep prior, drop new (no wire-pair teardown on the remote).
  • For dual-dial: nodeId-based deterministic tie-break (the lower nodeId acts as client and keeps its outbound; the higher keeps its inbound). Both peers independently compute the same physical-socket winner without exchanging coordination frames.
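The three rules above can be condensed into a single decision function. This is a minimal sketch, not the SDK's code: the name `resolveDuplicate`, the `"inbound"`/`"outbound"` direction strings, the string return values, and ordering nodeIds with plain string `<` are all assumptions.

```javascript
// Hypothetical sketch of the dedup decision; not the actual SDK code.
// A transport is modeled as { direction: "inbound" | "outbound", _closed }.
// Returns "use-new" (close the prior, keep the incoming transport) or
// "keep-prior" (close the incoming transport).
function resolveDuplicate(prior, incoming, localNodeId, remoteNodeId) {
  // Stale prior: _closed is set but the close handler has not fired yet.
  // Treat it exactly like having no prior at all.
  if (!prior || prior._closed) return "use-new";

  // Same-direction duplicate (e.g. the listener fired twice): keep the
  // established transport so the remote side's wire pair survives.
  if (prior.direction === incoming.direction) return "keep-prior";

  // Dual-dial collision: the lower nodeId acts as client and keeps its
  // outbound; the higher nodeId keeps its inbound. Assumes nodeIds are
  // strings that both runtimes order identically (e.g. ASCII-only).
  const keep = localNodeId < remoteNodeId ? "outbound" : "inbound";
  return incoming.direction === keep ? "use-new" : "keep-prior";
}

// Usage: local node "a" already holds a live inbound from remote "b".
const liveInbound = { direction: "inbound", _closed: false };
console.log(resolveDuplicate(liveInbound, { direction: "inbound" }, "a", "b"));  // keep-prior
console.log(resolveDuplicate(liveInbound, { direction: "outbound" }, "a", "b")); // use-new
console.log(resolveDuplicate({ ...liveInbound, _closed: true }, { direction: "inbound" }, "a", "b")); // use-new
```

Note that the stale check runs first: a dead prior never reaches the tie-break, so a reconnect after an abrupt peer exit is accepted regardless of direction.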

This mirrors the @sym-bot/sym-swift v0.3.80 fix that landed earlier today, so cross-runtime peers now agree on the same dedup convention.
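The claim that both peers settle on the same physical socket without coordination can be checked with a small simulation. This is a sketch under assumptions: `keptDirection`, `pickSocket`, the socket records, and the sample nodeIds are hypothetical, and ordering nodeIds with JavaScript's string `<` assumes sym-swift compares them the same way (which holds for plain ASCII identifiers; the PR does not say how nodeIds are compared).

```javascript
// Hypothetical simulation of the dual-dial tie-break from both ends.

// Rule from the fix: the lower nodeId keeps its outbound connection,
// the higher nodeId keeps its inbound connection.
function keptDirection(localNodeId, remoteNodeId) {
  return localNodeId < remoteNodeId ? "outbound" : "inbound";
}

// Each node labels every socket from its own point of view: a socket it
// dialed is "outbound", a socket the remote dialed is "inbound".
function pickSocket(localNodeId, remoteNodeId, sockets) {
  const want = keptDirection(localNodeId, remoteNodeId);
  return sockets.find(
    (s) => (s.dialer === localNodeId ? "outbound" : "inbound") === want
  ).name;
}

const a = "3f2a"; // hypothetical nodeIds
const b = "9c1d";
// Dual-dial: two physical sockets, one dialed by each side.
const sockets = [
  { name: "A->B", dialer: a },
  { name: "B->A", dialer: b },
];

const winnerSeenByA = pickSocket(a, b, sockets);
const winnerSeenByB = pickSocket(b, a, sockets);
console.log(winnerSeenByA, winnerSeenByB); // A->B A->B
```

Because the rule depends only on the pair of nodeIds and on who dialed each socket, the lower node's "outbound" and the higher node's "inbound" always name the same physical connection.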

Tests

150/150 existing unit tests pass — no regressions in transport priority, peer lifecycle, multi-transport, or any other path.

Test plan

  • All existing unit tests pass (npm test)
  • Verify on Mac MeloMove ↔ claude-code-mac (same host) — connection establishes and stays connected; claude-code-mac shows up in Mac MeloMove's PEERS list.

🤖 Generated with Claude Code

sym-bot and others added 4 commits April 29, 2026 15:44
When two peers Bonjour-discover each other near-simultaneously, both
processes can dial outbound. The previous logic in `inbound-connection`
and `_createPeer` short-circuited the moment a same-source transport
key was present in the peer's transports map, regardless of whether
that prior was actually alive or what direction it was in. Three real
failure modes:

  (1) Stale prior — the previous transport's `_closed` flag is set
      but its close handler hasn't fired yet (Apple Network framework
      doesn't always deliver FIN promptly when a peer process exits
      abruptly, particularly for same-host LAN connections). Any
      reconnect attempt was permanently rejected until the OS reaped
      the dead entry.

  (2) Same-direction duplicate — listener fires `newConnectionHandler`
      twice for the same advertised service (TCP retry, multipath race).
      Replacing the established healthy inbound with the duplicate
      tears down its wire pair on the remote side and triggers
      peer-left storms.

  (3) Dual-dial collision: both peers dialed, and both the inbound and
      outbound connections completed. The unconditional reject killed
      one side's view of the connection, leaving asymmetric peer state.

Observed in the field on macOS: Mac MeloMove (Catalyst, sym-swift)
and claude-code-mac (Node, this SDK) on the same Mac would never
maintain a peer relationship — claude-code-mac silently rejected
Mac MeloMove's inbound dial because of a stale entry in `_peers`.
iPhone-to-claude-code-mac across the LAN worked fine because the
timing windows differ.

Fix: in both the `inbound-connection` handler and `_createPeer`:

  - Detect stale prior (`_closed=true`) and treat as no prior.
  - For same-direction duplicate: keep prior, drop new (no wire-pair
    teardown on the remote).
  - For dual-dial: nodeId-based deterministic tie-break (lower nodeId
    acts as client, keeps outbound; higher keeps inbound). Both peers
    independently compute the same physical-socket winner without
    exchanging coordination frames. Mirrors @sym-bot/sym-swift v0.3.80.

150/150 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
sym-bot merged commit 001af21 into main Apr 29, 2026
2 checks passed
sym-bot deleted the fix/dedup-stale-and-dual-dial branch April 29, 2026 15:42