WebSocket reconnection issues #999

@christianms-itx

Description

WeaveStoreAzureWebPubSubSyncHost.createWebSocket() in packages/store-azure-web-pubsub/src/server/azure-web-pubsub-host.ts has two related bugs that cause degraded reconnection behavior and resource leaks over time.


Bug 1: _reconnectAttempts is never reset on successful connection

Description

_reconnectAttempts is incremented on every call to createWebSocket() (line 172) but is never reset to 0 when a connection succeeds (the open event handler at line 177).

This means the counter grows not only on actual failures but also on routine token refreshes (every 45 minutes). The reconnection backoff delay is calculated as:

const timeout = 1000 * Math.pow(1.5, this._reconnectAttempts);

Since the exponent grows indefinitely, a host that has been running for hours will have an unreasonably long delay before attempting to reconnect after an actual disconnection.

Impact

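| Uptime | Approx. _reconnectAttempts | Reconnect delay if connection drops |
|--------|----------------------------|-------------------------------------|
| 0h     | 1                          | 1.5s                                |
| 3.75h  | ~6                         | ~11s                                |
| 8h     | ~11                        | ~86s                                |
| 24h    | ~33                        | ~7.5 days (1.5^33 ≈ 647,000s)       |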

After extended uptime, the host effectively cannot reconnect in a timely manner.
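
The growth can be checked with a short sketch. It assumes one createWebSocket() call at startup plus one per 45-minute token refresh and no other failures. The helper names are hypothetical and not part of the package.

```typescript
// Hypothetical helpers for estimating the backoff; not part of weavejs.
const REFRESH_INTERVAL_MINUTES = 45; // refresh cadence stated in this issue

// Same formula as the host: 1000 * 1.5^attempts, in milliseconds.
function reconnectDelayMs(attempts: number): number {
  return 1000 * Math.pow(1.5, attempts);
}

// Assumes one increment at startup plus one per completed token refresh.
function estimatedAttempts(uptimeHours: number): number {
  return 1 + Math.floor((uptimeHours * 60) / REFRESH_INTERVAL_MINUTES);
}

for (const hours of [0, 3.75, 8, 24]) {
  const attempts = estimatedAttempts(hours);
  const seconds = reconnectDelayMs(attempts) / 1000;
  console.log(`${hours}h: attempts=${attempts}, delay=${seconds.toFixed(1)}s`);
}
```

Bug 2 below adds further increments per refresh, so real values are likely to be higher than this estimate.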

Expected behavior

_reconnectAttempts should be reset to 0 when the WebSocket connection is successfully established (inside the open event handler).
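
A minimal sketch of the reset, written as a standalone helper so it runs on its own. ReconnectBackoff and its method names are hypothetical. In the host, the equivalent change would presumably be a single assignment to _reconnectAttempts inside the open handler.

```typescript
// Hypothetical helper; the real host keeps this state in _reconnectAttempts.
class ReconnectBackoff {
  private attempts = 0;

  // Call when scheduling a reconnect; returns the delay to wait in milliseconds.
  nextDelayMs(): number {
    this.attempts += 1;
    return 1000 * Math.pow(1.5, this.attempts);
  }

  // Call from the 'open' handler once the connection is established.
  reset(): void {
    this.attempts = 0;
  }
}

const backoff = new ReconnectBackoff();
const firstDelay = backoff.nextDelayMs();  // first failure
const secondDelay = backoff.nextDelayMs(); // second failure, longer delay
backoff.reset();                           // connection opened successfully
const delayAfterReset = backoff.nextDelayMs(); // back to the first-failure delay
console.log(firstDelay, secondDelay, delayAfterReset);
```

With the reset in place, the delay depends only on consecutive failures since the last successful connection, not on total uptime.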

Relevant code

https://github.com/InditexTech/weavejs/blob/main/code/packages/store-azure-web-pubsub/src/server/azure-web-pubsub-host.ts#L167-L195


Bug 2: Duplicate WebSockets created on token refresh

Description

The token refresh logic (lines 273-282) triggers two parallel reconnection paths:

setTimeout(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.close();                  // (A) triggers the 'close' event handler
    this.createWebSocket();      // (B) creates a new WebSocket immediately
  }
}, expirationTimeInMinutes * 0.75 * 60 * 1000);

  • Path A: ws.close() fires the close event listener (line 240), which, since _forceClose is false, schedules a new createWebSocket() call after a delay.
  • Path B: this.createWebSocket() on line 280 creates another WebSocket immediately.

This results in two WebSockets being created on every token refresh. Since each WebSocket sets up its own token refresh timer, the effect compounds geometrically:

| Time    | Active WebSockets |
|---------|-------------------|
| 0 min   | 1                 |
| 45 min  | 2                 |
| 90 min  | 4                 |
| 135 min | 8                 |
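
The doubling can be modelled with a simplified sketch. It assumes every open socket refreshes at exactly 45 minutes and ignores the backoff delay on path A, so real timings will drift. activeSocketsAfter is a hypothetical name used only for illustration.

```typescript
// Simplified model of the leak; not part of weavejs.
function activeSocketsAfter(minutes: number, refreshMinutes = 45): number {
  let sockets = 1; // the socket created at startup
  const refreshes = Math.floor(minutes / refreshMinutes);
  for (let i = 0; i < refreshes; i++) {
    // Each open socket closes itself, then path A and path B each open a replacement.
    sockets *= 2;
  }
  return sockets;
}

for (const minutes of [0, 45, 90, 135]) {
  console.log(`${minutes} min: ${activeSocketsAfter(minutes)} active sockets`);
}
```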

Additional consequences

  • Only the last WebSocket created is stored in this._conn (line 193), so stop() only closes one — the rest become orphaned, leaking resources and processing duplicate messages.
  • Each orphaned WebSocket also increments _reconnectAttempts, accelerating the degradation described in Bug 1.

Expected behavior

The token refresh should create exactly one new WebSocket. The close event handler should not trigger a reconnection when the close was initiated by the token refresh logic.
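
One possible shape for the fix is sketched below. It is not the project's implementation: HostSketch, FakeSocket, and the replaced set are hypothetical, and an in-memory socket stands in for the real WebSocket so the example runs without a network. The idea is to mark a socket as deliberately replaced before closing it, so its close handler skips the reconnect.

```typescript
// Minimal socket shape for the sketch; the real host uses a WebSocket instance.
interface SocketLike {
  isOpen: boolean;
  onClose: (() => void) | null;
  close(): void;
}

// In-memory stand-in so the sketch runs without a network.
class FakeSocket implements SocketLike {
  isOpen = true;
  onClose: (() => void) | null = null;
  close(): void {
    if (!this.isOpen) return;
    this.isOpen = false;
    if (this.onClose) this.onClose();
  }
}

class HostSketch {
  created = 0; // how many sockets were ever created
  conn: SocketLike | null = null;
  // Sockets closed on purpose by refreshToken(); hypothetical bookkeeping.
  private replaced = new WeakSet<SocketLike>();

  createSocket(): void {
    const ws = new FakeSocket();
    this.created += 1;
    this.conn = ws;
    ws.onClose = () => {
      // Skip the reconnect when this socket was closed by refreshToken().
      if (this.replaced.has(ws)) return;
      this.createSocket(); // the real handler would wait for the backoff delay
    };
  }

  refreshToken(): void {
    const old = this.conn;
    if (!old || !old.isOpen) return;
    this.replaced.add(old); // mark before closing so the handler sees it
    old.close();
    this.createSocket(); // exactly one replacement
  }
}

const host = new HostSketch();
host.createSocket();
host.refreshToken();
console.log(host.created); // initial socket plus one replacement
```

Marking the individual socket, rather than toggling a shared flag around ws.close(), keeps the check valid even when the close event is delivered asynchronously, as it is for real WebSockets.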

Relevant code

https://github.com/InditexTech/weavejs/blob/main/code/packages/store-azure-web-pubsub/src/server/azure-web-pubsub-host.ts#L240-L282
