fix: treat Scrypt WS Connection closed as transient #3655
Open
TaprootFreak wants to merge 2 commits into develop
The Scrypt WebSocket adapter rejects all pending requests with 'Connection closed' when the WS disconnects. Previously this surfaced as a permanent OrderFailedException in the liquidity management pipeline, causing the rule to be paused even though the underlying order on Scrypt was still alive (no fill, no money moved).
Two changes:
- ScryptWebSocketConnection: extend the 'fetchAll' retry pattern to also cover 'fetch'. Refactor the retry logic into a shared helper so any future fetch-style call gets the same treatment.
- ScryptAdapter.checkTradeCompletion: when the underlying error is a transient WS error (Connection closed / unknown reqid), return false instead of throwing OrderFailedException, so the order stays IN_PROGRESS and is retried on the next cron tick.
Reproduced via pipeline 60738 (rule 313, Scrypt/EUR redundancy): order 122805 went through 5 ClOrdIds in 3 minutes before the WS dropped during a check and was wrongly marked Failed; balance audit confirmed the EUR were never spent.
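The shared retry helper described in the first change could look roughly like the sketch below. The name retryOnTransientWsError is the one used in this PR; the signature, the injected predicate, the attempt count, and the delay are assumptions made for illustration and are not taken from the diff.

```typescript
// Hypothetical sketch of a shared retry wrapper for fetch-style WS calls.
// Signature, maxAttempts and delayMs are illustrative assumptions.
async function retryOnTransientWsError<T>(
  label: string,
  action: () => Promise<T>,
  isTransient: (e: unknown) => boolean,
  maxAttempts = 3,
  delayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await action();
    } catch (e) {
      // Genuine errors, or a transient error on the last attempt, propagate unchanged.
      if (!isTransient(e) || attempt >= maxAttempts) throw e;
      console.warn(`Retrying ${label} after transient error (attempt ${attempt})`);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Simplified predicate used only for this example.
const isClosed = (e: unknown): boolean =>
  e instanceof Error && e.message.toLowerCase().includes('connection closed');
```

With a wrapper of this shape, both fetch and fetchAll can delegate to the same function, so a third fetch-style call would only need to pass its own action.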
…pter
Centralize the transient WS error markers ('Connection closed' /
'unknown reqid') and a shared isTransientWsError helper in
scrypt-websocket-connection. Both retryOnTransientWsError and the
ScryptAdapter check now use the same function, eliminating the
duplicated string list and aligning the case-insensitive matching with
isBalanceTooLowError elsewhere in the adapter.
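A minimal sketch of what such a centralized check might look like follows. isTransientWsError is the helper named in this commit; the constant name TRANSIENT_WS_ERROR_MARKERS and the exact message extraction are assumptions, not the code in the diff.

```typescript
// Hypothetical constant name; the two markers are the ones listed in the commit message.
const TRANSIENT_WS_ERROR_MARKERS = ['connection closed', 'unknown reqid'];

// Case-insensitive substring match against the error message.
function isTransientWsError(e: unknown): boolean {
  const message = (e instanceof Error ? e.message : String(e ?? '')).toLowerCase();
  return TRANSIENT_WS_ERROR_MARKERS.some((marker) => message.includes(marker));
}
```

Keeping the markers in one list means the retry helper and the adapter cannot drift apart when a new transient message is added.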
Summary
Fixes liquidity-management pipelines being permanently marked FAILED on transient Scrypt WebSocket disconnects.
When the Scrypt WS drops, all pending requests are rejected with new Error('Connection closed'). This was surfaced as OrderFailedException in ScryptAdapter.checkTradeCompletion, marking the order Failed → action 233 has no onFail → pipeline FAILED → rule auto-paused → mail. Meanwhile the underlying order on Scrypt is unaffected and the funds are not moved.
Changes
- scrypt-websocket-connection.ts: Extract the retry logic added in Various improvements #3594 into a private helper retryOnTransientWsError and apply it to fetch (was previously only on fetchAll). fetch is used by fetchExecutionReports and fetchOrderBook, both on the hot path of checkTrade.
- scrypt.adapter.ts: In checkTradeCompletion, classify Connection closed / unknown reqid as transient → return false so the order stays IN_PROGRESS and is retried on the next 10s cron tick. Genuine errors still throw OrderFailedException.
Repro / data point
Connection closed → wrongly marked Failed
Test plan
- IN_PROGRESS and resumes (vs. flipping to Failed)
- getOrderStatus cache + 30-day fallback in scrypt.service.ts:301 already dedupes by ClOrdID, so retry-on-next-tick reuses the existing correlation
- Retrying fetch ... after transient error and Transient WS error checking order to gauge frequency
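The adapter-side behaviour that the test plan exercises (transient error → false, genuine error → OrderFailedException) can be simulated in isolation with a sketch like the one below. Everything here is a hypothetical stand-in: the OrderStatus type, the injected fetchStatus callback, the function signature, and the local exception class are assumptions made so the example runs on its own, and they do not reproduce the actual ScryptAdapter code.

```typescript
// Local stand-in for the project's exception class (assumption).
class OrderFailedException extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on older targets
  }
}

type OrderStatus = 'FILLED' | 'OPEN' | 'REJECTED'; // hypothetical status set

// Compact inline version of the transient check, for self-containment only.
function isTransientWsError(e: unknown): boolean {
  const message = (e instanceof Error ? e.message : String(e ?? '')).toLowerCase();
  return ['connection closed', 'unknown reqid'].some((m) => message.includes(m));
}

// Returns true when the trade is complete, false when it should be re-checked
// on the next tick, and throws OrderFailedException on genuine failures.
async function checkTradeCompletion(
  orderId: string,
  fetchStatus: (id: string) => Promise<OrderStatus>,
): Promise<boolean> {
  try {
    const status = await fetchStatus(orderId);
    if (status === 'REJECTED') throw new OrderFailedException(`Order ${orderId} rejected`);
    return status === 'FILLED';
  } catch (e) {
    if (e instanceof OrderFailedException) throw e;
    if (isTransientWsError(e)) {
      console.warn(`Transient WS error checking order ${orderId}`);
      return false; // order stays IN_PROGRESS
    }
    throw new OrderFailedException(`Order ${orderId} failed: ${String(e)}`);
  }
}
```

A unit test can then pass a fetchStatus callback that throws new Error('Connection closed') and assert that the result is false rather than an exception.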