Coalesce concurrent session refreshes and fix refresh-path deadlock#6
Open
evanqhuang wants to merge 2 commits intoAdvik-B:masterfrom
Open
Coalesce concurrent session refreshes and fix refresh-path deadlock#6evanqhuang wants to merge 2 commits intoAdvik-B:masterfrom
evanqhuang wants to merge 2 commits intoAdvik-B:masterfrom
Conversation
Three interacting concurrency defects made Scraper unsafe to share across goroutines, contrary to its documented use: * handle403 used a single in403Retry bool to elect one retrying goroutine; all other concurrent 403 callers immediately returned ErrMaxRetriesExceeded without retrying, so a single Cloudflare blip turned into N-1 spurious failures and a misleading error. * The scheduled-refresh block in do() held s.mu across refreshSession's network probe, which re-entered s.Get -> s.do -> s.mu.Lock(). sync.Mutex is not reentrant, so the first automatic refresh (default: one hour after the first request) deadlocked the scraper. * handle403 invoked refreshSession without holding s.mu while refreshSession reassigned s.UserAgent and mutated cipher suites; concurrent callers in do() read s.UserAgent.Headers with no lock, racing the writes. Fix: * Replace the in403Retry bool with a singleflight.Group keyed on "session-refresh". Concurrent refresh requests collapse into one upstream probe, and a failed refresh propagates the same error to every coalesced caller instead of triggering a blind retry against stale state. * Split do() into do(req) + doWithRefresh(req, allowRefresh). The scheduled refresh check now takes s.mu only to snapshot sessionStartTime, then releases it before any I/O. refreshSession's root-URL probe uses doWithRefresh(req, false), breaking the do -> refreshSession -> do reentrancy that caused the deadlock and would otherwise cause singleflight.Do to wait on itself. * Guard UserAgent reads with s.mu at request time; guard the whole refresh state transition (UserAgent, sessionStartTime, cipher suites, cookie jar) under s.mu in refreshSession so doWithRefresh's snapshot is consistent. * Treat a 403 from the refresh probe as a failed refresh so callers see a meaningful error rather than silently continuing with an unchanged session. * submitChallengeForm now uses doWithRefresh(req, false): a failed challenge submission is not a normal 403 and must not recursively enter the singleflight group that the refresh probe may already hold.
Covers the regressions fixed in the previous commit: * TestHandle403_CoalescesConcurrentRefreshes asserts that N concurrent 403s produce O(1) refresh probes (not N) and that every caller succeeds, not N-1 spurious ErrMaxRetriesExceeded failures. * TestAutoRefresh_NoDeadlock forces a refresh on every request with a millisecond SessionRefreshInterval. Old code deadlocks on first refresh; new code completes within the 10s test deadline. * TestRefreshFailure_PropagatesToAllCallers verifies that if the upstream is still 403 after refresh, all coalesced callers receive the same refresh-failure error rather than racing on stale state. * TestConcurrentReadsUnderRefresh_NoDataRace runs 50 goroutines x 3 iterations while forcing repeated refreshes; under -race, any unguarded read of Scraper.UserAgent or the transport cipher suites trips the detector. Tests disable stealth mode to avoid its default 500ms-2s human-like delay between requests, which would make the concurrency assertions impractical. Run with `go test -race -count=5 ./lib/` to shake out scheduler-induced flakes.
ced7d19 to
89573c0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Scraperis documented as safe to share across goroutines, but three interacting concurrency defects inlib/cloudscraper.gobreak that guarantee:handle403. Thein403Retrybool elects one retrying goroutine; the other N−1 concurrent 403 callers returnErrMaxRetriesExceededwithout retrying. A single Cloudflare blip turns into N−1 spurious failures, and the error is a lie — those callers never exceeded any retry budget.do()holdss.muwhile callingrefreshSession, which callss.Get→s.do→s.mu.Lock().sync.Mutexis not reentrant, so the first automatic refresh (default: one hour after first request) deadlocks the scraper.UserAgent/ cipher suites.handle403's refresh path releasess.mubefore callingrefreshSession, which reassignss.UserAgentand mutates transport cipher suites; concurrentdo()calls reads.UserAgent.Headerswith no lock.go test -raceflags this once tests exist.Fix
golang.org/x/sync/singleflightto coalesce refreshes under a single"session-refresh"key. Concurrent refresh requests collapse into one upstream probe, and a failed refresh propagates the same error to every coalesced caller — no blind retry against stale state.dointodo+doWithRefresh(req, allowRefresh).refreshSession's root-URL probe calls it withallowRefresh=false, breaking thedo → refreshSession → doreentrancy that both caused the deadlock and would otherwise makesingleflight.Dowait on itself.submitChallengeFormalso usesallowRefresh=false: a failed challenge-form POST must not recursively enter the refresh group.s.munow guards only state transitions —UserAgentswap,sessionStartTime, cipher suites, cookie jar — and is released before any I/O.doWithRefreshsnapshotss.UserAgentunders.mubefore using it.Test plan
New file
lib/cloudscraper_concurrency_test.go— the repo previously had zero tests. All four run undergo test -race:TestHandle403_CoalescesConcurrentRefreshes— 20 concurrent 403 callers produce ≤ 3 refresh probes (ideally 1) and all succeed.TestAutoRefresh_NoDeadlock— 1msSessionRefreshInterval+ 20 concurrent callers complete within 10s. Old code hangs.TestRefreshFailure_PropagatesToAllCallers— upstream permanently 403 → every coalesced caller receives a refresh-failure error.TestConcurrentReadsUnderRefresh_NoDataRace— 50 goroutines × 3 iterations while forcing repeated refreshes;-racedetects any unguardedUserAgentread.Verification:
go build ./... go vet ./... go test -race -count=5 ./...All pass locally, 5 iterations clean.
Dependency
Adds
golang.org/x/sync v0.15.0as a direct dependency. Stdlib-adjacent module, single package used (singleflight). Rolling our own coalescer viasync.Condis more code and more bug surface for no benefit.