fix(http_utils): disable httpx keepalive to spread load across uvicorn workers #29
Open
rmfan wants to merge 1 commit into
A pooled httpx.AsyncClient against a uvicorn --workers N server pins all requests to the small subset of workers that won the accept() race for the pooled TCP connections (uvicorn shares one listen socket across workers; no SO_REUSEPORT, no work-stealing).

Observed in a harbor_server run: n_workers_active = 2 of 32 for most minutes, with those 2 workers saturated at their per-process Semaphore cap while the other 30 sat idle.

Setting max_keepalive_connections=0 closes the TCP connection after each response, so every /run gets its own accept() race and load spreads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
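The pinning effect described in the commit message can be illustrated with a toy model. This is a minimal sketch, not a measurement: it assumes the winner of each accept() race is uniformly random (the kernel makes no such guarantee), and the function names and the pool size of 2 are hypothetical, chosen only to mirror the observed n_workers_active = 2.

```python
import random


def simulate(n_workers, n_requests, pool_size, keepalive, seed=0):
    """Return per-worker request counts under a toy accept() model.

    Assumption: each new TCP connection is accepted by a uniformly
    random worker. Real kernels do not promise uniform wakeups.
    """
    rng = random.Random(seed)
    counts = [0] * n_workers
    # With keepalive, each pooled connection is accepted once and then
    # stays on the worker that accepted it.
    pinned = [rng.randrange(n_workers) for _ in range(pool_size)]
    for i in range(n_requests):
        if keepalive:
            worker = pinned[i % pool_size]  # reuse a pooled connection
        else:
            worker = rng.randrange(n_workers)  # fresh connection, fresh race
        counts[worker] += 1
    return counts


def active_workers(counts):
    """Number of workers that served at least one request."""
    return sum(1 for c in counts if c > 0)


pinned_counts = simulate(32, 10_000, pool_size=2, keepalive=True)
spread_counts = simulate(32, 10_000, pool_size=2, keepalive=False)

# Effective concurrency ceiling if every worker caps at 32 in-flight requests.
print("keepalive on: ", active_workers(pinned_counts) * 32)
print("keepalive off:", active_workers(spread_counts) * 32)
```

Under this model the keepalive case can never use more workers than there are pooled connections, whatever the request volume, which is the ceiling the commit message describes.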
Summary
`init_http_client` builds a process-wide `httpx.AsyncClient` singleton with HTTP/1.1 keepalive at its default. When that client targets a uvicorn `--workers N` server, all `/run` traffic gets pinned to the small subset of workers that originally won the `accept()` race for the pooled TCP connections, because:

- uvicorn binds a single listen socket (`uvicorn/config.py`: `bind_socket` sets `SO_REUSEADDR` only, not `SO_REUSEPORT`) and shares its fd with all worker children.
- Workers call `accept()` against the shared listen queue. Dispatch is per-TCP-connection, not per-request. Once a connection lands on worker N, every HTTP/1.1 keepalive request on that connection stays on worker N for the connection's lifetime.

`max_keepalive_connections=0` closes the TCP connection after each response, so every `/run` runs its own `accept()` race and load actually spreads across workers.

Observed impact (harbor_server, RL360 slurm_job 1694138, 2026-05-29)
Per-minute distinct workers ever calling `_run_inflight += 1`: 75% of minutes used ≤3 of the 32 workers. The single-worker peak `inflight_after_acquire=32` (the per-worker `Semaphore(max_concurrent=32)` cap) showed up against `n_workers_active=2`, meaning the cluster's effective ceiling was 2 × 32 = 64 trials, not the nominal 32 × 32 = 1024. The other 30 workers sat idle.

Source instrumentation: `harbor_server.py:49` (module-level `_run_inflight`), `harbor_server.py:1107-1113` (acquire + counter), `log_format.py:54,147` (per-record `pid`).

Risk
- The `_http_client.post(...)` interface is unchanged.
- `max_connections=_client_concurrency` is unchanged, so the high-water concurrency cap is the same.
- Only the `_http_client` singleton is touched. The Ray-distributed `_HttpPosterActor` path (`http_utils.py:265-266`) has the same pattern and likely the same issue. It was left out of this PR because (a) the user request was specifically the main client and (b) it's gated behind `use_distributed_post`. Worth a follow-up if the deployment uses it.

Test plan
- `~/scripts/athena_harbor_samples.py`: expect `n_workers_active` p50 to climb from 3 toward 32 and `wait_secs` p99 to drop substantially.
- `ss -tn dport = :<harbor_port>` on the caller during a hot minute should show short-lived rather than long-lived connections.

🤖 Generated with Claude Code
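For the `n_workers_active` check in the test plan, the per-minute metric can be derived from per-record `pid` values roughly as follows. This is a sketch under assumptions: the `(epoch_seconds, pid)` record shape and the function name are hypothetical, the sample data is made up for illustration, and this is not the contents of `athena_harbor_samples.py`.

```python
from collections import defaultdict
from statistics import median


def workers_active_per_minute(records):
    """Map each minute bucket to the number of distinct pids seen in it.

    `records` is an iterable of (epoch_seconds, pid) pairs. This shape is
    an assumption for the sketch, not the real log format.
    """
    pids_by_minute = defaultdict(set)
    for ts, pid in records:
        pids_by_minute[int(ts // 60)].add(pid)
    return {minute: len(pids) for minute, pids in sorted(pids_by_minute.items())}


# Made-up sample: three minutes of acquire events from five worker pids.
records = [
    (0, 101), (5, 101), (30, 102),      # minute 0: pids 101, 102
    (61, 101), (90, 101),               # minute 1: pid 101 only
    (125, 103), (130, 104), (170, 105), # minute 2: pids 103, 104, 105
]

per_minute = workers_active_per_minute(records)
p50 = median(per_minute.values())
print(per_minute, p50)
```

Comparing the p50 of this series before and after the change gives the number the test plan expects to move from 3 toward 32.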