Add timeouts and retries to upstream requests to mitigate connection timeouts (#163)#164
Open
Vaibhavkkm wants to merge 2 commits into
Open
Add timeouts and retries to upstream requests to mitigate connection timeouts (#163)#164Vaibhavkkm wants to merge 2 commits into
Vaibhavkkm wants to merge 2 commits into
Conversation
…timeouts (nasa#163) The service fetches apod.nasa.gov on every request with no timeout and no retries: a hung upstream connection holds a gunicorn worker indefinitely (default 30s worker timeout, sync workers), so upstream instability cascades into the timeouts and 503s reported in nasa#163. - Use a shared requests.Session with connection pooling and retries (3 retries with exponential backoff on 429/500/502/503/504) - Add explicit timeouts (5s connect, 8s read) to the APOD page fetch and the Vimeo thumbnail lookup (previously a raw PoolManager, no timeout); worst-case hang budgets stay under the 30s gunicorn worker timeout - Ignore Retry-After headers so an upstream-specified delay on 429/503 cannot sleep a worker past its timeout - Surface upstream errors that persist after retries via raise_for_status instead of parsing an error page - Add offline unit tests covering the timeout, retry config, hang budget, 404 and error paths
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR hardens outbound HTTP calls to the APOD site (and Vimeo thumbnail API) by adding explicit timeouts and retry behavior, and adds regression tests to validate those behaviors.
Changes:
- Replace direct
requests.get/urllib3.PoolManagerusage with a sharedrequests.Sessionconfigured with timeouts andRetry. - Surface persistent upstream errors via
raise_for_status()instead of parsing error pages. - Add offline unit tests covering timeouts, retry configuration, and error mapping behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/apod/test_http.py | Adds regression tests validating timeouts, retries, and upstream error surfacing. |
| apod/utility.py | Introduces shared requests.Session with retries/timeouts and updates APOD/Vimeo request paths accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,125 @@ | |||
| #!/bin/sh/python | |||
Comment on lines
+44
to
+59
| session = requests.Session() | ||
| session.mount( | ||
| "https://", | ||
| HTTPAdapter( | ||
| max_retries=Retry( | ||
| total=3, | ||
| connect=2, | ||
| read=1, | ||
| backoff_factor=0.5, | ||
| status_forcelist=[429, 500, 502, 503, 504], | ||
| allowed_methods=["GET"], | ||
| raise_on_status=False, | ||
| respect_retry_after_header=False, | ||
| ) | ||
| ), | ||
| ) |
The '#!/bin/sh/python' shebang is not a valid interpreter path and serves no purpose: these modules are imported by pytest, never executed directly. Dropped from all three test files, keeping the utf-8 encoding declaration.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #163.
The service scrapes
apod.nasa.govon every request usingrequests.get/ a rawurllib3.PoolManagerwith no timeout and no retries. When the upstream site is slow or flaky, a hung connection holds a gunicorn worker until it is killed (sync workers, default 30s timeout), so transient upstream instability cascades into the connection timeouts and 503s several users report in #163.This change hardens the two outbound calls in
apod/utility.py(the APOD page fetch and the Vimeo thumbnail lookup) so a single transient upstream failure no longer becomes a hard error for the client, while keeping the worst-case time budget under the worker timeout.Changes
requests.Sessionwith connection pooling and a mounted retry policy: 3 retries with exponential backoff on429/500/502/503/504,GETonly.5sconnect,8sread. The Vimeo lookup previously used a rawPoolManagerwith no timeout at all.Retry-Afterheaders are deliberately ignored (respect_retry_after_header=False). urllib3 honors that header by default with a 6-hour cap; a429/503carrying a longRetry-After— exactly the overloaded-upstream case this fix targets — would otherwise sleep the worker past its timeout.raise_for_status()after the existing 404 check, so an upstream error that persists through the retries surfaces as a clear error instead of the parser choking on an error page.tests/apod/test_http.py) covering the timeout argument, the retry configuration, the worst-case hang budget, and the 404 / persistent-error paths for both the APOD and Vimeo requests. They run fully offline (mocked session), so CI stays hermetic.Notes / scope
start_date/end_daterange andcountrandom) call the fetch per date in a loop with no shared deadline, so during a sustained upstream outage the per-date retry cost adds up. This is pre-existing loop behavior (those endpoints already iterated and skipped failed dates before this change); I kept the fix scoped to the per-request hardening rather than introducing a cross-request deadline mechanism. Happy to follow up separately if maintainers would like a bounded total budget for the loop endpoints.urllib3is already a direct dependency inpyproject.toml; this only addsrequests.adapters.HTTPAdapterandurllib3.util.retry.Retryimports.Testing
Also smoke-tested end-to-end against the live
apod.nasa.govsite and through the Flask app for a fixed date and today's date.