Prerelease: 3 verified fixes — 500s, 401 retry, Firestore cache (#5441 #5442 #5443)#5459

Merged: beastoin merged 34 commits into main from prerelease/5441-5442-5443 on Mar 9, 2026

@beastoin (Collaborator) commented Mar 8, 2026

Summary

Prerelease branch combining 3 independently verified sub-PRs (#5441, #5442, #5443).

Verification

All 3 PRs passed comprehensive verification in PR #5445 by @Kelvin.

Deploy Request

Please trigger backend deploy after merge — these fixes address active 500 errors and cost optimization. Deploy via "Deploy Backend to Cloud RUN" workflow (covers Cloud Run + GKE backend-listen rollout restart).

Merge Strategy

Each sub-PR was merged with --no-ff to preserve individual commit history.

beastoin and others added 30 commits March 8, 2026 02:38
Legacy Firestore person documents may lack created_at/updated_at, causing
ResponseValidationError (500) on /v1/users/people for 8 users.
Make both fields Optional with None default.

Fixes part of #5423

Co-Authored-By: Claude Opus 4.6 <[email protected]>
get_people(), get_person(), get_person_by_name(), get_people_by_ids()
all returned raw to_dict() without the document ID. Legacy docs
missing the 'id' field caused ResponseValidationError on Person model.

Fixes #5423

Co-Authored-By: Claude Opus 4.6 <[email protected]>
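
A minimal sketch of the two #5423 changes described above, assuming a simplified `Person` model and a hypothetical `FakeSnapshot` class standing in for a Firestore document snapshot. The real model and database helpers are not shown in this PR page, so the names and the dataclass here are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Person:
    # Simplified stand-in for the real Person model (assumption).
    id: str
    name: str
    created_at: Optional[datetime] = None  # legacy docs may lack this field
    updated_at: Optional[datetime] = None  # legacy docs may lack this field


class FakeSnapshot:
    """Hypothetical stand-in for a Firestore document snapshot."""

    def __init__(self, doc_id: str, data: Dict[str, Any]) -> None:
        self.id = doc_id
        self._data = data

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


def person_from_snapshot(doc: FakeSnapshot) -> Person:
    data = doc.to_dict()
    # Keep a stored 'id' if one exists; otherwise fall back to the document ID.
    data.setdefault("id", doc.id)
    return Person(**data)


# A legacy document with no 'id' and no timestamps still yields a valid Person.
legacy = FakeSnapshot("abc123", {"name": "Ada"})
person = person_from_snapshot(legacy)
```

Using `setdefault` rather than plain assignment leaves any stored `id` untouched, so only documents that lack the field are affected.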
Enables the list endpoint to use the lighter get_conversations_without_photos
function, which skips loading full base64 photo content per conversation.

Fixes part of #5424

Co-Authored-By: Claude Opus 4.6 <[email protected]>
GET /v1/conversations was loading full base64 photos for every
conversation via the @with_photos decorator. 50 convos x ~1.2MB = ~57MB,
which exceeded the Cloud Run 32MB response limit. The list endpoint doesn't
need photo content; the individual conversation GET still loads them.

Fixes #5424

Co-Authored-By: Claude Opus 4.6 <[email protected]>
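
To illustrate why dropping photo content matters for the list response, here is a rough sketch. The `photos` key, the `without_photos` helper, and the per-photo size are assumptions made for the demo, not the actual schema or measurements.

```python
from typing import Any, Dict, List

# 32 MB response limit, as quoted in the commit message above.
RESPONSE_LIMIT_BYTES = 32 * 1024 * 1024


def without_photos(conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the conversation minus its 'photos' key (hypothetical field name)."""
    return {k: v for k, v in conversation.items() if k != "photos"}


def photo_bytes(conversations: List[Dict[str, Any]]) -> int:
    """Sum the length of every base64 photo string across the given conversations."""
    return sum(len(p) for c in conversations for p in c.get("photos", []))


# One shared string keeps the demo's own memory use small.
photo = "A" * 1_200_000
conversations = [{"id": str(i), "title": "demo", "photos": [photo]} for i in range(50)]

full_size = photo_bytes(conversations)
light_size = photo_bytes([without_photos(c) for c in conversations])
```

With these assumed sizes, `full_size` is well above `RESPONSE_LIMIT_BYTES`, while the stripped list carries no photo bytes at all.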
…ingApiCall

Port the 401→refresh→retry→signout pattern from makeApiCall() into both
multipart methods. Extract _buildMultipartRequest() helper to rebuild
requests for retry (streams are single-use).

Fixes #5414
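
The 401→refresh→retry→signout flow can be sketched as follows. This is a Python transliteration for illustration only: the production code is Dart in shared.dart, and every function and parameter name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Response:
    status_code: int


def call_with_auth_retry(
    build_request: Callable[[str], Dict[str, str]],
    send: Callable[[Dict[str, str]], Response],
    get_token: Callable[[], str],
    refresh_token: Callable[[], str],
    sign_out: Callable[[], None],
    require_auth_check: bool = True,
) -> Response:
    response = send(build_request(get_token()))
    # Non-401 responses (including 500) are returned without any auth handling.
    if response.status_code != 401 or not require_auth_check:
        return response

    new_token = refresh_token()
    if not new_token:
        sign_out()  # refresh failed: sign out without retrying
        return response

    # Rebuild the request from scratch, since a consumed body stream cannot be resent.
    retry = send(build_request(new_token))
    if retry.status_code == 401:
        sign_out()  # still unauthorized even with a fresh token
    return retry


# Demo: a fake server that only accepts the refreshed token.
calls: List[str] = []
signed_out: List[bool] = []


def build_request(token: str) -> Dict[str, str]:
    return {"Authorization": f"Bearer {token}"}


def send(headers: Dict[str, str]) -> Response:
    calls.append(headers["Authorization"])
    return Response(200 if headers["Authorization"] == "Bearer fresh" else 401)


result = call_with_auth_retry(
    build_request, send, lambda: "stale", lambda: "fresh", lambda: signed_out.append(True)
)
```

Passing a request builder instead of a prebuilt request is what makes the retry possible, which mirrors the role the commit message gives to `_buildMultipartRequest()`.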
Reviewer feedback: where("id", "in", ...) misses legacy docs that
don't have a stored 'id' field. Use db.get_all() with doc refs instead.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
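
The reviewer's point can be illustrated with a plain dictionary standing in for the people collection. This is not Firestore client code; `query_by_stored_id` and `fetch_by_doc_key` are hypothetical helpers that only mimic the difference between filtering on a stored field and looking documents up by their key.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for the collection: document ID -> stored fields.
collection: Dict[str, Dict[str, Any]] = {
    "p1": {"id": "p1", "name": "Ada"},
    "p2": {"name": "Grace"},  # legacy doc: no stored 'id' field
}


def query_by_stored_id(ids: List[str]) -> List[Dict[str, Any]]:
    """Mimics a field filter on 'id': docs without the field can never match."""
    return [dict(d) for d in collection.values() if d.get("id") in ids]


def fetch_by_doc_key(ids: List[str]) -> List[Dict[str, Any]]:
    """Mimics fetching by document reference: the lookup key is the doc ID itself."""
    found = []
    for doc_id in ids:
        data = collection.get(doc_id)
        if data is not None:
            # Inject the document ID when the stored data lacks one.
            found.append({**data, "id": data.get("id", doc_id)})
    return found


by_field = query_by_stored_id(["p1", "p2"])
by_key = fetch_by_doc_key(["p1", "p2"])
```

Here `by_field` silently drops the legacy document, while `by_key` returns both people with an `id` filled in.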
10 tests covering:
- Person model resilience with missing created_at/updated_at (#5423)
- Doc ID injection in get_people, get_person, get_people_by_ids (#5423)
- Conversations list endpoint uses without-photos function (#5424)
- get_conversations_without_photos supports folder_id/starred (#5424)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Large batch test for get_people_by_ids (>30 IDs, old where-in limit)
- Empty list boundary test
- Verify get_conversations_without_photos lacks @with_photos decorator
- Verify get_conversations retains @with_photos for individual use

Co-Authored-By: Claude Opus 4.6 <[email protected]>
7 tests covering all code paths:
- Non-401 returns directly (no refresh, no signout)
- 401 → refresh succeeds → retry succeeds (200)
- 401 → refresh succeeds → retry still 401 → signs out
- 401 → refresh fails (empty token) → signs out without retry
- requireAuthCheck=false skips 401 handling
- Request rebuilt with fresh headers for retry
- 500 does not trigger auth retry
… retry test

Adds STAGING_API_URL= to generated .dev.env (fixes pre-existing envied
compilation failure that blocked all tests on clean checkout).
Per Kelvin's UAT review: callers must not assume result order matches
input person_ids order.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
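
A caller that needs results in the order of its input IDs can re-key them itself. A small sketch, assuming each returned person is a dict with an `id` key; `order_like_input` is a hypothetical helper, not part of the codebase.

```python
from typing import Any, Dict, List


def order_like_input(person_ids: List[str], people: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reorder fetched people to follow person_ids, skipping IDs that were not found."""
    by_id = {p["id"]: p for p in people}
    return [by_id[pid] for pid in person_ids if pid in by_id]


# The fetch returned results in a different order than requested, and 'c' was missing.
fetched = [{"id": "b", "name": "Bo"}, {"id": "a", "name": "Ada"}]
ordered = order_like_input(["a", "c", "b"], fetched)
```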
@beastoin (Collaborator, Author) commented Mar 8, 2026

lgtm

@greptile-apps bot (Contributor) commented Mar 8, 2026

Greptile Summary

This prerelease combines three independently verified fixes targeting active 500 errors, a 401 token-refresh gap for multipart API calls, and Firestore read cost reduction via in-process caching.

Key changes:

  • /v1/users/people 500 fix (users.py, models/other.py): Legacy Firestore person documents missing the id field or timestamp fields are now handled gracefully: get_person/get_people inject doc.id via setdefault, get_people_by_ids switches from a where("id", "in", ...) query to db.get_all(doc_refs), and created_at/updated_at are made optional on the Person model.
  • /v1/conversations 500 fix (conversations.py router + db): The list endpoint switches from get_conversations (loads full base64 photo content) to get_conversations_without_photos, preventing the 32 MB Cloud Run response-size limit from being exceeded. folder_id and starred filters are added to get_conversations_without_photos and correctly forwarded from the router.
  • Multipart 401 retry (shared.dart): Both makeMultipartApiCall and makeMultipartStreamingApiCall now detect 401 responses, force-refresh the auth token, rebuild the multipart request (required since HTTP streams are single-use), and retry. Persistent 401 after retry triggers sign-out. Mirrors the existing retry pattern already present for non-multipart calls.
  • Firestore caching (cache_manager.py, notifications.py, utils/apps.py, transcribe.py, redis_db.py): Three hot Firestore paths are wrapped with a 30-second in-memory cache (mentor notification frequency, is_tester flag + per-user app slice) and a 15-minute credit-seconds cache with active Redis invalidation on subscription changes. A pre-existing memory leak in InMemoryCacheManager._fetch_locks (locks never removed after get_or_fetch) is fixed with reference counting. A cache-mutation bug in get_available_apps (mutating cached dict objects) is fixed with a shallow copy.
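
The caching bullet above combines three ideas: a TTL cache, per-key fetch locks that are released by reference counting, and copying cached dicts before mutating them. The sketch below shows one way these could fit together. `TinyTTLCache` is a hypothetical class written for this illustration and is not the actual `InMemoryCacheManager`.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Tuple


class TinyTTLCache:
    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self._fetch_locks: Dict[str, List[Any]] = {}   # key -> [lock, refcount]
        self._guard = threading.Lock()                 # protects _fetch_locks

    def _get_fresh(self, key: str) -> Tuple[bool, Any]:
        entry = self._data.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return True, entry[1]
        return False, None

    def get_or_fetch(self, key: str, fetch: Callable[[], Any], ttl: float) -> Any:
        hit, value = self._get_fresh(key)
        if hit:
            return value
        with self._guard:
            slot = self._fetch_locks.setdefault(key, [threading.Lock(), 0])
            slot[1] += 1  # one more caller is using this key's lock
        try:
            with slot[0]:
                # Another thread may have filled the cache while we waited.
                hit, value = self._get_fresh(key)
                if not hit:
                    value = fetch()
                    self._data[key] = (time.monotonic() + ttl, value)
                return value
        finally:
            with self._guard:
                slot[1] -= 1
                if slot[1] == 0:
                    # Last user of the lock removes it, so the dict cannot grow forever.
                    del self._fetch_locks[key]


cache = TinyTTLCache()
apps = cache.get_or_fetch("apps", lambda: [{"id": "a1", "enabled": False}], ttl=30)

# Shallow-copy each cached dict before applying per-user changes.
per_user = [dict(app) for app in apps]
per_user[0]["enabled"] = True
```

After the call, `_fetch_locks` is empty again, and the per-user mutation does not leak into the cached list because each dict was copied first. As the review notes, a cache like this lives inside one process, so other pods keep their own copies until their entries expire.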

All production logic changes are straightforward, well-tested (72/72 unit tests), and verified with live API + physical device E2E.

Confidence Score: 4/5

  • Safe to merge — all three fixes address active production regressions with low change risk; deploy should be triggered promptly per the PR description.
  • The small deductions: (1) the Dart multipart test validates the logic through a standalone reimplementation rather than exercising the production singletons, and (2) in-memory cache invalidation is local-only in multi-pod deployments, so other pods may briefly serve stale tester data. Both are known, acceptable tradeoffs rather than blocking issues.
  • No files require special attention. All changes are correct and well-tested.

Last reviewed commit: cc5df0b

@beastoin beastoin merged commit 8c7ca81 into main Mar 9, 2026
2 checks passed
@beastoin beastoin deleted the prerelease/5441-5442-5443 branch March 9, 2026 01:04