report: Combined verification + architecture analysis (#5441, #5442, #5443) by beastoin · Pull Request #5445 · BasedHardware/omi

beastoin · 2026-03-08T04:44:34Z

Summary

Report-only PR documenting combined verification results for PRs #5441, #5442, #5443 and architecture analysis of recent shifts in the Omi project. No code changes — this PR serves as a reviewable artifact for manager approval.

Updated 2026-03-09: Reflects all review-cycle fixes (credits invalidation #5446, real Redis integration tests, pub/sub unit tests, _fetch_locks refcounted cleanup).

Verification Report

PR #5441 — People/Conversations 500s Fix (yuki)

Unit Tests: 14/14 pass

Live API Verification (dev Firestore, real data):

GET /v1/users/people → 200 (tested with fresh user + user with existing data)
GET /v1/conversations?limit=50 → 200, 141KB response (previously 57MB with embedded photos)
GET /v1/conversations/{id} → 200 (individual conversation detail)
5 synthetic legacy Firestore docs (missing created_at/updated_at timestamps) → all return 200

Key Changes Verified:

get_conversations_without_photos() — new function that skips photo subcollection loads for list endpoints
get_people() — injects doc ID via data.setdefault('id', person.id) for legacy doc compatibility
get_people_by_ids() — uses db.get_all(doc_refs) batch fetch instead of where("id","in",...)
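
The doc-ID fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `snapshot` and `people_from_snapshots` are hypothetical names, and the snapshot objects are simple stand-ins for Firestore document snapshots.

```python
from types import SimpleNamespace


def snapshot(doc_id, data):
    """Hypothetical stand-in for a Firestore document snapshot."""
    return SimpleNamespace(id=doc_id, exists=data is not None,
                           to_dict=lambda: dict(data or {}))


def people_from_snapshots(snapshots):
    """Build person dicts, falling back to the document ID when 'id' is absent."""
    people = []
    for snap in snapshots:
        if not snap.exists:
            continue  # a batch fetch can return placeholders for missing docs
        data = snap.to_dict()
        data.setdefault('id', snap.id)  # stored 'id' wins; legacy docs get the doc ID
        people.append(data)
    return people


legacy = snapshot('abc123', {'name': 'Ada'})                  # no stored 'id'
modern = snapshot('def456', {'id': 'def456', 'name': 'Lin'})
missing = snapshot('ghi789', None)
people = people_from_snapshots([legacy, modern, missing])
```

Because the lookup goes through document references rather than a query on a stored `id` field, a legacy document without that field is still found and still ends up with an `id` in the response.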

Codex Audit: 6 gaps identified, 5 resolved with evidence, 1 accepted risk (cross-pod 30s TTL for non-critical paths)

Cache Impact: None — endpoints hit Firestore directly (confirmed by yuki).

PR #5442 — Multipart 401 Retry (kenji)

Unit Tests: 7/7 pass

Emulator Verification:

APK built successfully with JDK 21 (Gradle toolchain requirement)
Installed and launched on Android emulator (omi-dev AVD)
App launched without crash

Code Review — 3 retry handlers verified in app/lib/backend/http/shared.dart:

makeApiCall() (line 116) — standard HTTP 401 retry
makeMultipartApiCall() (line 219) — multipart with full request rebuild (file streams can't be reused)
makeMultipartStreamingApiCall() (line 341) — streaming multipart with same rebuild pattern

Pattern: detect 401 → refresh token → rebuild headers AND request body → retry once → force sign-out on second failure
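
The pattern above can be sketched in Python for illustration (the real handlers are Dart, in shared.dart). All names here (`call_with_401_retry`, `build_request`, `send`, and the `state` dict) are hypothetical; the point is that the request is rebuilt, not resent, because its body stream is single-use.

```python
def call_with_401_retry(build_request, send, refresh_token, sign_out):
    """Send a request; on 401 refresh the token, rebuild, and retry once."""
    response = send(build_request())
    if response['status'] != 401:
        return response
    if not refresh_token():
        sign_out()                      # refresh failed: sign out without retrying
        return response
    response = send(build_request())    # fresh headers and a fresh body stream
    if response['status'] == 401:
        sign_out()                      # second failure: stop retrying
    return response


state = {'token': 'expired', 'signed_out': False, 'builds': 0}


def build_request():
    state['builds'] += 1
    return {'auth': state['token'], 'body': iter([b'file-bytes'])}


def send(request):
    list(request['body'])               # consume the single-use stream
    return {'status': 200 if request['auth'] == 'fresh' else 401}


def refresh_token():
    state['token'] = 'fresh'
    return state['token']


def sign_out():
    state['signed_out'] = True


result = call_with_401_retry(build_request, send, refresh_token, sign_out)
```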

Cache Impact: None — client-side only (confirmed by kenji).

PR #5443 — Firestore Read Ops Cache (hiro)

Unit Tests: 51/51 pass (updated from initial 19 after review-cycle additions)

Test breakdown:

TestMentorFrequencyCache (5) — local cache behavior
TestTesterAndAppSliceCache (6) — local invalidation
TestCreditCacheLogic (9) — passive refresh timing
TestFetchLockCleanup (4) — singleflight refcounted lock cleanup
TestRedisCreditsInvalidationSignal (3) — Redis signal set/check
TestWebhookInvalidationCoverage (8) — source-code scanning for function coverage
TestRedisPubSubManager (12) — pub/sub callback logic (unit)
TestRedisPubSubIntegration (3) — real Redis (localhost:6379, separate clients simulating pods)

Latency Verification (live API, 3 sequential requests each):

Endpoint	Cold (ms)	Warm (ms)	Speedup
Mentor notification	855	0.0	102,550x
Apps endpoint	3,561	301	11.8x
Mentor API	183	1.8	101x
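
As a quick consistency check on the table, speedup is cold time divided by warm time. The sketch below recomputes the two rows with non-zero warm times and derives the warm time implied by the 102,550x figure, on the assumption that the 0.0 ms entry is a rounded display value.

```python
def speedup(cold_ms, warm_ms):
    """Ratio of cold-path latency to warm-path latency."""
    return cold_ms / warm_ms


apps = speedup(3561, 301)           # Apps endpoint row
mentor_api = speedup(183, 1.8)      # Mentor API row
implied_warm_ms = 855 / 102_550     # warm time implied by the reported speedup
```

Under that assumption the mentor notification warm path took roughly 0.008 ms, which rounds to the 0.0 shown in the table.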

Cache Invalidation — 3 strategies verified:

Write-through (same-pod): PATCH → cache.delete(key) → next GET fetches fresh from Firestore
Cross-pod Redis pub/sub: publish invalidation event → other pods clear memory caches. Verified with 3 real Redis integration tests using separate clients simulating different pods.
TTL expiry (fallback): 30s memory TTL provides bounded staleness
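
A minimal sketch of how the three strategies interact, assuming a hypothetical `Pod` class with a local TTL cache. The shared `bus` list stands in for Redis pub/sub and the `store` dict stands in for Firestore; neither reflects the real implementation.

```python
import time


class Pod:
    """Hypothetical pod with a local TTL cache and a shared invalidation bus."""

    def __init__(self, store, bus, ttl=30.0, clock=time.monotonic):
        self.store, self.bus, self.ttl, self.clock = store, bus, ttl, clock
        self.cache = {}                  # key -> (value, expires_at)
        bus.append(self)                 # subscribe to invalidation events

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]              # fresh hit
        value = self.store[key]          # miss or expired: read the source of truth
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def patch(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)        # same-pod invalidation
        for pod in self.bus:             # stand-in for publishing an invalidation event
            if pod is not self:
                pod.cache.pop(key, None)


now = [0.0]
store, bus = {'freq': 1}, []
pod_a = Pod(store, bus, clock=lambda: now[0])
pod_b = Pod(store, bus, clock=lambda: now[0])
pod_b.get('freq')                        # pod B caches 1
pod_a.patch('freq', 5)                   # pod A writes and notifies pod B
after_publish = pod_b.get('freq')
store['freq'] = 9                        # out-of-band write, no event published
stale = pod_b.get('freq')                # still served from pod B's cache
now[0] += 31                             # move past the 30s TTL
after_ttl = pod_b.get('freq')
```

The last three reads show the bounded staleness: a write that publishes no event is only picked up once the TTL lapses.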

Review-Cycle Fixes (post-initial-review):

Credits cache gap (BLOCKING, issue #5446: freemium credits cache had no active invalidation): remaining_transcript_seconds had a 15min passive refresh. If a user upgraded mid-stream, transcripts were silently dropped for up to 15 minutes (gated by user_has_credits at transcribe.py:1845). Fixed with active invalidation via a Redis signal: set_credits_invalidation_signal(uid) on 4 Stripe webhook points, check_credits_invalidation(uid) in the transcribe loop every 60s, and GET rather than GETDEL for multi-stream safety.
Real Redis integration tests: Added after review identified all initial pub/sub tests used MagicMock with zero real Redis connection. 3 tests now use redis.Redis(localhost:6379) with separate clients.
_fetch_locks refcounted cleanup: Fixed potential memory leak where per-key singleflight locks accumulated. Now uses refcount tracking — lock deleted only when no waiters remain.
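
The reason for GET rather than GETDEL can be sketched as follows. The function names come from the report, but the signatures, the key format, the 300s expiry, and the timestamp comparison are assumptions made for this illustration; `FakeRedis` is an in-memory stand-in, not the redis-py client.

```python
class FakeRedis:
    """Tiny in-memory stand-in exposing only get/set."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, ex=None):
        self.data[key] = value           # expiry is ignored in this stand-in

    def get(self, key):
        return self.data.get(key)


def set_credits_invalidation_signal(r, uid, now):
    # Called from a webhook handler after a subscription change (assumed key format).
    r.set(f'credits_invalidated:{uid}', str(now), ex=300)


def check_credits_invalidation(r, uid, last_seen):
    """Return (should_refresh, new_last_seen) without deleting the signal."""
    raw = r.get(f'credits_invalidated:{uid}')   # GET, not GETDEL
    if raw is None:
        return False, last_seen
    ts = float(raw)
    return ts > last_seen, max(ts, last_seen)


r = FakeRedis()
set_credits_invalidation_signal(r, 'user-1', now=100.0)
stream_a = check_credits_invalidation(r, 'user-1', last_seen=0.0)
stream_b = check_credits_invalidation(r, 'user-1', last_seen=0.0)
stream_a_again = check_credits_invalidation(r, 'user-1', last_seen=stream_a[1])
```

With GETDEL, the first stream to poll would consume the signal and a second concurrent stream for the same user would never see it. Leaving the key in place lets every stream observe it once.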

New Files:

database/cache.py — global singleton init, atexit cleanup, Redis pub/sub callbacks
database/cache_manager.py — InMemoryCacheManager with LRU eviction, per-entry TTL, singleflight pattern, thread-safe (RLock), 100MB default limit

Combined Results

72/72 total unit tests pass (14 + 7 + 51)
All 3 PRs merge cleanly (one test.sh conflict resolved — both PRs added test entries)
Codex quality audit passed (6 gaps identified, 5 resolved, 1 accepted risk)
1 BLOCKING issue identified and fixed (#5446, credits cache staleness)
Combined verification branch: verify/combined-5441-5442-5443

E2E Physical Device Test — PASS (2/2)

Test Setup

Device: Pixel 7a (33041JEHN18287) via Mac Mini ADB
Audio source: NYT podcast (Simplecast, 38:52 speech) via Chrome browser
Mic: Phone built-in mic (BLE device mic couldn't acoustically reach phone speaker)
App: Omi prod, BLE device Omi CV 1 (FW 3.0.15, HW rev 5.0)

Test 1 — Short Clip (4m 16s): PASS

Title: "Climate activism, political obstacles, and resistance strategies"
Multi-speaker detection (3+ speakers)
AI summary: 4 structured sections with bullet points
Transcript with timestamps and "translated by omi" tags

Test 2 — 15min Podcast (16m 30s): PASS

Title: "The Limits and Power of Storytelling in Social and Political Change"
3+ speakers detected (Speaker 1, 2, 3)
AI summary: 4+ sections (Violence against women, Limits of storytelling, Climate/markets, Leadership)
Live transcript verified at 7-min midpoint — timestamps accurate, speaker labels correct

Verified Features

Conversation creation, real-time transcription (Deepgram STT), speaker diarization, AI summarization, timestamp generation, translation tags, conversation history.

Evidence screenshots: see PR comment below

Architecture Analysis — Recent Shifts

1. New: Two-Tier Caching Layer (PR #5443, PR #5378)

The Omi backend has shifted from a single-tier caching model (Redis only) to a two-tier architecture:

Request → In-Memory Cache (30s TTL, LRU, singleflight)
              ↓ miss
          Redis Cache (10-30min TTL, shared across pods)
              ↓ miss
          Firestore (source of truth)
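
The lookup order in the diagram can be sketched as a read-through function. This is a simplified illustration with plain dicts standing in for the memory cache and Redis; TTLs, LRU eviction, and singleflight are omitted, and `two_tier_get` is a hypothetical name.

```python
def two_tier_get(key, memory, redis_like, fetch_from_firestore):
    """Return (value, tier) where tier names the layer that served the read."""
    if key in memory:
        return memory[key], 'memory'
    value = redis_like.get(key)
    if value is not None:
        memory[key] = value              # promote to the faster tier
        return value, 'redis'
    value = fetch_from_firestore(key)    # both caches missed
    redis_like[key] = value
    memory[key] = value
    return value, 'firestore'


memory, redis_like = {}, {}
firestore = {'is_tester:u1': False}
first = two_tier_get('is_tester:u1', memory, redis_like, firestore.__getitem__)
second = two_tier_get('is_tester:u1', memory, redis_like, firestore.__getitem__)
memory.clear()                           # simulate memory TTL expiry or a pod restart
third = two_tier_get('is_tester:u1', memory, redis_like, firestore.__getitem__)
```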

New files: database/cache.py, database/cache_manager.py

Key decisions:

30s memory TTL chosen to balance freshness vs Firestore cost — acceptable for non-critical reads like mentor notification frequency (polled every 1s per stream)
Singleflight pattern prevents thundering herd — only ONE concurrent request calls the fetch function, others wait
Redis pub/sub for cross-pod invalidation — when one pod writes, it publishes an invalidation event so other pods clear their memory caches
Write-through invalidation — mutations delete the cache key immediately (no stale reads on the writing pod)
Active credit invalidation — Redis signal for subscription upgrades, checked every 60s in transcribe loop (fixes #5446)
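
A minimal sketch of the singleflight idea from the decisions above, assuming a hypothetical `SingleFlightCache` class. TTL handling and lock cleanup are left out to keep the example short, so this is not a reproduction of InMemoryCacheManager.

```python
import threading
import time


class SingleFlightCache:
    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table
        self._locks = {}                 # key -> per-key fetch lock
        self._values = {}

    def get_or_fetch(self, key, fetch):
        if key in self._values:
            return self._values[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:  # re-check: a peer may have filled it
                self._values[key] = fetch()
            return self._values[key]


calls = []


def slow_fetch():
    calls.append(1)                      # count how often the backend is hit
    time.sleep(0.05)
    return 42


cache = SingleFlightCache()
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get_or_fetch('k', slow_fetch)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent callers produce one call to `slow_fetch`; the other seven block on the per-key lock and then read the cached value.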

Impact: Firestore LOOKUP reads reduced 18-29% sustained (PR #5378 monitoring data at T+20h)

2. New: Photo-Less List Endpoints (PR #5441)

Shift: API list endpoints now explicitly separate "list" from "detail" data shapes.

get_conversations_without_photos() skips the Firestore photo subcollection entirely
400x payload reduction: 57MB → 141KB for conversation lists
Used by GET /v1/conversations (list endpoint), while GET /v1/conversations/{id} (detail) still loads photos

Key decision: Separate function rather than conditional flag — cleaner separation, no risk of breaking existing callers that depend on photos being present.
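
The trade-off can be illustrated with a hypothetical decorator. The names `with_photos`, `get_conversations`, and `get_conversations_without_photos` come from the report, but the bodies below are invented stand-ins: `PHOTOS` and `_query_conversations` are assumptions, not the backend's code.

```python
import functools

PHOTOS = {'c1': ['<base64 photo>'], 'c2': []}   # stand-in for the photo subcollection


def with_photos(func):
    """Hypothetical decorator: attach photos to each conversation returned."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        conversations = func(*args, **kwargs)
        for conv in conversations:
            conv['photos'] = PHOTOS.get(conv['id'], [])
        return conversations
    return wrapper


def _query_conversations(limit):
    return [{'id': 'c1'}, {'id': 'c2'}][:limit]


@with_photos
def get_conversations(limit=50):
    return _query_conversations(limit)


def get_conversations_without_photos(limit=50):
    return _query_conversations(limit)   # same query, no photo load


with_photo_list = get_conversations()
light_list = get_conversations_without_photos()
```

Existing callers of `get_conversations` keep receiving photos unchanged, while the list endpoint opts into the lighter function explicitly.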

3. New: Multipart 401 Resilience (PR #5442)

Shift: The Flutter HTTP client now handles token expiration consistently across ALL request types, including multipart.

Challenge: Multipart requests use file streams that can only be read once — you can't simply "retry" with new headers. The solution rebuilds the entire request (headers + body + file streams) from scratch.

Key decision: Force sign-out after second 401 failure — prevents infinite retry loops and surfaces auth issues to the user immediately.

4. Shifted: Firestore Cost Model (PR #5378)

Before: Every backend read hit Firestore directly. High-frequency paths (mentor notifications at 1 read/second/stream) accumulated significant cost.

After: Targeted field projections + in-memory caching for hot paths. Firestore reads are now budgeted — high-frequency reads go through cache, low-frequency reads still hit Firestore directly.

Key decision: Only cache endpoints with measurable hot-path cost, not blanket caching. This keeps the system simple and debuggable.

5. Shifted: Database Module Scope

Before: database/ contained only Firestore and Redis connection code.

After: database/ now includes caching infrastructure (cache.py, cache_manager.py). This follows the module hierarchy — caching is a data-access concern that sits at the lowest level, imported by utils/ and routers/.

The module hierarchy remains enforced: database/ → utils/ → routers/ → main.py

6. Stable: Service Map Unchanged

The inter-service architecture is unchanged by these PRs:

backend → pusher → diarizer → deepgram → vad
agent-proxy → user agent VMs
notifications-job (cron)

All changes are within backend internals (data access layer + Flutter client). No new services, no new inter-service calls, no new environment variables.

Verification performed by kelvin on VPS with combined branch, live API testing, Android emulator, physical device E2E test (Pixel 7a), and Codex quality audit.

by AI for @beastoin

Legacy Firestore person documents may lack these fields, causing ResponseValidationError (500) on /v1/users/people for 8 users. Make both fields Optional with None default. Fixes part of #5423 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

get_people(), get_person(), get_person_by_name(), get_people_by_ids() all returned raw to_dict() without the document ID. Legacy docs missing the 'id' field caused ResponseValidationError on Person model. Fixes #5423 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Enables the list endpoint to use this lighter function that skips loading full base64 photo content per conversation. Fixes part of #5424 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

GET /v1/conversations was loading full base64 photos for every conversation via @with_photos decorator. 50 convos x 1.2MB = 57MB exceeded Cloud Run 32MB response limit. The list endpoint doesn't need photo content — individual conversation GET still loads them. Fixes #5424 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ingApiCall Port the 401→refresh→retry→signout pattern from makeApiCall() into both multipart methods. Extract _buildMultipartRequest() helper to rebuild requests for retry (streams are single-use). Fixes #5414

…5439)

Reviewer feedback: where("id", "in", ...) misses legacy docs that don't have a stored 'id' field. Use db.get_all() with doc refs instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

10 tests covering: - Person model resilience with missing created_at/updated_at (#5423) - Doc ID injection in get_people, get_person, get_people_by_ids (#5423) - Conversations list endpoint uses without-photos function (#5424) - get_conversations_without_photos supports folder_id/starred (#5424) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Large batch test for get_people_by_ids (>30 IDs, old where-in limit) - Empty list boundary test - Verify get_conversations_without_photos lacks @with_photos decorator - Verify get_conversations retains @with_photos for individual use Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

7 tests covering all code paths: - Non-401 returns directly (no refresh, no signout) - 401 → refresh succeeds → retry succeeds (200) - 401 → refresh succeeds → retry still 401 → signs out - 401 → refresh fails (empty token) → signs out without retry - requireAuthCheck=false skips 401 handling - Request rebuilt with fresh headers for retry - 500 does not trigger auth retry

… retry test Adds STAGING_API_URL= to generated .dev.env (fixes pre-existing envied compilation failure that blocked all tests on clean checkout).

beastoin · 2026-03-08T05:05:21Z

Test Evidence — PR #5441 (People/Conversations 500s Fix)

14/14 tests pass

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0

tests/unit/test_people_conversations_500s.py::TestPersonModelResilience::test_person_missing_created_at_updated_at PASSED [  7%]
tests/unit/test_people_conversations_500s.py::TestPersonModelResilience::test_person_with_all_fields PASSED [ 14%]
tests/unit/test_people_conversations_500s.py::TestPersonModelResilience::test_person_defaults PASSED [ 21%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_people_injects_doc_id PASSED [ 28%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_people_preserves_existing_id PASSED [ 35%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_person_injects_doc_id PASSED [ 42%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_person_returns_none_when_not_exists PASSED [ 50%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_people_by_ids_uses_doc_fetch PASSED [ 57%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_people_by_ids_handles_large_batch PASSED [ 64%]
tests/unit/test_people_conversations_500s.py::TestGetPeopleDocIdInjection::test_get_people_by_ids_empty_list PASSED [ 71%]
tests/unit/test_people_conversations_500s.py::TestConversationsListNoPhotos::test_list_endpoint_uses_without_photos PASSED [ 78%]
tests/unit/test_people_conversations_500s.py::TestConversationsListNoPhotos::test_get_conversations_without_photos_has_folder_starred PASSED [ 85%]
tests/unit/test_people_conversations_500s.py::TestConversationsListNoPhotos::test_without_photos_function_not_decorated_with_photos PASSED [ 92%]
tests/unit/test_people_conversations_500s.py::TestConversationsListNoPhotos::test_with_photos_present_on_get_conversations PASSED [100%]

======================== 14 passed, 2 warnings in 1.19s ========================

by AI for @beastoin

beastoin · 2026-03-08T05:05:35Z

Test Evidence — PR #5443 (Firestore Read Ops Cache)

19/19 tests pass

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0

tests/unit/test_firestore_read_ops_cache.py::TestMentorFrequencyCache::test_cache_hit_skips_firestore PASSED [  5%]
tests/unit/test_firestore_read_ops_cache.py::TestMentorFrequencyCache::test_cache_returns_zero_correctly PASSED [ 10%]
tests/unit/test_firestore_read_ops_cache.py::TestMentorFrequencyCache::test_cache_ttl_expiry PASSED [ 15%]
tests/unit/test_firestore_read_ops_cache.py::TestMentorFrequencyCache::test_invalidation_on_set PASSED [ 21%]
tests/unit/test_firestore_read_ops_cache.py::TestMentorFrequencyCache::test_default_for_nonexistent_user PASSED [ 26%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_tester_flag_cached PASSED [ 31%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_tester_false_cached PASSED [ 36%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_user_slice_cached PASSED [ 42%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_empty_lists_cached PASSED [ 47%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_tester_cache_invalidation PASSED [ 52%]
tests/unit/test_firestore_read_ops_cache.py::TestTesterAndAppSliceCache::test_no_mutation_leakage PASSED [ 57%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_initial_fetch PASSED [ 63%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_within_ttl_no_refresh PASSED [ 68%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_expired_ttl_triggers_refresh PASSED [ 73%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_local_decrement PASSED [ 78%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_local_decrement_clamps_at_zero PASSED [ 84%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_none_means_unlimited_no_decrement PASSED [ 89%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_zero_triggers_fast_refresh PASSED [ 94%]
tests/unit/test_firestore_read_ops_cache.py::TestCreditCacheLogic::test_zero_within_fast_refresh_window PASSED [100%]

============================== 19 passed in 1.21s ==============================

by AI for @beastoin

beastoin · 2026-03-08T05:05:47Z

Test Evidence — PR #5442 (Multipart 401 Retry)

7/7 Flutter unit tests pass

00:03 +0: makeMultipartApiCall 401 retry logic non-401 response returns directly without refresh or signout
00:03 +1: makeMultipartApiCall 401 retry logic non-401 response returns directly without refresh or signout
00:03 +1: makeMultipartApiCall 401 retry logic 401 → refresh succeeds → retry succeeds (200)
00:03 +2: makeMultipartApiCall 401 retry logic 401 → refresh succeeds → retry succeeds (200)
00:03 +2: makeMultipartApiCall 401 retry logic 401 → refresh succeeds → retry still 401 → signs out
00:03 +3: makeMultipartApiCall 401 retry logic 401 → refresh succeeds → retry still 401 → signs out
00:03 +3: makeMultipartApiCall 401 retry logic 401 → refresh fails (empty token) → signs out immediately without retry
00:03 +4: makeMultipartApiCall 401 retry logic 401 → refresh fails (empty token) → signs out immediately without retry
00:03 +4: makeMultipartApiCall 401 retry logic 401 with requireAuthCheck=false returns 401 without retry
00:03 +5: makeMultipartApiCall 401 retry logic 401 with requireAuthCheck=false returns 401 without retry
00:03 +5: makeMultipartApiCall 401 retry logic request is rebuilt for retry (fresh stream)
00:03 +6: makeMultipartApiCall 401 retry logic request is rebuilt for retry (fresh stream)
00:03 +6: makeMultipartApiCall 401 retry logic 500 response does not trigger auth retry
00:03 +7: makeMultipartApiCall 401 retry logic 500 response does not trigger auth retry
00:03 +7: All tests passed!

Test coverage:

Non-401 responses pass through without retry
401 → token refresh → successful retry (200)
401 → token refresh → still 401 → force sign out
401 → refresh fails (empty token) → sign out immediately
requireAuthCheck=false skips 401 handling (the 401 is returned without retry)
Request is fully rebuilt for retry (fresh file streams)
500 errors don't trigger auth retry

by AI for @beastoin

beastoin · 2026-03-08T05:06:00Z

Combined Verification Summary

Test Results: 40/40 PR-specific tests PASS

PR	Author	Tests	Result
#5441	yuki	14	14/14 PASS
#5442	kenji	7	7/7 PASS
#5443	hiro	19	19/19 PASS
Total		40	40/40 PASS

Merge Compatibility

All 3 PRs merge cleanly onto main
One test.sh conflict resolved (both #5441 and #5443 append test entries; kept both)

Pre-existing test failures (NOT from these PRs)

test_process_conversation_usage_context.py: 5 failures on main
Root cause: MagicMock vs int comparison in max(0, ...) — unrelated to any of the 3 PRs
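
This class of failure is easy to reproduce in isolation. The sketch below is a generic illustration, not the failing test itself: `usage` and `get_seconds` are hypothetical names. A bare MagicMock does not support ordering comparisons with an int, so `max(0, ...)` raises TypeError unless the mock is configured to return a number.

```python
from unittest.mock import MagicMock

unconfigured = MagicMock()
try:
    max(0, unconfigured)                 # compares a MagicMock with an int
    error = None
except TypeError as exc:
    error = exc

usage = MagicMock()
usage.get_seconds.return_value = 30      # give the mock a concrete numeric value
fixed = max(0, usage.get_seconds())
```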

Codex Quality Audit

6 gaps identified across all 3 PRs
5 resolved with evidence (legacy docs, cache invalidation, photo defaults, code structure, live API)
1 accepted risk: cross-pod cache staleness bounded by 30s TTL (acceptable for non-critical paths)

by AI for @beastoin

beastoin · 2026-03-08T06:07:01Z

User Acceptance Test (UAT) Report — Combined PR

1. Full Test Suite Results

Backend (test.sh): All 45 test files executed

40/40 PR-specific tests PASS (14 + 7 + 19)
5 pre-existing failures in test_process_conversation_usage_context.py (MagicMock vs int in max() — exists on main, unrelated to these PRs)

App (test.sh): All test files pass

multipart_401_retry_test.dart — 7/7 PASS (PR #5442)
conversation_provider_test.dart — 18/18 PASS
transcript_test.dart — 3/3 PASS
audio_player_utils_test.dart — 8/8 PASS
env_test.dart — 9/9 PASS
testflight_preferences_test.dart — 5/5 PASS

2. APK Build

dev-debug APK built successfully on combined branch
app-dev-debug.apk produced without errors
No compile-time regressions from PR #5442 Dart changes

3. Architecture Review

Import Compliance: PASS — No violations of module hierarchy (database/ → utils/ → routers/ → main.py)

Circular Imports: PASS — No circular dependencies detected

Logging Security: PASS — No raw sensitive data logged in any new code

Thread Safety: Generally sound — RLock used correctly, singleflight pattern correct

Decorator Correctness: PASS — get_conversations_without_photos() correctly omits @with_photos decorator while preserving identical query logic

4. Issues Found

WARNING: `_fetch_locks` Dict Grows Unbounded (PR #5443)

File: backend/database/cache_manager.py, lines 64, 116-118

Problem: The singleflight pattern creates a threading.Lock() per unique cache key but never removes old locks:

self._fetch_locks: Dict[str, threading.Lock] = {}  # line 64

# In get_or_fetch():
if key not in self._fetch_locks:
    self._fetch_locks[key] = threading.Lock()  # line 116-117 — never cleaned up

Cardinality analysis: 4 per-user keys (mentor_frequency:{uid}, is_tester:{uid}, user_apps_slice:{uid}:0, user_apps_slice:{uid}:1) + 2 fixed keys. At ~100 bytes per lock+key:

10K daily users = ~4MB
100K daily users = ~40MB

Severity: WARNING (not CRITICAL) — pods restart on every deploy, limiting accumulation. But should be fixed to prevent slow growth in long-running pods.

Fix: Remove lock from _fetch_locks after get_or_fetch() completes, or cap the dict size.
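
One way to implement the first option is reference counting, so a lock is dropped only when no thread holds or waits on it. The sketch below is an assumption about the shape of such a fix: `FetchLocks` is a hypothetical helper, not the code in cache_manager.py.

```python
import threading


class FetchLocks:
    """Per-key locks that are deleted once the last holder or waiter is done."""

    def __init__(self):
        self._guard = threading.Lock()
        self._entries = {}               # key -> [lock, refcount]

    def acquire(self, key):
        with self._guard:
            entry = self._entries.setdefault(key, [threading.Lock(), 0])
            entry[1] += 1                # count this thread before it blocks
        entry[0].acquire()

    def release(self, key):
        with self._guard:
            entry = self._entries[key]
            entry[0].release()
            entry[1] -= 1
            if entry[1] == 0:            # nobody holds or waits: safe to drop
                del self._entries[key]

    def __len__(self):
        with self._guard:
            return len(self._entries)


locks = FetchLocks()


def worker(key):
    locks.acquire(key)
    try:
        pass                             # the fetch would run here
    finally:
        locks.release(key)


threads = [
    threading.Thread(target=worker, args=(f'mentor_frequency:{i % 3}',))
    for i in range(12)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
remaining = len(locks)
```

Incrementing the count before blocking is what prevents a waiter's lock from being deleted underneath it.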

NOTE: `get_people_by_ids()` Order Not Preserved (PR #5441)

File: backend/database/users.py, line 100

Detail: db.get_all() returns results in arbitrary order (Firestore behavior). All current callers treat results as unordered sets, so this is safe today. Recommend adding a docstring noting this constraint to prevent future bugs.
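
If a future caller does need input order, it can be restored after the batch fetch. A minimal sketch, with `order_by_input_ids` as a hypothetical helper that is not part of the PR:

```python
def order_by_input_ids(person_ids, people):
    """Reorder fetched people to match person_ids, skipping IDs that were not found."""
    by_id = {person['id']: person for person in people}
    return [by_id[pid] for pid in person_ids if pid in by_id]


fetched = [{'id': 'c'}, {'id': 'a'}]     # arbitrary order, 'b' missing
ordered = order_by_input_ids(['a', 'b', 'c'], fetched)
```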

NOTE: Double `signOut()` Potential (PR #5442)

File: app/lib/backend/http/shared.dart, lines 238-241

Detail: Concurrent failed 401 requests could both call AuthService.instance.signOut(). Likely safe since signOut is idempotent, but worth noting.

5. Cross-PR Regression Check

Check	Result
Cache changes affect conversation fetches	No interaction — different code paths
People changes affect cache layer	No interaction — people not cached
Multipart retry affects backend behavior	No interaction — client-side only
Shared state conflicts	None found — atomic cache operations
Module hierarchy preserved	Yes — all imports follow `database/` → `utils/` → `routers/`

6. Verdict

SAFE TO MERGE with one recommended fix:

Fix the _fetch_locks unbounded growth in cache_manager.py (non-blocking but should be addressed)
No regressions detected
Architecture is sound — clean separation of concerns, correct decorator usage, proper thread safety

by AI for @beastoin

Per kelvin UAT review: callers must not assume result order matches input person_ids order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

beastoin · 2026-03-08T08:51:36Z

E2E Physical Device Test — PASS (2/2)

Test Setup

Device: Pixel 7a (33041JEHN18287) via Mac Mini ADB
Audio source: NYT podcast (Simplecast, 38:52 speech) via Chrome browser
Mic: Phone built-in mic (phone mic fallback — BLE device mic couldn't acoustically reach phone speaker)
App: Omi prod, BLE device Omi CV 1 (FW 3.0.15, HW rev 5.0)

Test 1 — Short Clip (4m 16s): PASS

Title: "Climate activism, political obstacles, and resistance strategies"
Multi-speaker detection (3+ speakers)
AI summary: 4 structured sections with bullet points
Transcript with timestamps and "translated by omi" tags

Test 2 — 15min Podcast (16m 30s): PASS

Title: "The Limits and Power of Storytelling in Social and Political Change"
3+ speakers detected (Speaker 1, 2, 3)
AI summary: 4+ sections (Violence against women, Limits of storytelling, Climate/markets, Leadership)
Live transcript verified at 7-min midpoint — timestamps accurate, speaker labels correct

Verified Features

Conversation creation from live audio
Real-time transcription (Deepgram STT)
Speaker diarization (3+ speakers)
AI summarization (structured sections + bullet points)
Timestamp generation
Translation tags ("translated by omi")
Conversation history and navigation

Notes

BLE device was connected and "Listening" throughout but couldn't acoustically reach phone speaker (physical distance). Used phone mic fallback after granting RECORD_AUDIO permission.
scrcpy was routing audio to remote_submix instead of speaker — restarted with --no-audio for test.
Device fully restored after test: BT re-enabled, BLE reconnected (green dot, 100%), scrcpy restored with audio.

by AI for @beastoin

…nto verify/combined-5441-5442-5443

…to verify/combined-5441-5442-5443

…9' into verify/combined-5441-5442-5443 # Conflicts: # backend/test.sh

beastoin · 2026-03-08T13:38:23Z

Re-Test Report — Updated Combined Branch (all 3 sub-PRs at latest)

Branch fix: Remote PR branch was missing 14 commits from #5443 (only had early snapshot). Pushed local re-merge — now includes all 19 commits from #5443 including _fetch_locks fix, #5446 Redis invalidation, GET-not-GETDEL, PubSubManager tests, and integration tests. Verified with git merge-base --is-ancestor.

Test Results: 72/72 PASS

PR	Scope	Test File	Result
#5441	People/conversations 500s fix	`test_people_conversations_500s.py`	14/14 PASS
#5443	Firestore read ops cache (full)	`test_firestore_read_ops_cache.py`	51/51 PASS
#5442	Multipart 401 retry	`multipart_401_retry_test.dart`	7/7 PASS

APK Build

Status: SUCCESS (203MB, dev-debug flavor)
Branch: verify/combined-5441-5442-5443 (all 3 sub-PRs merged at latest tips)

agent-flutter Widget-Level Testing

Connected agent-flutter to Omi debug app via Dart VM Service:

connect ws://127.0.0.1:38609/.../ws — connected to isolate
snapshot -i -c — resolved 9 interactive widget refs (GestureDetector, TextField, InkWell, ElevatedButton)
find text "English" press — widget-level tap by text (no pixel coordinates)
find text "Confirm" press — widget-level button press
Firebase auth via VM Service evaluate — signInWithCustomToken succeeded (uid: test-kelvin-e2e)

Sub-PR Commit Verification

All 3 sub-PRs at their latest tips are included:

Fix /v1/users/people and /v1/conversations 500 errors #5441 b58e493 ✓ (19 commits)
Fix: add 401 token refresh to multipart API calls #5442 54115aa ✓ (3 commits)
perf(backend): cache 3 Firestore hot paths to cut read costs ~59% (#5439) #5443 a0b436d ✓ (19 commits, all review-cycle fixes included)

Re-tested by AI for @beastoin

beastoin · 2026-03-08T13:52:58Z

agent-flutter E2E Test — Logged In + Local Backend

Setup

Local backend (based-hardware-dev): uvicorn main:app --port 8000
App: .dev.env with API_BASE_URL=http://10.0.2.2:8000/, STAGING_API_URL= (empty)
Auth: Firebase signInWithCustomToken via Dart VM Service evaluate (dev project token accepted by local backend)
Tool: agent-flutter connect ws://... → Marionette widget-level control

agent-flutter Test Flow

connect ws://127.0.0.1:46491/.../ws  → Connected to isolate
find text "English" press             → Language selected
find text "Confirm" press             → Language set via API (200 OK)
→ Home screen loaded                  → 22 interactive elements
find text "Ask Omi" press             → Chat opened with welcome message
back                                  → Returned to home
snapshot -i -c                        → Verified widget tree intact
screenshot                            → Evidence captured

Backend API Log — Zero 500s

All endpoints returned 200 OK including the critical fixes:

GET /v1/users/people — 200 (PR #5441: Person model resilience)
GET /v1/conversations — 200 (PR #5441: no @with_photos on list)
PATCH /v1/users/language — 200 (local backend accepted dev token)

Evidence

Screenshots captured: Home (logged in), Chat (Ask Omi)

Recipe for Team

# 1. Local backend
cd backend && export $(grep -v '^#' .env | xargs) && \
  export GOOGLE_APPLICATION_CREDENTIALS=~/.config/omi/dev/backend/google-credentials.json && \
  python3 -m uvicorn main:app --port 8000

# 2. App .dev.env
API_BASE_URL=http://10.0.2.2:8000/
STAGING_API_URL=
USE_WEB_AUTH=false
USE_AUTH_CUSTOM_TOKEN=false

# 3. Rebuild envied + launch
cd app && rm -rf .dart_tool/build/ lib/env/dev_env.g.dart && \
  dart run build_runner build --delete-conflicting-outputs && \
  flutter run -d emulator-5554 --flavor dev --debug

# 4. Sign in (VM Service eval)
# Generate custom token via Firebase Admin SDK, then:
# FirebaseAuth.instance.signInWithCustomToken("<token>")

# 5. agent-flutter
agent-flutter connect          # auto-detects VM service
agent-flutter find text "English" press
agent-flutter find text "Confirm" press
agent-flutter snapshot -i -c   # verify widgets
agent-flutter screenshot /tmp/evidence.png

Tested by AI for @beastoin

beastoin · 2026-03-09T07:09:32Z

Physical Device E2E Evidence — Pixel 7a

Device: Pixel 7a (Android 14, 1080x2400 @ 420dpi)
APK: dev flavor debug, branch verify/combined-5441-5442-5443
Auth: Firebase (beastoin@gmail.com) via SharedPreferences injection
Date: 2026-03-09

Screens Verified

Screen	Status	Evidence
Home (Conversations + Daily Score)	PASS	screenshot
People tab (Search memories)	PASS	screenshot
Chat (Ask Omi — syncing + reading memories)	PASS	screenshot
Chat Apps panel (Omi selected)	PASS	screenshot

Findings

App launches, authenticates, and reaches home screen with no crashes
Conversations section renders (no 500 errors; PR #5441 fix confirmed client-side)
Chat "Syncing messages with server..." + "Reading your memories..." confirms backend API connectivity
People tab loads with search and person cards
Bottom navigation (Home / Tasks / Mic / People / Apps) fully functional
No ANR or force-close during 10-minute session

Notes

Language selection dialog bypassed via adb shell SharedPreferences injection (hasSetPrimaryLanguage=true, userPrimaryLanguage=en)
Wireless ADB via Mac Mini (192.168.1.2:5555)
This supplements the earlier emulator-based E2E and 40/40 unit test suite

Physical device E2E: PASS

beastoin · 2026-03-09T08:39:51Z

Core Flow E2E — Record + Transcribe (30s Podcast) — Pixel 7a + Omi Device

Device: Pixel 7a (Android 14) + Omi BLE device (100% battery)
APK: dev flavor debug, branch verify/combined-5441-5442-5443
Backend: Local backend (combined branch code) with Deepgram nova-3 STT
Date: 2026-03-09

Setup

Built dev APK from combined branch with API_BASE_URL=http://192.168.1.12:8000/
Local backend running combined branch code with GOOGLE_CLOUD_PROJECT=based-hardware-dev
Pusher stub on port 8001 to accept transcript relay
SSH reverse tunnel from Mac Mini to build server for backend access
38-second TTS-generated podcast audio played through Mac Mini speakers near Omi device

Results

Phase	Transcript Visible	Evidence
Pre-recording	"Listening" (no transcript)	screenshot
Mid-recording (20s)	"...Welcome to today's episode of t..."	screenshot
Post-recording (38s)	"...How do we protect privacy? The..."	screenshot

Pipeline Verified

Omi BLE device → audio capture (opus_fs320, 16kHz)
App → WebSocket stream to backend (ws://192.168.1.12:8000/v4/listen)
Backend → Deepgram nova-3 STT transcription
Live transcript → displayed in app "Listening" banner
Conversations from Firestore load correctly alongside recording

Backend Logs Confirm

Deepgram connection started: True — STT connected
Connected to Pusher transcripts trigger WebSocket — audio relay active
Audio bytes flowing through pusher (type=101 messages, 30-55KB each)
Deepgram general-nova-3 model processing audio

Core flow 30s recording: PASS

beastoin · 2026-03-09T08:59:36Z

5-Minute Core Flow E2E — Pixel 7a + Omi BLE Device

Test: Record and transcribe 5-minute audio podcast via Omi BLE device
Device: Pixel 7a (physical, wireless ADB 192.168.1.2:5555)
Branch: verify/combined-5441-5442-5443
Flavor: dev (local backend + Deepgram nova-3)
Duration: ~6 minutes continuous audio

Pipeline

Mac Mini speakers (TTS podcast) → Omi BLE mic → Pixel 7a app → WebSocket → local backend → Deepgram nova-3 → live transcript in app

Evidence (5 checkpoints across the recording)

Time	Screenshot	Live Transcript Text
T=0 (pre-start)	01_pre_start	"Listening" — Omi connected, 100% battery
T=1min	02_1min	"...These devices don't just track y..."
T=2m30	03_2m30	"...Third, you need speaker diarization..."
T=4min	04_4min	"...The possibilities are genuinely e..."
T=6min (post)	05_post	"Listening" — session complete, app stable

Backend Logs Confirm

WebSocket accepted: /v4/listen?language=en&sample_rate=16000&codec=opus_fs320&stt_service=soniox
Deepgram nova-3 (model 421ebff2, version 2026-01-27.9249) connected successfully
Pusher WebSocket connected
Speech profile processed (15s stabilization)
Conversations created with 120s timeout segmentation:
- a5da24bb — processed and sent to pusher
- d4e7b2a1 — new stub created (next segment)
No WebSocket drops, no 401 errors, no crashes
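The stream parameters in the accepted WebSocket line above can be checked mechanically when comparing runs. The following is a minimal sketch using only the Python standard library; the helper name `parse_listen_params` is hypothetical, and the host and port are taken from the 30s test setup rather than from this log line.

```python
from urllib.parse import urlsplit, parse_qs


def parse_listen_params(url: str) -> dict:
    """Return the query parameters of a /v4/listen URL as a flat dict."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each parameter appears once in the logged URL.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


# Path and query copied from the backend log; host/port assumed from the test setup.
logged = (
    "ws://192.168.1.12:8000/v4/listen"
    "?language=en&sample_rate=16000&codec=opus_fs320&stt_service=soniox"
)
params = parse_listen_params(logged)
print(params)
```

All values come back as strings, so `sample_rate` would need an explicit `int()` before any numeric comparison.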

Verdict

PASS — 5-minute sustained audio recording + live transcription via Omi BLE device on Pixel 7a physical device. Core flow (BLE → app → WebSocket → Deepgram STT → live transcript) works continuously without drops.

Combined with the 30s test (previous comment), both core flow tests pass.

github-actions · 2026-03-09T09:13:50Z

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

beastoin and others added 16 commits March 8, 2026 02:38

Add folder_id and starred params to get_conversations_without_photos

8146ffc

Enables the list endpoint to use this lighter function, which skips loading full base64 photo content per conversation. Fixes part of #5424.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cache credit check in transcribe loop with 15-min TTL (#5439)

2483a0d

Cache mentor notification frequency with field projection + 30s TTL (#5439)

53c5601

Cache tester flag + user app slice with 30s TTL and invalidation (#5439)

fef99b1

Add tests for Firestore read ops cache optimization (#5439)

f8d8047

Register firestore read ops cache tests in test.sh (#5439)

e204682

Switch get_people_by_ids to document ID fetches for legacy doc support

f072709

Reviewer feedback: where("id", "in", ...) misses legacy docs that don't have a stored 'id' field. Use db.get_all() with doc refs instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add test_people_conversations_500s to test.sh

e8bba4c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix test.sh: add missing STAGING_API_URL to .dev.env and register 401 retry test

54115aa

Adds STAGING_API_URL= to generated .dev.env (fixes pre-existing envied compilation failure that blocked all tests on clean checkout).

This was referenced Mar 8, 2026

perf(backend): cache 3 Firestore hot paths to cut read costs ~59% (#5439) #5443

Merged

Fix /v1/users/people and /v1/conversations 500 errors #5441

Merged

beastoin and others added 7 commits March 8, 2026 07:10

Add docstring noting db.get_all() arbitrary order in get_people_by_ids

b58e493

Per kelvin UAT review: callers must not assume result order matches input person_ids order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
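A caller that does need input order can restore it after the batch fetch. This is a minimal sketch, not code from the PR: the helper `order_by_ids` is hypothetical, and it assumes each returned person dict carries an `id` key (as the `setdefault('id', ...)` change in get_people() provides for legacy docs).

```python
from typing import Dict, Iterable, List


def order_by_ids(people: Iterable[dict], person_ids: List[str]) -> List[dict]:
    """Return people in the order of person_ids, skipping IDs with no match."""
    by_id: Dict[str, dict] = {person["id"]: person for person in people}
    return [by_id[pid] for pid in person_ids if pid in by_id]


# Simulated batch result arriving in arbitrary order, with "p2" missing.
fetched = [{"id": "p3", "name": "Cy"}, {"id": "p1", "name": "Ann"}]
ordered = order_by_ids(fetched, ["p1", "p2", "p3"])
print([person["id"] for person in ordered])  # ['p1', 'p3']
```

Missing IDs are dropped silently here; a caller that must detect them could compare the lengths of the input and output lists.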

Fix unbounded _fetch_locks growth in InMemoryCacheManager

1519a1e

Add tests for singleflight lock cleanup

b353b91

Fix singleflight race with refcounted lock cleanup

ee54fe9

Add concurrency and exception safety tests for singleflight

28bedb5
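The lock-cleanup commits above describe per-key fetch locks that are removed once no caller holds them. The sketch below illustrates that refcounting idea in isolation. It is not the project's InMemoryCacheManager: the class `RefCountedLocks` and its methods are hypothetical, and it covers only the lock lifecycle, not result sharing between callers.

```python
import threading
from typing import Callable, Dict, List


class RefCountedLocks:
    """Per-key locks that are deleted when the last user releases them."""

    def __init__(self) -> None:
        self._guard = threading.Lock()      # protects the dict and refcounts
        self._locks: Dict[str, List] = {}   # key -> [lock, refcount]

    def run(self, key: str, fn: Callable[[], object]) -> object:
        # Register interest before waiting, so the entry cannot be deleted
        # while this caller is still queued on the per-key lock.
        with self._guard:
            entry = self._locks.setdefault(key, [threading.Lock(), 0])
            entry[1] += 1
        try:
            with entry[0]:
                return fn()
        finally:
            # Runs on success and on exception, so entries never leak.
            with self._guard:
                entry[1] -= 1
                if entry[1] == 0:
                    del self._locks[key]

    def size(self) -> int:
        with self._guard:
            return len(self._locks)


locks = RefCountedLocks()
result = locks.run("user:123", lambda: "fetched")
print(result, locks.size())  # fetched 0
```

Incrementing the refcount under the guard before acquiring the per-key lock is what avoids the race in which one caller deletes an entry that another caller is about to wait on.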

Add Redis credit invalidation signal for active cache refresh (#5446)

303bc47

Signal credit invalidation on subscription changes (#5446)

6b59557

beastoin added 8 commits March 8, 2026 07:44

Check Redis invalidation signal in credit cache refresh loop (#5446)

0fda782

Add active invalidation tests for credit cache (#5446)

3353c52

Add Redis invalidation signal + webhook coverage tests (#5446)

1bb1f72

Use GET instead of GETDEL for multi-stream credit invalidation

45a8f4e

Use check_credits_invalidation (multi-stream safe)

449e24b

Update tests for multi-stream invalidation (GET, not GETDEL)

e3ad6cc
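The switch from GETDEL to GET matters when several streams for the same user poll one invalidation key: GETDEL lets only the first reader see the signal. The sketch below shows the difference with an in-memory stand-in for the two Redis commands. The classes `FakeStore` and `StreamCreditCache`, the key name, and the "remember the last value seen" strategy are all assumptions for illustration, not the PR's check_credits_invalidation implementation.

```python
from typing import Dict, Optional


class FakeStore:
    """In-memory stand-in for the Redis GET and GETDEL commands."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)          # non-destructive read

    def getdel(self, key: str) -> Optional[str]:
        return self._data.pop(key, None)    # read and delete


class StreamCreditCache:
    """One instance per active stream; tracks the last signal it acted on."""

    def __init__(self, store: FakeStore, key: str) -> None:
        self.store = store
        self.key = key
        self.last_seen: Optional[str] = None

    def needs_refresh(self) -> bool:
        signal = self.store.get(self.key)
        if signal is not None and signal != self.last_seen:
            self.last_seen = signal
            return True
        return False


key = "credits_invalidated:uid1"            # hypothetical key name
store = FakeStore()
store.set(key, "1741420000")                # e.g. a timestamp written on subscription change

stream_a = StreamCreditCache(store, key)
stream_b = StreamCreditCache(store, key)
seen = [stream_a.needs_refresh(), stream_b.needs_refresh(), stream_a.needs_refresh()]
print(seen)  # [True, True, False]

# With GETDEL, the first reader consumes the signal and the second sees nothing.
first, second = store.getdel(key), store.getdel(key)
print(first, second)  # 1741420000 None
```

Because GET leaves the key in place, a real deployment would presumably rely on a key expiry to clear old signals; that detail is not shown here.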

Add RedisPubSubManager unit tests for cross-pod invalidation

b07e0d5

Add real Redis integration tests for cross-pod pub/sub invalidation

a0b436d

beastoin added 3 commits March 8, 2026 10:09

Merge remote-tracking branch 'origin/fix/people-conversations-500s' into verify/combined-5441-5442-5443

3337cb2

Merge remote-tracking branch 'origin/fix/multipart-401-retry-5414' into verify/combined-5441-5442-5443

5cec682

Merge remote-tracking branch 'origin/fix/firestore-read-ops-cache-5439' into verify/combined-5441-5442-5443

8c2bdb5

Conflicts: backend/test.sh

beastoin force-pushed the report/verification-architecture-5441-5442-5443 branch from 43321ad to 8c2bdb5 on March 8, 2026 12:20

beastoin mentioned this pull request Mar 8, 2026

Prerelease: 3 verified fixes — 500s, 401 retry, Firestore cache (#5441 #5442 #5443) #5459

Merged

beastoin closed this Mar 9, 2026

beastoin wants to merge 34 commits into main from report/verification-architecture-5441-5442-5443
