Problem
During Refine turns, the same constraint is extracted twice in the same turn — once by graphflow_turn (the main stage graph execution) and again by refine_background_memory (a concurrent background task). This causes:
- local_count growing faster than expected per turn
- Duplicate LLM extraction calls (wasted tokens + latency)
- Semantic near-duplicates accumulating (e.g., "Hockey Today" + "Hockey Training" both added for the same event)
Evidence
Session: 1772956522.019509 (2026-03-08) | log logs/timeboxing_session_20260308_075522_1772956522.019509_66949.log
- Refine turn 3: constraints 'Evening Routines' and 'Sleep Prep' appeared in both graphflow_turn and refine_background_memory extraction logs for the same turn
- local_count grew from 193 → 201 across the session (8 additions), but only ~4 semantically distinct constraints were introduced by the user
- "Hockey Today" and "Hockey Training" each added as separate rows despite referring to the same event
Partial mitigation already in place
The add_constraints method (merged 2026-03-08, branch issue/96-calendar-sync-idempotency) now deduplicates shared-scope constraints at the store level using _group_constraints_by_semantics. So even if both extractors (graphflow_turn and refine_background_memory) emit the same constraint, only one canonical row is persisted or updated.
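To illustrate what store-level deduplication buys, here is a minimal sketch. It is not the actual add_constraints implementation: the Constraint dataclass, the dict-backed store, and the semantic_key function are hypothetical stand-ins, and the real grouping in _group_constraints_by_semantics is not shown in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    # Hypothetical shape; the real constraint model is not shown in the issue.
    name: str
    scope: str


def semantic_key(constraint):
    """Stand-in for the real grouping logic: scope plus a normalized name."""
    normalized = " ".join(constraint.name.casefold().split())
    return (constraint.scope, normalized)


def add_constraints(store, incoming):
    """Insert each constraint unless its key already exists.

    Returns the number of rows actually added, so a second extractor
    emitting the same constraints adds nothing.
    """
    added = 0
    for constraint in incoming:
        key = semantic_key(constraint)
        if key not in store:
            store[key] = constraint
            added += 1
    return added


store = {}
from_graphflow = [Constraint("Evening Routines", "shared"), Constraint("Sleep Prep", "shared")]
from_background = [Constraint("evening  routines", "shared"), Constraint("Sleep Prep", "shared")]

first = add_constraints(store, from_graphflow)
second = add_constraints(store, from_background)
```

A name-normalizing key like this one only catches exact repeats. Near-duplicates such as "Hockey Today" and "Hockey Training" would still need genuine semantic grouping, and the second LLM extraction call is still paid for either way.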
Remaining gap
The dual LLM extraction still fires — two separate LLM calls for constraint extraction per Refine turn, consuming tokens and adding latency. Ideally refine_background_memory should skip extraction if graphflow_turn already extracted constraints in the same turn.
Proposed fix directions
- Coordinator-level deduplication gate: track a _last_extraction_turn_id in the session state; refine_background_memory checks if extraction already ran for this turn and skips
- Merge refine_background_memory extraction into graphflow_turn: make background memory a post-hook of the stage graph rather than a separate concurrent task
- Token-bucket / extraction-once-per-turn semaphore: shared async lock keyed by (session_key, turn_id) that only one extractor acquires
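The first and third directions can be sketched together as a small claim-once gate. This is a sketch under assumptions, not the coordinator's actual code: ExtractionGate, try_claim, run_extractor, and simulate_turn are hypothetical names, and asyncio.sleep(0) stands in for the LLM extraction call.

```python
import asyncio


class ExtractionGate:
    """Lets exactly one extractor claim each (session_key, turn_id) pair."""

    def __init__(self):
        self._claimed = set()          # set of (session_key, turn_id) tuples
        self._lock = asyncio.Lock()    # guards check-and-add across tasks

    async def try_claim(self, session_key, turn_id):
        """Return True for the first caller per turn, False for later ones."""
        key = (session_key, turn_id)
        async with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


async def run_extractor(gate, name, session_key, turn_id, calls):
    """Run the (simulated) extraction only if this task wins the claim."""
    if not await gate.try_claim(session_key, turn_id):
        return  # another extractor already handled this turn
    await asyncio.sleep(0)  # placeholder for the real LLM extraction call
    calls.append(name)


async def simulate_turn(session_key, turn_id):
    """Start both extractors concurrently and report which ones ran."""
    gate = ExtractionGate()  # created inside the running event loop
    calls = []
    await asyncio.gather(
        run_extractor(gate, "graphflow_turn", session_key, turn_id, calls),
        run_extractor(gate, "refine_background_memory", session_key, turn_id, calls),
    )
    return calls


calls = asyncio.run(simulate_turn("1772956522.019509", 3))
```

In a real session the gate would live in shared session state rather than being created per turn, and old keys would need pruning so the claimed set does not grow without bound. Whichever extractor loses the claim must also be allowed to skip safely, which assumes the winner's output covers everything the loser would have extracted.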
Acceptance criteria
- local_count growth per turn approximately matches the number of distinct new user constraints introduced in that turn
- ... timebox_log_query.py llm --session-key ...)