feat: framework logic-change detection for live components#2192
pyjuan91 wants to merge 6 commits into
Persist a live component's subtree logic-dependency set `S` (own fp ∪ all descendants) as it processes, so a later run can detect whether the processing logic changed without a second pass over persisted memo entries. Foundation for framework-level logic-change detection on live components (cocoindex-io#2124).

- Surface a root build's rolled-up deps without touching the readiness path: `run_in_background` takes an optional `oneshot` outcome sink, used when a root has no parent readiness guard to roll up to. The foreground mount path keeps rolling up via the guard and passes None.
- `update_full` recomputes and replaces `S` (edited-away fingerprints drop out); an incremental `update` op extends it with the item's subtree deps, skipping the write once they are already covered.
- Store `S` under a framework-reserved Symbol key in the Live keyspace, encoded as a sorted fingerprint vec. Dropped sink ⇒ no persist (failure-safe).

Write path only; reading `S` to gate a scan skip is a follow-up.
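The replace-versus-extend rule for `S` can be sketched in a few lines. This is a minimal in-memory illustration, not the engine's code: `LogicDepStore`, its `writes` counter, and the `fp_*` fingerprint strings are hypothetical names introduced here, and the real store is a persisted row rather than a Python attribute.

```python
from typing import Iterable, Optional


class LogicDepStore:
    """Hypothetical in-memory stand-in for the persisted dependency set S."""

    def __init__(self) -> None:
        # Sorted fingerprint list; None means "never written".
        self.persisted: Optional[list] = None
        self.writes = 0

    def _write(self, deps: set) -> None:
        self.persisted = sorted(deps)
        self.writes += 1

    def update_full(self, subtree_deps: Iterable[str]) -> None:
        # A full scan recomputes S from scratch, so stale fingerprints drop out.
        self._write(set(subtree_deps))

    def update(self, item_deps: Iterable[str]) -> None:
        # An incremental op only adds; skip the write when already covered.
        current = set(self.persisted or [])
        incoming = set(item_deps)
        if incoming <= current:
            return
        self._write(current | incoming)


store = LogicDepStore()
store.update_full({"fp_file", "fp_chunk_v1"})
store.update({"fp_chunk_v1"})                  # already covered: no write
store.update({"fp_embed"})                     # new dep: S is extended
store.update_full({"fp_file", "fp_chunk_v2"})  # S is replaced wholesale
```

After the final `update_full`, neither `fp_chunk_v1` nor `fp_embed` is tracked any more, which is the "edited-away fingerprints drop out" behavior described above.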
…tion

Add the read side of framework-level logic-change detection for live components (cocoindex-io#2124): a predicate that reports whether a component's processing logic is unchanged since its last committed scan, so a durable connector can gate its startup full scan on it.

`LiveComponentController::processing_unchanged()` reads the persisted subtree dependency set `S` and checks every fingerprint is still registered in the current logic set. Failure-safe: returns false when no scan was ever committed, when the stored value can't be decoded, or when any dependency's code changed (each means "re-scan").

Surfaced through PyO3 as `processing_unchanged_async`, then on `LiveComponentOperator`/`LiveComponentSubscriber` as `processing_unchanged()` so a connector pairs it with its own durable cursor: `<durable cursor> and await subscriber.processing_unchanged()`.

Tests cover first-run (no `S` → false), unchanged across runs (→ true), and a simulated child-code change (→ false).
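As a rough illustration of the failure-safe predicate, the sketch below models the three "re-scan" cases with plain Python. The comma-separated byte encoding and the `decode_deps` helper are assumptions made for the example; the actual stored format and the `all_contained_with_env` check live in the Rust core and are not reproduced here.

```python
from typing import Optional


def decode_deps(raw: Optional[bytes]) -> Optional[frozenset]:
    """Decode a stored dependency set; None means missing or undecodable.

    The comma-separated UTF-8 encoding is an assumption for this sketch.
    """
    if raw is None:
        return None
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return frozenset(fp for fp in text.split(",") if fp)


def processing_unchanged(raw: Optional[bytes], registered: set) -> bool:
    deps = decode_deps(raw)
    if deps is None:
        # No committed scan, or a value we cannot read: ask for a re-scan.
        return False
    # Unchanged only if every stored fingerprint is still registered.
    return deps <= registered


registered = {"fp_file", "fp_chunk"}
first_run = processing_unchanged(None, registered)
corrupt = processing_unchanged(b"\xff\xfe", registered)
unchanged = processing_unchanged(b"fp_chunk,fp_file", registered)
child_edited = processing_unchanged(b"fp_chunk_old,fp_file", registered)
```

Every uncertain case collapses to `False`, so the worst outcome of a bad read is an unnecessary scan rather than stale output.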
Replace the OCI connector's manual `logic_version` skip-scan opt-in (cocoindex-io#2116) with the framework-computed signal from cocoindex-io#2124. The live view now gates its startup-scan skip on `durable_stream and await subscriber.processing_unchanged()`: no hand-maintained version string, no stale-state-on-forgotten-bump footgun.

- `list_objects(..., logic_version=...)` → `list_objects(..., durable_stream=...)`: a bool the user sets to assert the stream durably replays its backlog. The logic-change check is now automatic; durability stays the user's responsibility (a LiveStream exposes no cursor to detect it).
- Drop `_SCAN_VERSION_KEY` and the committed-version read/write dance; the framework persists the subtree dependency set itself.
- Tests: mock subscriber drops the committed-state version simulation for a controllable `processing_unchanged()`; the four version-matching cases collapse to three (not-durable scans, durable+changed scans, durable+unchanged skips).
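The three test cases map directly onto the gate expression. The sketch below uses a hypothetical `FakeSubscriber` and `should_skip_startup_scan` helper (neither is part of the connector) to show how `durable_stream and await subscriber.processing_unchanged()` behaves, including the short-circuit when the stream is not durable.

```python
import asyncio


class FakeSubscriber:
    """Hypothetical stand-in for the subscriber; only the predicate is modeled."""

    def __init__(self, unchanged: bool) -> None:
        self._unchanged = unchanged
        self.calls = 0

    async def processing_unchanged(self) -> bool:
        self.calls += 1
        return self._unchanged


async def should_skip_startup_scan(durable_stream: bool, subscriber: FakeSubscriber) -> bool:
    # `and` short-circuits: a non-durable stream never consults the framework.
    return durable_stream and await subscriber.processing_unchanged()


async def demo():
    not_durable = FakeSubscriber(unchanged=True)
    results = [
        await should_skip_startup_scan(False, not_durable),                    # not durable: scan
        await should_skip_startup_scan(True, FakeSubscriber(unchanged=False)), # durable + changed: scan
        await should_skip_startup_scan(True, FakeSubscriber(unchanged=True)),  # durable + unchanged: skip
    ]
    return results, not_durable.calls


results, not_durable_calls = asyncio.run(demo())
```

Only the third combination skips the scan, matching the collapsed test matrix.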
Bring the Rust SDK to parity with the Python SDK's live logic-change detection (cocoindex-io#2124):

- Expose `processing_unchanged()` on `LiveComponentOperator` and the `LiveMapSubscriber` delegate, calling the core controller predicate.
- Rewire the OCI live walker from the manual `logic_version` string to `durable_stream: bool`, skipping the startup scan on reruns only when the user asserts a durable stream and the framework reports the processing logic unchanged. Removes `OCI_SCAN_VERSION_KEY` and the per-scan version write.
    let encoded = encode_logic_deps(deps)?;
    component
        .app_ctx()
        .app_store()
        .write_user_state_standalone(
            component.stable_path(),
            db_schema::StateKind::Live,
            &logic_deps_state_key(),
            &encoded,
        )
        .await
I think user state may not be the right place for logic deps.
What we need to persist for change detection is pretty much the same as ComponentMemoizationInfo, and the intent matches its existing usage: it keeps the information needed to validate whether a regular component's last execution is still valid, which is exactly our purpose here.
So I think probably we can just reuse ComponentMemoizationInfo and reuse the same entry in the DB for it.
User state for live components is intended to store state specific to a particular live component's logic. That logic decides whether to persist the ComponentMemoizationInfo and whether the last persisted one is still valid, but the ComponentMemoizationInfo itself doesn't belong in user state.
Agreed, this is framework change-detection state, not user state. I'll move it onto the component's memoization entry and drop the sys/live_logic_deps user-state key.
Two things I'd confirm before wiring it up:
(1) Lifecycle. The live set accumulates rather than being a per-run snapshot: update_full replaces it, and an incremental update unions into it (an item processed earlier still exists on restart, so its subtree fingerprints must stay tracked). The validity check is the same all_contained_with_env, and the live controller owns the read-merge-write, which matches your "live logic decides when to persist / whether the last one is valid."
(2) Shape. Live only ever needs logic_deps. ComponentMemoizationInfo's processor_fp and return_value are required (no skip/default, and MemoizedValue has no empty variant), so reusing it as-is means storing dummy F/R. Do you prefer that, or factoring logic_deps (the shared part) into a leaner shape both paths reuse? Leaning toward the latter, but your call.
hey @pyjuan91, thanks for the comment, what's your thought and take on this?
Hi @badmonster0, I'd propose reusing the single ComponentMemoizationInfo entry as @georgeh0 suggested, and making F/R Option so a live root stores just {L: [...]}.
I'd floated factoring logic_deps into a shared struct earlier, but that doesn't really fit "reuse the same entry": a leaner value would need either a different type under the ComponentMemoization key or a separate key. And it wouldn't remove the actual blocker anyway: F/R are required and a live root has neither. So I'd keep the one struct and just relax those two fields.
Option over sentinel F/R: with processor_fp: None a live entry can never match a regular skip (which always passes Some), so it can't be misread as a real memo, and sentinels would need a fake R, since MemoizedValue has no empty variant.
Lifecycle as before: update_full replaces the set, incremental update unions in, controller owns read-merge-write.
Small change either way, happy to store sentinels instead if you'd rather.
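To make the Option-over-sentinel argument concrete, here is a small Python analogue. `MemoInfo` and `can_skip_regular` are hypothetical names for this sketch only; they mirror the idea of optional `processor_fp`/`return_value` fields and are not the Rust `ComponentMemoizationInfo` definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoInfo:
    """Hypothetical mirror of a memo entry with optional processor_fp/return_value."""

    logic_deps: frozenset
    processor_fp: Optional[str] = None
    return_value: Optional[bytes] = None


def can_skip_regular(entry: MemoInfo, current_fp: str) -> bool:
    # A regular skip always compares against a concrete fingerprint,
    # so an entry whose processor_fp is None can never match.
    return entry.processor_fp is not None and entry.processor_fp == current_fp


# A live root stores only its logic deps; both optional fields stay unset.
live_entry = MemoInfo(logic_deps=frozenset({"fp_file", "fp_chunk"}))
regular_entry = MemoInfo(
    logic_deps=frozenset({"fp_file"}),
    processor_fp="fp_file",
    return_value=b"ok",
)
```

With a sentinel fingerprint instead of `None`, correctness would depend on no real processor ever hashing to the sentinel, and a placeholder return value would still have to be invented.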
Address review on cocoindex-io#2192: a live component's subtree logic-dependency set `S` was persisted under a framework-reserved user-state key, but it is framework change-detection state, with the same purpose and shape as a regular component's `ComponentMemoizationInfo`, which validates whether a past execution is still reusable. Reuse that entry instead of user state.

- `ComponentMemoizationInfo.processor_fp` and `return_value` become `Option`, so a live root stores just its `logic_deps` with both unset. A `None` processor_fp never matches a regular memo skip (which compares `Some(fp)`), so a live entry can't be mistaken for a reusable memo. `Some(x)` serializes identically to the old bare field, so existing regular memo entries still decode.
- `processing_unchanged()` reads the component's memo entry; `update_full` replaces `S` there via `finalize_memoization`; an incremental `update` extends it with a single-txn read-merge-write of the same row. Drop the `sys/live_logic_deps` user-state key and its encode/decode helpers.
- The extend path's single txn also removes a lost-update race the prior read-then-write had between concurrent per-subpath drains.

The live root is not memoized, so the memo check deletes `S` at the start of each `update_full` and it is rewritten only on success: `S` is present iff the last full scan completed, so a failed scan re-scans.

No behavior change: the live and OCI logic-change tests are unchanged and still pass.
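The "present iff the last full scan completed" invariant follows from the delete-then-rewrite ordering. A minimal sketch, assuming a hypothetical `MemoRow` holder and a scan callable that returns the subtree deps (neither name comes from the codebase):

```python
from typing import Callable, Optional


class MemoRow:
    """Hypothetical stand-in for the component's memo entry."""

    def __init__(self) -> None:
        self.logic_deps: Optional[frozenset] = None


def update_full(row: MemoRow, scan: Callable[[], set]) -> None:
    # S is cleared before the scan starts...
    row.logic_deps = None
    deps = scan()  # may raise; the row then stays empty
    # ...and rewritten only once the scan has completed.
    row.logic_deps = frozenset(deps)


def failing_scan() -> set:
    raise RuntimeError("scan aborted")


row = MemoRow()
update_full(row, lambda: {"fp_file", "fp_chunk"})
after_success = row.logic_deps

try:
    update_full(row, failing_scan)
except RuntimeError:
    pass
after_failure = row.logic_deps
```

After the aborted scan the row holds no dependency set, so the next `processing_unchanged()` check would report false and trigger a re-scan.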
Summary
Closes #2124. Framework-computed logic-change detection for live components, replacing #2116's hand-maintained `logic_version` with a signal the engine computes itself.

This completes D1's live-component scope. It builds on the already-merged #2142 (the prerequisite fix that makes stored memos subtree-complete across the mount boundary); together they deliver the issue's proposed direction end to end. The one carve-out is processing-affecting config/arg values (e.g. `chunk_size=512→1024`), which the issue itself split to #2126 as a documented known gap: `processing_unchanged()` is the signal #2126 ANDs an args-key check into.
The skip gate the issue describes, `bootstrap_done ∧ durable ∧ logic_unchanged`, lands as `durable_stream and await processing_unchanged()`: `processing_unchanged()` folds in both "a scan has bootstrapped" (false when no committed dep-set `S` exists) and "logic unchanged" (`all_contained_with_env(S)`); durability stays connector-owned via `durable_stream`.

What's included
Core (`rust/core`)
- `run_in_background` gains an optional oneshot `outcome_sink`, so a live root build (which has no parent readiness guard) can surface its rolled-up subtree `logic_deps` instead of dropping them.
- `update_full` REPLACEs the persisted aggregate set `S`; incremental `update` EXTENDs it. `S` lives in the `Live` keyspace under a framework-reserved key (`sys/live_logic_deps`).
- `LiveComponentController::processing_unchanged()` reads `S` and checks every fingerprint is still registered. Failure-safe `false` (no `S`, decode error, or any dep's code changed ⇒ re-scan).
Python SDK (`rust/py` + `python/`)
- `processing_unchanged()` exposed via PyO3 → `LiveComponentOperator` / `LiveMapSubscriber`.
- `logic_version: str` knob is replaced by `durable_stream: bool`; skip-scan is now `durable_stream and await subscriber.processing_unchanged()`. `_SCAN_VERSION_KEY` and the version read/write dance are removed.
Rust SDK (`rust/sdk/cocoindex`), kept at parity with the Python SDK
- `processing_unchanged()` on the operator + subscriber.
- `logic_version` → `durable_stream`, `OCI_SCAN_VERSION_KEY` removed.
Design notes (the three things flagged on the issue)
- Deps ride a dedicated oneshot `outcome_sink` out of `run_in_background`, separate from the `HandleOutcome`/`mark_ready` readiness+error machinery (unchanged). A live root has no parent readiness guard, so without this its now-populated `ComponentRunOutcome.logic_deps` were dropped at task end.
- `update_full` recomputes and replaces `S` (edited-away fingerprints drop out); incremental `update` extends `S` (a streamed item is an existing item on the next restart, so its code must be in `S`); delete is a no-op.
- A memo hit contributes its full subtree closure for free: an unchanged `process_file` hit still surfaces `process_chunk`'s fingerprint, so editing a mounted child is correctly not skipped.
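The subtree-closure point can be illustrated with a toy rollup. This sketch assumes a hypothetical `Component` tree and a dict-based memo keyed by fingerprint; the engine's real memo entries and fingerprinting are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    fp: str
    children: list = field(default_factory=list)


def subtree_deps(node: Component, memo: dict, executed: list) -> frozenset:
    """Return the node's closure: its own fingerprint plus all descendants'."""
    cached = memo.get(node.fp)
    if cached is not None:
        # Memo hit: nothing re-runs, yet the stored closure still carries
        # every descendant fingerprint.
        return cached
    executed.append(node.fp)
    deps = {node.fp}
    for child in node.children:
        deps |= subtree_deps(child, memo, executed)
    memo[node.fp] = frozenset(deps)
    return memo[node.fp]


tree = Component("fp_process_file", [Component("fp_process_chunk")])
memo = {}

first_executed = []
first = subtree_deps(tree, memo, first_executed)

second_executed = []
second = subtree_deps(tree, memo, second_executed)  # pure memo hit

# Simulate editing the mounted child: its old fingerprint is no longer registered.
registered_after_edit = {"fp_process_file", "fp_process_chunk_v2"}
still_valid = second <= registered_after_edit
```

Because the hit returns the whole closure, the child edit is visible even though `process_file` itself never re-ran.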
Scope / non-goals
The framework owns only the logic-change signal. Durability stays connector-owned: the user asserts it via `durable_stream`, since a `LiveStream` exposes no cursor for the framework to detect; the connector keeps owning the skip wiring. Not generic live-component memoization or per-item skip.
Known gap (tracked separately)
Detection covers code changes, not config/arg values (`chunk_size=512→1024` moves no fingerprint). That's the #2126 follow-up: `processing_unchanged()` is the signal #2126 ANDs an args-key equality into.
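For orientation only, one way the follow-up could AND an args-key equality into the gate is sketched below. Everything here is an assumption: #2126 is not designed in this PR, and `args_key`, `can_skip_scan`, and the JSON-plus-SHA-256 key derivation are hypothetical choices made for the example.

```python
import hashlib
import json
from typing import Optional


def args_key(args: dict) -> str:
    # Hypothetical key: a hash over a canonical JSON dump of the arguments.
    canonical = json.dumps(args, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_skip_scan(
    durable_stream: bool,
    logic_unchanged: bool,
    stored_args_key: Optional[str],
    current_args: dict,
) -> bool:
    # The logic signal alone misses config changes; the extra term catches them.
    return (
        durable_stream
        and logic_unchanged
        and stored_args_key == args_key(current_args)
    )


stored = args_key({"chunk_size": 512})
same_config = can_skip_scan(True, True, stored, {"chunk_size": 512})
changed_config = can_skip_scan(True, True, stored, {"chunk_size": 1024})
```

In this sketch the `512→1024` change flips the result to a re-scan even though no code fingerprint moved.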
Docs
`durable_stream` is user-facing and, as the D1 end state, legitimately documentable, but I've left the OCI connector docs out of this PR since docs are usually handled separately. Happy to add a section here or as a follow-up, whichever fits your process.
Test plan
- `cargo test` (core + SDK), `uv run mypy`, `uv run pytest python/`: green
- `prek run --all-files`: all hooks pass
- `processing_unchanged()` across runs (True when unchanged, False after a simulated logic change, False on first run); OCI skip-scan (not-durable always scans / durable scans on logic change / durable skips when unchanged)