UN-3266 [FIX] Async execution backend stabilization #1903
harini-venkataraman wants to merge 152 commits into main from feat/execution-backend
Conversation
Conflicts resolved:
- docker-compose.yaml: Use main's dedicated dashboard_metric_events queue for worker-metrics
- PromptCard.jsx: Keep tool_id matching condition from our async socket feature
- PromptRun.jsx: Merge useEffect import from main with our branch
- ToolIde.jsx: Keep fire-and-forget socket approach (spinner waits for socket event)
- SocketMessages.js: Keep both session-store and socket-custom-tool imports + updateCusToolMessages dep
- SocketContext.js: Keep simpler path-based socket connection approach
- usePromptRun.js: Keep Celery fire-and-forget with socket delivery over polling
- setupProxy.js: Accept main's deletion (migrated to Vite)
for more information, see https://pre-commit.ci
… into feat/execution-backend
for more information, see https://pre-commit.ci
… into feat/execution-backend
Add a defensive guard in `UsageHelper.get_usage_by_model()` that drops `Usage` rows where `usage_type == "llm"` and `llm_usage_reason` is empty. Per the Usage model contract, an empty reason is only valid when `usage_type == "embedding"`; an empty reason combined with `usage_type == "llm"` is a producer-side bug (an LLM call site forgot to pass `llm_usage_reason` in `usage_kwargs`). Without this guard the row surfaces in API deployment responses as a malformed bare `"llm"` bucket with no token breakdown alongside the legitimate `"extraction_llm"` bucket. The guard logs a warning on every dropped row so future producer regressions are detectable. Adds three regression tests in `backend/usage_v2/tests/test_helper.py` that stub `account_usage.models` and `usage_v2.models` in `sys.modules` so the helper can be imported without Django being set up: - `test_unlabeled_llm_row_is_dropped` — bare "llm" bucket disappears - `test_embedding_row_is_preserved` — guard is scoped to LLM rows - `test_all_three_llm_reasons_coexist` — extraction/challenge/summarize Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
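The guard described in this commit can be sketched as follows. This is a minimal, hypothetical version that treats rows as plain dicts keyed by the `Usage` model's `usage_type` and `llm_usage_reason` fields — the real `UsageHelper.get_usage_by_model()` operates on Django aggregation rows, so the shape here is an assumption for illustration only:

```python
import logging

logger = logging.getLogger(__name__)

def drop_unlabeled_llm_rows(rows: list[dict]) -> list[dict]:
    """Drop rows where usage_type == "llm" but llm_usage_reason is empty.

    An empty reason is only valid for embedding rows; on an LLM row it is
    a producer-side bug, so the row is removed rather than surfacing as a
    malformed bare "llm" bucket in the API response.
    """
    kept = []
    for row in rows:
        if row.get("usage_type") == "llm" and not row.get("llm_usage_reason"):
            # Warn on every dropped row so producer regressions stay visible
            logger.warning("Dropping unlabeled LLM usage row: %r", row)
            continue
        kept.append(row)
    return kept
```

Note that `not row.get("llm_usage_reason")` covers both the empty-string and the `None` representations, matching the `null=True` column definition.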
for more information, see https://pre-commit.ci
🧹 Nitpick comments (2)
backend/usage_v2/tests/test_helper.py (2)
129-164: Consider adding a test case for `llm_usage_reason=None`.

The test uses an empty string `""` for the unlabeled case (line 146), but the model definition (from `usage_v2/models.py`) has `null=True`, meaning `llm_usage_reason` can also be `None` in the database. While Python's `not llm_reason` handles both, explicitly testing the `None` case ensures the defensive guard works for both representations.

💡 Add test case for None value

```diff
 def test_unlabeled_llm_row_is_dropped() -> None:
     """An ``llm`` row with empty ``llm_usage_reason`` must not produce a
     bare ``"llm"`` bucket in the response — it should be silently dropped,
     while the legitimate extraction row is preserved.
     """
     _stub_rows(
         [
             _row(
                 usage_type="llm",
                 llm_reason="extraction",
                 sum_input=100,
                 sum_output=50,
                 sum_total=150,
                 sum_cost=0.05,
             ),
             _row(
                 usage_type="llm",
                 llm_reason="",  # the bug — no reason set
                 sum_cost=0.01,
             ),
+            _row(
+                usage_type="llm",
+                llm_reason=None,  # also test null case from DB
+                model_name="gpt-4o-mini",
+                sum_cost=0.01,
+            ),
         ]
     )
```

Note: You'll also need to update the `_row` function's type hint from `llm_reason: str` to `llm_reason: str | None` to support this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/usage_v2/tests/test_helper.py` around lines 129 - 164, Add an explicit None-case to the unlabeled LLM test and update the helper signature: modify test_unlabeled_llm_row_is_dropped to include an additional stub row where llm_reason is None (in addition to the empty-string case) so the code path handling missing reasons is validated, and change the _row helper function's type hint from llm_reason: str to llm_reason: str | None so it accepts None; keep assertions against UsageHelper.get_usage_by_model unchanged to ensure the unlabeled row is dropped and the extraction_llm entry remains.
33-70: Consider adding test cleanup for module stubs.

The `sys.modules` manipulation persists beyond the test module's lifetime. If other test modules in the same pytest session later import `usage_v2.models` expecting the real Django model, they'll get the stub instead. This might be acceptable given the stated constraints, but consider adding a cleanup mechanism or documenting this limitation.

💡 Optional: Add cleanup via pytest fixture or atexit

```python
import atexit

_original_modules: dict[str, Any] = {}

def _install_stubs() -> tuple[Any, Any]:
    # ... existing code ...

    # Track originals for potential cleanup
    for mod_name in ["account_usage", "account_usage.models", "usage_v2.models"]:
        if mod_name not in _original_modules:
            _original_modules[mod_name] = sys.modules.get(mod_name)

    # ... rest of function ...

def _cleanup_stubs() -> None:
    for mod_name, original in _original_modules.items():
        if original is None:
            sys.modules.pop(mod_name, None)
        else:
            sys.modules[mod_name] = original

atexit.register(_cleanup_stubs)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/usage_v2/tests/test_helper.py` around lines 33 - 70, The test installs module stubs in _install_stubs that persist in sys.modules and can leak into other tests; add cleanup to restore originals by capturing originals for keys ["account_usage","account_usage.models","usage_v2.models"] (e.g. _original_modules) when installing and implement a _cleanup_stubs function that restores or removes each entry from sys.modules, then register that cleanup (either via atexit.register(_cleanup_stubs) or expose a pytest fixture that calls _cleanup_stubs after tests) so UsageHelper/FakeUsage stubs do not leak across the pytest session.
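A self-contained sketch of the install/restore pair the comment asks for — the module names are taken from the review, but the stub contents are placeholders (the real test installs fake model classes, not empty modules), and in a pytest suite the two calls would typically be wrapped in a yield fixture:

```python
import sys
from types import ModuleType

STUB_NAMES = ["account_usage", "account_usage.models", "usage_v2.models"]

def install_stubs() -> dict:
    """Install empty module stubs, returning the originals for restoration."""
    originals = {name: sys.modules.get(name) for name in STUB_NAMES}
    for name in STUB_NAMES:
        sys.modules[name] = ModuleType(name)
    return originals

def restore_modules(originals: dict) -> None:
    """Undo install_stubs so later tests see the real modules again."""
    for name, original in originals.items():
        if original is None:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = original
```

As a fixture, `originals = install_stubs()` would run before `yield` and `restore_modules(originals)` after it, scoping the stubs to the tests that need them.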
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/usage_v2/tests/test_helper.py`:
- Around line 129-164: Add an explicit None-case to the unlabeled LLM test and
update the helper signature: modify test_unlabeled_llm_row_is_dropped to include
an additional stub row where llm_reason is None (in addition to the empty-string
case) so the code path handling missing reasons is validated, and change the
_row helper function's type hint from llm_reason: str to llm_reason: str | None
so it accepts None; keep assertions against UsageHelper.get_usage_by_model
unchanged to ensure the unlabeled row is dropped and the extraction_llm entry
remains.
- Around line 33-70: The test installs module stubs in _install_stubs that
persist in sys.modules and can leak into other tests; add cleanup to restore
originals by capturing originals for keys
["account_usage","account_usage.models","usage_v2.models"] (e.g.
_original_modules) when installing and implement a _cleanup_stubs function that
restores or removes each entry from sys.modules, then register that cleanup
(either via atexit.register(_cleanup_stubs) or expose a pytest fixture that
calls _cleanup_stubs after tests) so UsageHelper/FakeUsage stubs do not leak
across the pytest session.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a3c5d643-3287-413e-895a-aa27cb7d475e
📒 Files selected for processing (3)
- backend/usage_v2/helper.py
- backend/usage_v2/tests/__init__.py
- backend/usage_v2/tests/test_helper.py
- legacy_executor: extract _run_pipeline_answer_step helper to drop _handle_structure_pipeline cognitive complexity from 18 to under 15
- legacy_executor: bundle 9 prompt-run scalars into a prompt_run_args dict so _run_line_item_extraction has 8 params (was 15, limit 13)
- legacy_executor: merge implicitly concatenated log string
- structure_tool_task: extract _write_pipeline_outputs helper used by both _execute_structure_tool_impl and _run_agentic_extraction to remove the duplicated INFILE / COPY_TO_FOLDER write block (fixes the 6.1% duplication on new code)
- test_context_retrieval_metrics: use pytest.approx for float compare, drop unused executor local, drop always-true if is_single_pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
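The parameter-bundling refactor in the second bullet follows a common pattern: collapse related per-run scalars into one object so helper signatures stay under the lint limit. The commit uses a plain dict; a frozen dataclass is one idiomatic variant, sketched here with illustrative field names (not the actual nine scalars):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptRunArgs:
    """Per-run scalars bundled into one object so helper signatures stay
    under the function-argument lint limit (hypothetical field names)."""
    run_id: str
    execution_id: str
    file_name: str

def run_line_item_extraction(prompt_name: str, args: PromptRunArgs) -> dict:
    # One bundle instead of many positional scalars keeps the signature
    # short and makes call sites self-describing.
    return {
        "prompt": prompt_name,
        "run_id": args.run_id,
        "execution_id": args.execution_id,
        "file": args.file_name,
    }
```

Either form (dict or dataclass) works; the dataclass additionally gives type checking and immutability at the call site.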
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workers/executor/executors/legacy_executor.py (1)
1310-1325: ⚠️ Potential issue | 🟡 Minor

Inconsistent EMAIL vs DATE null handling.

`_convert_scalar_answer` now correctly returns `None` for NA cases (used by DATE), but the EMAIL type handling at lines 1848-1861 does not use this helper and still assigns the raw `answer` when it's "NA" (line 1850: `structured_output[prompt_name] = answer`). This means EMAIL fields may contain the string `"NA"` while DATE fields get `None`.

Consider using `_convert_scalar_answer` for EMAIL as well for consistency:

🔧 Proposed fix for consistent EMAIL handling

```diff
 elif output_type == PSKeys.EMAIL:
-    if answer.lower() == "na":
-        structured_output[prompt_name] = answer
-    else:
-        email_prompt = (
-            f"Extract the email from the following text:\n{answer}"
-            f"\n\nOutput just the email. "
-            f"The email should be directly assignable to a string "
-            f"variable. No explanation is required. If you cannot "
-            f'extract the email, output "NA".'
-        )
-        structured_output[prompt_name] = answer_prompt_svc.run_completion(
-            llm=llm, prompt=email_prompt
-        )
+    email_prompt = (
+        f"Extract the email from the following text:\n{answer}"
+        f"\n\nOutput just the email. "
+        f"The email should be directly assignable to a string "
+        f"variable. No explanation is required. If you cannot "
+        f'extract the email, output "NA".'
+    )
+    structured_output[prompt_name] = LegacyExecutor._convert_scalar_answer(
+        answer, llm, answer_prompt_svc, email_prompt
+    )
```

Also applies to: 1848-1861
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workers/executor/executors/legacy_executor.py` around lines 1310 - 1325, The EMAIL handling code assigns the raw answer (which can be the string "NA") directly to structured_output[prompt_name]; update that block to use the existing helper _convert_scalar_answer(answer, llm, answer_prompt_svc, prompt) so that "NA" is normalized to None like DATE does—call _convert_scalar_answer with the same llm, answer_prompt_svc, and prompt variables used elsewhere and assign its return value to structured_output[prompt_name] (allowing None to be stored when extraction fails).
🧹 Nitpick comments (1)
workers/executor/executors/legacy_executor.py (1)
1947-1994: Redundant file read for metrics injection may add unnecessary I/O overhead.

This method re-reads the entire file solely to measure timing, then discards the content. For large files or remote storage (S3, GCS), this adds unnecessary I/O overhead and latency. Additionally, the measured time won't reflect what the cloud plugin actually experienced (different network conditions, caching, etc.).

Consider these alternatives:
- Have the cloud `single_pass_extraction` plugin return `context_retrieval` timing in its metrics (ideal)
- If the plugin cannot be modified, accept that these metrics are estimates or skip injection entirely

The broad `Exception` catch (Ruff BLE001) is acceptable here since this is best-effort metrics injection with proper logging on failure.

💡 Alternative: Skip re-read if not meaningful

If the timing measurement doesn't need to be accurate (just a rough estimate), consider caching the result or documenting that this is an approximation:

```diff
 def _inject_context_retrieval_metrics(
     self, result: ExecutionResult, context: ExecutionContext
 ) -> None:
-    """Inject ``context_retrieval`` timing into single-pass metrics.
+    """Inject approximate ``context_retrieval`` timing into single-pass metrics.

     The cloud single_pass_extraction plugin handles retrieval
     internally but does not report ``context_retrieval`` timing in
     its returned metrics. This method replicates the file-read
     measurement from ``RetrievalService.retrieve_complete_context``
-    and injects it into ``result.data["metrics"]``.
+    and injects it into ``result.data["metrics"]``.
+
+    NOTE: This re-reads the file, so timing is an approximation and
+    adds I/O overhead. Consider updating the cloud plugin to report
+    this metric directly.
     """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workers/executor/executors/legacy_executor.py` around lines 1947 - 1994, The current _inject_context_retrieval_metrics method performs an extra fs.read via FileUtils.get_fs_instance to time file reads, causing redundant I/O; remove the file read and measurement logic and instead skip injecting a synthetic "context_retrieval" metric when the plugin did not provide it: check result.data[PSKeys.METRICS] and if no per-prompt "context_retrieval" exists, simply return (optionally emit a debug log and do not mutate data), leaving real timing to the cloud single_pass_extraction plugin or to a future opt-in estimation flag; update the method to stop calling fs.read and remove timing variables (start/elapsed) and the try/except block while preserving use of PSKeys, result, and context for locating the code.
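The "skip injection entirely" option from the prompt above reduces to a small no-op guard. This sketch assumes a per-prompt metrics dict (the exact shape under `result.data[PSKeys.METRICS]` is an assumption for illustration):

```python
import logging

logger = logging.getLogger(__name__)

def inject_context_retrieval_metrics(metrics: dict) -> None:
    """No-op when the plugin did not report context_retrieval timings.

    Rather than re-reading the file to synthesize a timing, leave the
    metrics untouched and log at debug level so the gap is traceable.
    """
    missing = [name for name, m in metrics.items() if "context_retrieval" not in m]
    if missing:
        logger.debug("No context_retrieval metric reported for: %s", missing)
        return  # real timing must come from the plugin itself
```

The design choice is deliberate: a fabricated timing is worse than a missing one, because downstream dashboards cannot tell the two apart.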
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@workers/executor/executors/legacy_executor.py`:
- Around line 1761-1769: The except KeyError handler that raises
LegacyExecutorError should chain the original KeyError to preserve traceback: in
the except block where you call ExecutorRegistry.get("line_item") and currently
raise LegacyExecutorError, modify the raise to include "from e" (or the caught
exception variable) so the new LegacyExecutorError is raised with exception
chaining from the original KeyError; update the except clause to capture the
KeyError as a variable (e.g., except KeyError as e) and raise
LegacyExecutorError(...) from e to satisfy Ruff B904 and preserve the original
traceback.
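The `from e` chaining requested above follows Python's standard exception-chaining pattern. The registry shape and error message here are illustrative stand-ins, not the project's actual code:

```python
class LegacyExecutorError(Exception):
    """Raised when the legacy executor cannot dispatch an output type."""

def get_line_item_plugin(registry: dict):
    try:
        return registry["line_item"]
    except KeyError as e:
        # "from e" preserves the original KeyError as __cause__,
        # satisfying Ruff B904 and keeping the full traceback.
        raise LegacyExecutorError("line-item executor plugin is not registered") from e
```

With chaining, the traceback shows both the `KeyError` lookup failure and the wrapping error, instead of the misleading "During handling of the above exception, another exception occurred" output that bare `raise` produces.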
---
Outside diff comments:
In `@workers/executor/executors/legacy_executor.py`:
- Around line 1310-1325: The EMAIL handling code assigns the raw answer (which
can be the string "NA") directly to structured_output[prompt_name]; update that
block to use the existing helper _convert_scalar_answer(answer, llm,
answer_prompt_svc, prompt) so that "NA" is normalized to None like DATE
does—call _convert_scalar_answer with the same llm, answer_prompt_svc, and
prompt variables used elsewhere and assign its return value to
structured_output[prompt_name] (allowing None to be stored when extraction
fails).
---
Nitpick comments:
In `@workers/executor/executors/legacy_executor.py`:
- Around line 1947-1994: The current _inject_context_retrieval_metrics method
performs an extra fs.read via FileUtils.get_fs_instance to time file reads,
causing redundant I/O; remove the file read and measurement logic and instead
skip injecting a synthetic "context_retrieval" metric when the plugin did not
provide it: check result.data[PSKeys.METRICS] and if no per-prompt
"context_retrieval" exists, simply return (optionally emit a debug log and do
not mutate data), leaving real timing to the cloud single_pass_extraction plugin
or to a future opt-in estimation flag; update the method to stop calling fs.read
and remove timing variables (start/elapsed) and the try/except block while
preserving use of PSKeys, result, and context for locating the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3272e3e8-8a8c-4604-9165-38115004e547
📒 Files selected for processing (3)
- workers/executor/executors/legacy_executor.py
- workers/file_processing/structure_tool_task.py
- workers/tests/test_context_retrieval_metrics.py
✅ Files skipped from review due to trivial changes (1)
- workers/tests/test_context_retrieval_metrics.py
🚧 Files skipped from review as they are similar to previous changes (1)
- workers/file_processing/structure_tool_task.py
for more information, see https://pre-commit.ci
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@workers/file_processing/structure_tool_task.py`:
- Around line 728-733: The except block logs metadata_path which is created only
inside the try, risking an UnboundLocalError; fix this by ensuring metadata_path
is defined before the try (e.g., initialize metadata_path = None or a sentinel
like "<unknown>" above the try) or use a safe fallback when logging (e.g., use
locals().get("metadata_path") or conditional formatting) so the logger.error
call in the except path (the call that currently references metadata_path and
exc_info=True) cannot raise a new exception.
- Around line 663-669: The code currently overwrites the original
input_file_path (logger.info + fs.json_dump for input_file_path) before writing
the copy in COPY_TO_FOLDER, which risks corrupting the original if the copy
write fails; change the operation order so you first create copy_to_folder (use
copy_to_folder, execution_data_dir), write the copy_output_path (fs.json_dump to
copy_output_path using stem and structured_output), verify that write succeeds,
and only then overwrite the original input_file_path (fs.json_dump for
input_file_path and associated logger.info); keep the same variables
(input_file_path, copy_output_path, stem, structured_output) so the change is
localized and atomic from the task's perspective.
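Both fixes above reduce to defensive write patterns. The sketch below uses a stand-in `fs` object with a `json_dump(path, data)` method; the paths and function names are illustrative, not the task's actual helpers:

```python
import logging

logger = logging.getLogger(__name__)

def write_pipeline_outputs(fs, input_file_path: str, copy_output_path: str,
                           structured_output: dict) -> None:
    # Copy-first ordering: write the COPY_TO_FOLDER copy before touching
    # the original, so a failed copy write cannot corrupt INFILE.
    fs.json_dump(copy_output_path, structured_output)
    fs.json_dump(input_file_path, structured_output)

def write_metadata(fs, execution_dir: str, metadata: dict) -> None:
    # metadata_path is defined before the try so the except branch can
    # log it without risking an UnboundLocalError of its own.
    metadata_path = "<unknown>"
    try:
        metadata_path = f"{execution_dir}/METADATA.json"
        fs.json_dump(metadata_path, metadata)
    except Exception:
        logger.error("Failed writing metadata to %s", metadata_path, exc_info=True)
        raise
```

The ordering change makes the operation atomic from the task's perspective: if the copy write raises, the original input file has not yet been overwritten.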
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4c36a1da-b7a1-4a94-bd2a-ee6bc310125c
📒 Files selected for processing (1)
workers/file_processing/structure_tool_task.py
Review Notes

HIGH: Destination connector
…ming

Drop _inject_context_retrieval_metrics and its call site in _handle_single_pass_extraction. The helper was timing a second fs.read against a warm cache (the cloud plugin had already read the file to build its combined prompt) and reporting that under context_retrieval, which is a fabricated number, not a measurement. The cloud plugin is the source of the file read for single-pass and is responsible for populating context_retrieval in its returned metrics. Updated the docstring to spell out the contract.

Also fix misleading "Completed prompt" streaming in the table and line-item extraction wrappers: the message was firing on both the success and failure branches, and on failure the user never saw the error (it only went to logger.error). Move the success-only message into the success branch and stream the error at LogLevel.ERROR on the failure branch. Fall back to "unknown error" when the plugin returns an empty result.error.

Drop the now-orphan TestInjectContextRetrievalMetrics test class (six tests calling the deleted method) and update the module docstring. Surviving classes (TestSinglePassChunkSizeForcing, TestPipelineIndexUsageKwargsPropagation) cover unrelated invariants and are kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
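The streaming fix in this commit has the following shape — a hedged sketch where `result` and `stream` are stand-ins for the worker's plugin result and log-streaming interface, not the actual signatures:

```python
def stream_prompt_completion(result: dict, stream) -> None:
    """Stream success only on the success branch; surface errors to the
    user at ERROR level instead of burying them in logger.error."""
    if result.get("error") is None:
        stream(level="INFO", message=f"Completed prompt {result['prompt_name']}")
    else:
        # Empty error strings from the plugin fall back to a generic message
        stream(level="ERROR", message=result["error"] or "unknown error")
```

The key invariant: a user watching the stream always sees exactly one terminal message per prompt, and its level matches the outcome.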
Frontend Lint Report (Biome)

✅ All checks passed! No linting or formatting issues found.
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report


What
A collection of fixes and follow-ups on top of the v2 executor-worker architecture:

- `LegacyExecutor` via the new line-item executor plugin (PRs Feat/line item executor plugin #1899, Feat/line item executor plugin #1900).
- `null` (not the string `"NA"`) when nothing is extractable, with the FE now rendering it as a plain `null` cell that visually matches other prompt outputs.
- `prompt_studio_helper` into the `ide_callback` worker so commits happen after the async run completes.
- `COPY_TO_FOLDER/` to match the old Docker tool layout that the destination connector expects.
- `usage_kwargs` (run_id, execution_id, file_name) into the indexing path so API deployment responses no longer drop the embedding entry.
- `backend/backend/worker_celery.py` shim and the old `prompt_studio_core_v2/tasks.py` (+ tests) are removed; both `prompt_studio_helper.py` and `views.py` now use the unified `backend.celery_service.app`.
- `FileStorage` instead of the local filesystem so it works on remote storage backends (fixes `PermissionError`).
- `chunk-size=0` so it uses full-context retrieval instead of vector DB retrieval.
- `ManageLlmProfiles` add-vs-edit state reset, `AddLlmProfile` form reset on profile switch, ConfigureDs / Settings.css tweaks.

Why
These are issues that surfaced after the v2 executor architecture was rolled into main:
- `celery_executor_legacy` with everything else, so a slow agentic run could starve regular extractions. Adding a dedicated `celery_executor_agentic` queue lets us scale and prioritize the two paths independently.
- `LegacyExecutor._handle_outputs()` raised `LegacyExecutorError("LINE_ITEM extraction is not supported.")`. With the line-item executor plugin landing, that branch now delegates to the plugin instead.
- `"NA"` string leaking to the FE for empty email/date — `_convert_scalar_answer` only checked the first LLM response for `"NA"`; if the second LLM call also returned `"NA"`, the literal string was returned. The FE then displayed `"NA"` as a normal value, which is wrong (it should be a missing-value indicator). We now return `None` in both cases and the FE renders the literal `null` text in the same font as a normal value.
- `@track_subscription_usage_if_available` decorator on `index_document` and `prompt_responder` ran on the request thread, but the actual execution is now async (executor worker). The tracking call needs to happen after the run completes, so it has been moved into `ide_index_complete` / `ide_prompt_complete` callback tasks.
- `COPY_TO_FOLDER/` — the old Docker `ToolExecutor._setup_for_run()` created this directory and the FS destination connector reads from it. The Celery-native `_execute_structure_tool_impl` and `_run_agentic_extraction` were skipping it, so FS destinations had nothing to copy.
- `_handle_index_for_pipeline` wasn't passing `usage_kwargs` to `_handle_index`, so the embedding adapter callback couldn't tag the row with the right `file_execution_id`, and the entry was missing from the response metadata.
- `worker_celery.py` was a parallel Celery app instance to route to executor workers. Now that the main Django Celery app already has the executor queue routes, the second app is redundant and the shim adds maintenance burden.
- `PermissionError` on remote storage — the helper assumed it could open the input file directly with `open()`, which fails when the file lives on S3/MinIO/etc. Reading via `FileStorage.read()` works on every backend.
- `chunk-size=0`: the retrieval path used the configured chunk size and missed full-context.

How
Backend
- `backend/backend/worker_celery.py`: deleted (114 lines).
- `backend/prompt_studio/prompt_studio_core_v2/tasks.py` + `test_tasks.py`: deleted (767 lines). Async callbacks now live in `workers/ide_callback/`.
- `backend/prompt_studio/prompt_studio_core_v2/prompt_studio_helper.py`:
  - `_get_dispatcher()` now uses `backend.celery_service.app` directly instead of importing a separate worker Celery app.
  - `@track_subscription_usage_if_available` decorators from `index_document` and `prompt_responder` (subscription usage is now tracked in the IDE callback worker).
- `backend/prompt_studio/prompt_studio_core_v2/views.py`: replaced `get_worker_celery_app()` with `backend.celery_service.app` for `AsyncResult` polling.
- `backend/prompt_studio/prompt_studio_core_v2/static/select_choices.json`: re-key `line_item` → `line-item` to match the prompt-studio output type contract.
- `backend/workflow_manager/workflow_v2/workflow_helper.py`: 4 dead lines removed.

Workers / executor
- `workers/shared/enums/worker_enums_base.py`: new `QueueName.EXECUTOR_AGENTIC = "celery_executor_agentic"`.
- `workers/shared/infrastructure/config/registry.py`: `WorkerType.EXECUTOR` config now has `additional_queues=[QueueName.EXECUTOR_AGENTIC]`.
- `workers/executor/worker.py`: health endpoint now reports the actual subscribed queues from `CELERY_QUEUES_EXECUTOR` env.
- `workers/executor/executors/legacy_executor.py`:
  - `_run_line_item_extraction()` added; the `LINE_ITEM` branch in `_handle_outputs()` now delegates to it instead of raising.
  - `_handle_index_for_pipeline()` accepts and forwards `usage_kwargs` so embedding usage is tagged with run/execution/file IDs.
  - `_convert_scalar_answer()` returns `None` when either the first or second LLM call returns `"NA"` (case-insensitive, whitespace-trimmed).
  - `_sanitize_null_values` / `_sanitize_dict_values` use `.strip().lower()` so whitespace-padded `"NA"` is also normalized to `None`.
  - `chunk-size=0` and `chunk-overlap=0` on each output so retrieval uses the full extracted text.
- `workers/file_processing/structure_tool_task.py`:
  - `_execute_structure_tool_impl` and `_run_agentic_extraction` write the structured output JSON into `{execution_data_dir}/COPY_TO_FOLDER/{stem}.json` in addition to overwriting INFILE.
  - `exc_info=True` for diagnosability.
- `workers/ide_callback/tasks.py`: new `_track_subscription_usage()` helper called from both `ide_index_complete` and `ide_prompt_complete`. Errors are logged but never fail the callback.
- `workers/shared/workflow/destination_connector.py`:
  - `RuntimeError` when `tool_execution_result` is missing and there's no recorded execution error (previously logged + continued silently).
  - `metadata_file_path` / INFILE for diagnosability.
- `workers/shared/workflow/execution/service.py`: `source_file_name` simplified to just `file_name` in `execute_structure_tool` params (was unnecessarily computing basename).
- `workers/run-worker-docker.sh`: registers `ide-callback` worker type → `ide_callback` queue → health port 8089.

Frontend
- `DisplayPromptResult.jsx`: split the previous `output === undefined || output === null` guard. `undefined` still renders the "Yet to run" indicator; `null` now renders a plain `<Typography.Text className="prompt-output-result">null</Typography.Text>` so the font, size, weight and color match all other prompt outputs (no italic, no light grey, no oversize).
- `PromptCard.css`: removed the `.prompt-null-value` rule that was making the `null` text italic / grey / 13px.
- `DisplayPromptResult.test.jsx`: updated the null-output test to assert the literal `"null"` text and that the "Yet to run" indicator is not shown.
- `ManageLlmProfiles.jsx`: clear `editLlmProfileId` when "Add new LLM profile" is clicked, so the modal opens in add mode instead of inheriting the previously-edited profile.
- `AddLlmProfile.jsx`: drop the unmount-time `setEditLlmProfileId(null)` (handled by the parent now). When the form needs to reset, use `form.setFieldsValue(formDetails)` instead of `form.resetFields()` so the prefilled defaults are kept.
- `ConfigureDs.jsx`: minor.
- `Settings.css`: minor cleanup (1 line removed).

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No, these are fixes for async execution
Database Migrations
Env Config
NA
Relevant Docs
NA
Related Issues or PRs
- `worker_celery.py` and fix ide-callback worker

Dependencies Versions
Notes on Testing
- `celery_executor_agentic` (RabbitMQ UI) and is picked up by the executor worker.
- `line-item` against a multi-row document.
- `Email` prompt against a document with no email address. Run a `Date` prompt against a document with no date.
- `null` in the same font, size, weight and color as a successful text extraction in a sibling prompt — no italic, no light grey, no oversized text.
- `worker-ide-callback` worker (check its logs for `IDE subscription usage committed for run_id=…`).
- `{file_execution_dir}/COPY_TO_FOLDER/{stem}.json` exists and contains the structured output.
- `embedding` entry should now appear in the metadata for that file.
- `FILE_STORAGE_PROVIDER=minio` (or `s3`) and run a workflow that uses the Unstructured IO X2Text adapter.
- `PermissionError`.
N/A — the only visible change is that the rendered
nulltext in prompt-output cells now matches the styling of other prompt values (same font / size / weight / color).Checklist
I have read and understood the Contribution Guidelines.