fix(jobs): pre-flight workspace check before gapfill/FBA; humanize errors #217

Merged
samseaver merged 3 commits into ModelSEED:staging from VibhavSetlur:staging on Jun 5, 2026

@VibhavSetlur
Collaborator

Why

Live Flower (poplar:5555) currently shows 543 failed modelseed.gapfill + modelseed.fba jobs all dying with WorkspaceError('_ERROR_Object not found!_ERROR_') — the same handful of broken model refs being retried 10–20 times each by frustrated users.

| Task | Failed | Failure family |
| --- | --- | --- |
| modelseed.gapfill | 392 | Object not found |
| modelseed.fba | 151 | Object not found |

Every payload shape is identical: model = '/<user>/modelseed/<name>/model' where the workspace object truly does not exist. The frontend builds that ref purely from URL params (app/model/[...path]/page.tsx:1457-1474) and submits the job without verifying the object exists.

This complements José's backend change (exception text now reads No model found at '<path>'. Check that your reconstruct…) by:

  1. Preventing the doomed celery enqueue in the first place.
  2. Surfacing the same actionable wording for any pre-existing failed-job rows still showing the legacy _ERROR_ string in My Jobs.

What

  • lib/utils/jobErrors.ts — new formatJobError(raw, modelRef?) helper. Translates legacy _ERROR_Object not found!_ERROR_ (and bare Object not found) into actionable wording; passes the new backend message through unchanged; returns undefined for empty input.
  • app/model/[...path]/page.tsx#submitModelJob now calls workspaceGet([modelRef]) before submitting; throws the friendly error if the object is missing; routes both pre-flight and backend errors through formatJobError. The doomed celery job is never enqueued.
  • app/(user-data)/my-jobs/page.tsx — the displayed errorMsg is run through formatJobError(job.error, args.model) so older failed-job rows show the friendly wording too, with the missing path substituted.
  • tests/unit/utils/jobErrors.test.ts — 7 new unit tests covering empty/null inputs, legacy form with/without ref, bare "Object not found", new backend message passthrough, unrelated errors untouched, Error-instance coercion.
  • tests/unit/api/biochem.test.ts — small defensive 7s race timeout for the existing network-dependent integration probe so the suite stays green when the live API blips on slow CI networks.
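
To make the translation rules above concrete, here is a minimal sketch of what a helper like formatJobError could look like. It is not the code from this PR: the matching rules, the "at the requested path" fallback, and the exact friendly wording are assumptions chosen for illustration, and the real backend message is only known here by its opening words.

```typescript
// Hypothetical sketch of a formatJobError-style helper.
// The friendly wording below is illustrative, not the wording shipped in this PR.
const LEGACY_NOT_FOUND = /_ERROR_Object not found!_ERROR_|Object not found/i;
const NEW_BACKEND_PREFIX = "No model found at";

function formatJobError(raw: unknown, modelRef?: string): string | undefined {
  // Coerce Error instances and other values to a plain string.
  const text = raw instanceof Error ? raw.message : raw == null ? "" : String(raw);

  // Empty, null, or undefined input has nothing to display.
  if (text.trim() === "") return undefined;

  // The new backend message is already actionable: pass it through unchanged.
  if (text.includes(NEW_BACKEND_PREFIX)) return text;

  // Legacy workspace text: name the missing path and point at the reconstruct job.
  if (LEGACY_NOT_FOUND.test(text)) {
    const where = modelRef ? `at '${modelRef}'` : "at the requested path";
    return `No model found ${where}. The reconstruct job for this model may not have finished.`;
  }

  // Unrelated errors are left untouched.
  return text;
}
```

On the My Jobs page, a call shaped like formatJobError(job.error, args.model) would then give older failed-job rows the same wording, with the job's own model ref substituted.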

Verification

  • npm run lint — 0 errors (pre-existing warnings only, none in touched files)
  • npx tsc --noEmit — clean
  • npm run test:run: 98/98 pass (7 new for formatJobError)
  • npm run build: /model/[...path], /team, /my-jobs all build successfully
  • npm audit --omit=dev --audit-level=high — 0 vulns
  • Playwright E2E vs deployed staging (authenticated as seaver): workspaceGet returns HTTP 404 + _ERROR_Object not found!_ERROR_ for a missing model — bug is real.
  • Playwright E2E vs local dev with this fix: full UI flow (auth, navigate to missing-model page, click Run FBA, confirm dialog) shows the friendly "No model found / reconstruct" wording and fires zero /api/jobs/{fba,gapfill} POSTs.
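
The "zero POSTs" behaviour checked above follows from the ordering of the pre-flight check. A rough sketch of that ordering, assuming injected dependencies, is shown here. The WorkspaceGet and SubmitJob types, the deps parameter, and the error wording are all hypothetical; the real submitModelJob signature is not shown in this PR description.

```typescript
// Hypothetical dependency shapes, introduced only for this illustration.
type WorkspaceGet = (refs: string[]) => Promise<unknown[]>;
type SubmitJob = (kind: "fba" | "gapfill", payload: { model: string }) => Promise<string>;

async function submitModelJob(
  kind: "fba" | "gapfill",
  modelRef: string,
  deps: { workspaceGet: WorkspaceGet; submitJob: SubmitJob },
): Promise<string> {
  // Pre-flight: confirm the workspace object exists before enqueueing anything.
  try {
    await deps.workspaceGet([modelRef]);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (/Object not found/i.test(message)) {
      // Illustrative wording; the job endpoint is never reached on this path.
      throw new Error(`No model found at '${modelRef}'. The reconstruct job for this model may not have finished.`);
    }
    // Anything else (network failure, auth) is not a missing model: rethrow as-is.
    throw err;
  }

  // Only reached when the object exists.
  return deps.submitJob(kind, { model: modelRef });
}
```

Only the "Object not found" family is translated in this sketch, so a transient network failure during the pre-flight is not misreported as a missing model.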

Test plan

  • All five CI steps green locally and in fork CI
  • Sam: validate live behaviour against a real missing-model ref in staging before merging to master and redeploying

🤖 Generated with Claude Code

VibhavSetlur and others added 3 commits June 3, 2026 13:21
…rors

When users navigate to a model ref whose backing workspace object is
missing (a reconstruct that never completed, or a stale/bookmarked link)
and click Run FBA / Run GapFilling, the frontend was enqueuing a celery
job that the backend could only fail with the cryptic PATRIC workspace
text `_ERROR_Object not found!_ERROR_`. Live Flower shows 543 of these
failed jobs across modelseed.gapfill (392) and modelseed.fba (151), with
many users retrying the same broken ref 10-20 times. This change
complements the new, clearer backend message José added.

This adds a single fix:

- New `lib/utils/jobErrors.ts#formatJobError` translates both the legacy
  `_ERROR_Object not found!_ERROR_` and the explicit "Object not found"
  substrings into actionable wording that names the missing path and
  points users at their reconstruct job. The new backend message is
  recognised and passed through unchanged.
- `app/model/[...path]/page.tsx#submitModelJob` now calls
  `workspaceGet([modelRef])` before submitting, throws the friendly
  error if the object is missing, and routes both the pre-flight and any
  backend rejection through `formatJobError`. The doomed celery job is
  never enqueued.
- `app/(user-data)/my-jobs/page.tsx` runs the displayed `errorMsg`
  through `formatJobError` so older failed-job rows show the new
  actionable wording too, substituting the job's own model ref.

Unit-tested with 7 new cases in `tests/unit/utils/jobErrors.test.ts`
(empty/null inputs, legacy form with/without ref, bare "Object not
found", new backend message passthrough, unrelated errors untouched,
Error-instance coercion). Local lint/typecheck/build clean, 98/98 unit
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Biochem Integration Tests suite intentionally degrades to "skipped"
when staging.modelseed.org is unreachable, but the live probe in beforeAll
had no internal timeout — on slow CI networks the vitest hookTimeout
(10s) fired before the catch block could mark isApiAvailable = false,
turning the whole suite (and CI) red. Race the probe against an explicit
7s timer so the catch path always runs first; observed master CI flake
matched this same signature on 2026-06-03.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
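
The race described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the code in tests/unit/api/biochem.test.ts: timeoutAfter and probeIsAvailable are hypothetical names, and the probe is passed in as a function instead of calling the live API.

```typescript
// Hypothetical helper: a promise that rejects after `ms`, plus a way to cancel its timer.
function timeoutAfter(ms: number): { promise: Promise<never>; cancel: () => void } {
  let cancel = () => {};
  const promise = new Promise<never>((_, reject) => {
    const timer = setTimeout(() => reject(new Error(`probe timed out after ${ms}ms`)), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

// Race the probe against the timer. Both a timeout and a probe failure resolve to
// false, so a beforeAll hook using this can mark the API unavailable and skip the
// suite before the framework's own hook timeout fires.
async function probeIsAvailable(probe: () => Promise<boolean>, timeoutMs = 7000): Promise<boolean> {
  const timeout = timeoutAfter(timeoutMs);
  try {
    return await Promise.race([probe(), timeout.promise]);
  } catch {
    return false;
  } finally {
    // Clear the timer so it does not keep the test process alive.
    timeout.cancel();
  }
}
```

The 7000 ms default mirrors the 7s figure in the commit message; it only works as intended while it stays below the hook timeout (10s in the message above).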
…-before-job

fix(jobs): pre-flight workspace check before gapfill/FBA; humanize errors
@samseaver merged commit 837d435 into ModelSEED:staging on Jun 5, 2026
2 checks passed