feat(generation): parallelize within-scene TTS generation by ly-wang19 · Pull Request #696 · THU-MAIC/OpenMAIC

ly-wang19 · 2026-06-08T12:02:13Z

What & why

Follow-up to #660 (just merged — opt-in parallel scene content). The remaining serial cost in classroom generation is TTS: generateTTSForScene rendered a scene’s speech clips one at a time in a for … await loop.

Why it’s safe: unlike cross-scene parallelism (ruled out in #572 because it breaks previousSpeeches threading), the speech actions within a scene are independent — each writes its own audio under its own audioId (tts_s<order>_<actionId>), stored separately, with no ordering/carry-over. Playback order comes from the action list, not generation order.
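To make the independence argument concrete, here is a minimal sketch of how per-clip audio IDs following the tts_s<order>_<actionId> pattern stay distinct within a scene. The `SpeechAction` shape and the `audioIdFor` helper are hypothetical names for illustration, not the repo's actual types.

```typescript
// Hypothetical action shape; the real type in the repo may carry more fields.
interface SpeechAction {
  id: string;
  text: string;
}

// Builds an ID following the tts_s<order>_<actionId> pattern described above.
function audioIdFor(sceneOrder: number, action: SpeechAction): string {
  return `tts_s${sceneOrder}_${action.id}`;
}

const sampleActions: SpeechAction[] = [
  { id: "a1", text: "Welcome to the lesson." },
  { id: "a2", text: "Let's look at the first slide." },
];

const sampleIds = sampleActions.map((action) => audioIdFor(3, action));

// Distinct action IDs give distinct audio IDs, so clips rendered
// concurrently never write to the same storage key.
const allUnique = new Set(sampleIds).size === sampleIds.length;
```

Because each clip is keyed only by its scene order and action ID, the order in which clips finish has no effect on what is stored.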

Closes #695.

Change

When the server opts into parallel generation (PARALLEL_SCENE_CONCURRENCY > 1, the #660 knob), render the scene’s speech clips with bounded concurrency via the mapWithConcurrency helper #660 added; otherwise the original strictly-serial for … await loop runs.

Default (0/unset) is byte-for-byte the original serial behaviour.
The bound keeps requests within TTS providers’ per-key rate limits (HTTP 429), the same safety stance as #660 (feat(generation): opt-in parallel scene-content generation).
Per-clip failures are still counted (not thrown), so one bad clip never aborts the scene — unchanged from before.
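The behaviour listed above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual diff: the `mapWithConcurrency` signature shown here is a guess at what #660 added, and `renderSceneClips` and `RenderClip` are hypothetical stand-ins for generateTTSForScene and the provider call.

```typescript
// Assumed signature; the helper added in #660 may differ.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is exhausted.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical stand-in for the real TTS provider call.
type RenderClip = (text: string) => Promise<void>;

async function renderSceneClips(
  texts: readonly string[],
  concurrency: number,
  renderClip: RenderClip,
): Promise<{ failed: number }> {
  let failed = 0;
  // Failures are counted, never rethrown, so one bad clip cannot abort the scene.
  const attempt = async (text: string): Promise<void> => {
    try {
      await renderClip(text);
    } catch {
      failed++;
    }
  };
  if (concurrency > 1) {
    // Opt-in path: at most `concurrency` clips are in flight at once.
    await mapWithConcurrency(texts, concurrency, attempt);
  } else {
    // Default path: strictly serial, one clip at a time.
    for (const text of texts) {
      await attempt(text);
    }
  }
  return { failed };
}
```

Wrapping each clip in its own try/catch before handing it to the helper is what keeps the failure semantics identical on both paths: the helper never sees a rejected promise.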

Reuses the existing parallelSceneConcurrency setting (no new env var / plumbing). A dedicated PARALLEL_TTS_CONCURRENCY could be a later refinement if TTS vs LLM quota profiles need to diverge — noted in the issue.
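As a rough illustration of the opt-in rule, the sketch below shows one way a single setting could gate both the serial default and the parallel path. `parseConcurrency` and `shouldParallelizeTts` are hypothetical names; the repo's actual plumbing for parallelSceneConcurrency is not shown in this PR.

```typescript
// Hypothetical parser: unset, empty, non-numeric, or non-positive values
// all fall back to 0, which means "serial".
function parseConcurrency(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : 0;
}

// Parallel TTS is taken only when the setting is strictly greater than 1.
function shouldParallelizeTts(concurrency: number): boolean {
  return concurrency > 1;
}
```

Under this sketch a value of 1 still takes the serial path, matching the "> 1" condition in the description.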

Test plan

npx vitest run — 94 files / 742 tests pass; tsc/prettier/eslint clean. The concurrency mechanism itself is already unit-tested (tests/utils/concurrency.test.ts, incl. the cap); the serial default path is unchanged. (The hook isn’t unit-tested — no DOM/hook harness in the repo — same as #660.)

No user-facing strings (no i18n impact).

Follow-up to THU-MAIC#660. generateTTSForScene rendered a scene's speech clips one at a time. Within a scene the speech actions are independent — each writes its own audio under its own audioId, no ordering/carry-over — so when the server opts into parallel generation (PARALLEL_SCENE_CONCURRENCY > 1, the THU-MAIC#660 knob) render them with bounded concurrency via mapWithConcurrency. Default (0/unset) is byte-for-byte the original serial loop; the bound respects TTS providers' 429 quotas. Per-clip failures are still counted (not thrown), so one bad clip never aborts the scene. Closes THU-MAIC#695

ly-wang19 wants to merge 1 commit into THU-MAIC:main from ly-wang19:feat/parallel-tts-within-scene
