
feat(generation): parallelize within-scene TTS generation#696

Open
ly-wang19 wants to merge 1 commit into THU-MAIC:main from ly-wang19:feat/parallel-tts-within-scene


@ly-wang19

Contributor

What & why

Follow-up to #660 (just merged — opt-in parallel scene content). The remaining serial cost in classroom generation is TTS: generateTTSForScene rendered a scene’s speech clips one at a time in a for … await loop.

Why it’s safe: unlike cross-scene parallelism (ruled out in #572 because it breaks previousSpeeches threading), the speech actions within a scene are independent — each writes its own audio under its own audioId (tts_s<order>_<actionId>), stored separately, with no ordering/carry-over. Playback order comes from the action list, not generation order.
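The independence argument rests on each clip having its own storage key. A minimal sketch of that idea, assuming a hypothetical SpeechAction shape and helper names (only the tts_s<order>_<actionId> key format comes from the description above):

```typescript
// Hypothetical shape of a speech action; the real type in the repo may differ.
interface SpeechAction {
  id: string;
  text: string;
}

// Builds the per-clip key in the tts_s<order>_<actionId> form described above.
function audioIdFor(sceneOrder: number, action: SpeechAction): string {
  return `tts_s${sceneOrder}_${action.id}`;
}

// True when every clip in the scene maps to a distinct key,
// i.e. no two concurrent writes can target the same audio entry.
function hasDistinctAudioIds(sceneOrder: number, actions: SpeechAction[]): boolean {
  const ids = actions.map((a) => audioIdFor(sceneOrder, a));
  return new Set(ids).size === ids.length;
}
```

As long as action IDs are unique within a scene, the keys are unique too, so the order in which clips finish does not matter.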

Closes #695.

Change

When the server opts into parallel generation (PARALLEL_SCENE_CONCURRENCY > 1, the #660 knob), render the scene’s speech clips with bounded concurrency via the mapWithConcurrency helper #660 added; otherwise the original strictly-serial for … await loop runs.

  • Default (0/unset) is byte-for-byte the original serial behaviour.
  • The bound keeps requests within TTS providers’ per-key rate limits (HTTP 429), the same safety stance as #660.
  • Per-clip failures are still counted (not thrown), so one bad clip never aborts the scene — unchanged from before.
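The behaviour described above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the mapWithConcurrency shown here is a stand-in whose signature may differ from the helper #660 added, and renderClips and renderOne are hypothetical names.

```typescript
// Stand-in for the helper added in #660; the real signature may differ.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until the list is exhausted.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  };
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical wrapper: renders clips and counts failures instead of throwing.
async function renderClips<T>(
  clips: readonly T[],
  concurrency: number,
  renderOne: (clip: T) => Promise<void>,
): Promise<{ succeeded: number; failed: number }> {
  // Turn a rejection into a boolean so one bad clip cannot abort the rest.
  const attempt = async (clip: T): Promise<boolean> => {
    try {
      await renderOne(clip);
      return true;
    } catch {
      return false;
    }
  };

  let outcomes: boolean[];
  if (concurrency > 1) {
    // Opt-in path: at most `concurrency` clips in flight at once.
    outcomes = await mapWithConcurrency(clips, concurrency, attempt);
  } else {
    // Default path: strictly serial, one clip at a time.
    outcomes = [];
    for (const clip of clips) {
      outcomes.push(await attempt(clip));
    }
  }
  const succeeded = outcomes.filter(Boolean).length;
  return { succeeded, failed: outcomes.length - succeeded };
}
```

Catching inside attempt rather than around the whole batch is what keeps failure handling identical on both paths: the pool never sees a rejection, so it never short-circuits.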

Reuses the existing parallelSceneConcurrency setting (no new env var / plumbing). A dedicated PARALLEL_TTS_CONCURRENCY could be a later refinement if TTS vs LLM quota profiles need to diverge — noted in the issue.
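If the dedicated knob is ever added, resolution could look roughly like the sketch below. Everything here is hypothetical: PARALLEL_TTS_CONCURRENCY does not exist today, and the function names and the normalisation of values below 2 to 0 (serial) are assumptions made for illustration.

```typescript
// Hypothetical parser: unset, non-numeric, or below 2 all mean "serial" (0).
function parseConcurrency(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 1 ? n : 0;
}

// Hypothetical resolver: prefer a TTS-specific value, else fall back to the #660 knob.
function resolveTtsConcurrency(env: Record<string, string | undefined>): number {
  // PARALLEL_TTS_CONCURRENCY is the possible later refinement, not a current setting.
  if (env.PARALLEL_TTS_CONCURRENCY !== undefined) {
    return parseConcurrency(env.PARALLEL_TTS_CONCURRENCY);
  }
  return parseConcurrency(env.PARALLEL_SCENE_CONCURRENCY);
}
```

Falling back to the shared setting would keep existing deployments unchanged while letting operators tune TTS separately.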

Test plan

npx vitest run: 94 files / 742 tests pass; tsc/prettier/eslint clean. The concurrency mechanism itself is already unit-tested (tests/utils/concurrency.test.ts, including the cap), and the serial default path is unchanged. (The hook isn’t unit-tested because the repo has no DOM/hook harness, same as #660.)

No user-facing strings (no i18n impact).

Follow-up to THU-MAIC#660. generateTTSForScene rendered a scene's speech clips
one at a time. Within a scene the speech actions are independent — each
writes its own audio under its own audioId, no ordering/carry-over — so
when the server opts into parallel generation (PARALLEL_SCENE_CONCURRENCY
> 1, the THU-MAIC#660 knob) render them with bounded concurrency via
mapWithConcurrency. Default (0/unset) is byte-for-byte the original serial
loop; the bound respects TTS providers' 429 quotas. Per-clip failures are
still counted (not thrown), so one bad clip never aborts the scene.

Closes THU-MAIC#695

Development

Successfully merging this pull request may close these issues.

Parallelize TTS generation within a scene (follow-up to #660)
