Performance: Optimize array iterations in UtteranceBasedMerger #281
Closed
ysdede wants to merge 1 commit into
Replaced chained array methods (.map().filter().map()) in `normalizeWords` with a single `for` loop to prevent intermediate array allocations and reduce GC churn. Replaced `Math.max(...array.map(...))` calls for finding the maximum `end_time` with a manual `for` loop to eliminate array spreading limitations and intermediate array allocations.
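To illustrate the first change, here is a minimal sketch of collapsing a `.map().filter().map()` chain into one loop. The `Word` shape, the trimming and empty-word filtering, and the `index` field are assumptions made for illustration; the actual body of `normalizeWords` is not shown in this PR description.

```typescript
// Hypothetical word shape; only `end_time` is named in the PR description.
interface Word {
  text: string;
  start_time: number;
  end_time: number;
}

interface NormalizedWord extends Word {
  index: number;
}

// Chained form: each stage allocates a new array (and new objects),
// so two intermediate arrays are discarded before the result is returned.
function normalizeWordsChained(words: Word[]): NormalizedWord[] {
  return words
    .map((w) => ({ ...w, text: w.text.trim() }))
    .filter((w) => w.text.length > 0)
    .map((w, index) => ({ ...w, index }));
}

// Single-pass form: one output array, no intermediate arrays.
function normalizeWordsSinglePass(words: Word[]): NormalizedWord[] {
  const result: NormalizedWord[] = [];
  for (let i = 0; i < words.length; i++) {
    const w = words[i];
    const text = w.text.trim();
    if (text.length === 0) continue; // same condition as the filter stage
    result.push({
      text,
      start_time: w.start_time,
      end_time: w.end_time,
      index: result.length, // position in the filtered output
    });
  }
  return result;
}
```

Both functions should return the same words in the same order; the loop version simply does the transform, the filter, and the indexing in one traversal.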
Closing as superseded by I landed the narrower
Verification for the landed change was:
The only remaining failures are the pre-existing |
What

Replaced chained array methods (`.map().filter().map()`) in the `normalizeWords` method with a single manual `for` loop. Additionally, replaced `Math.max(...array.map(w => w.end_time))` calls in `checkPendingSentenceBoundary` and `flushPendingWords` (lines 478 and 519) with a manual `for` loop that iterates to find the maximum `end_time`.

Why

These code blocks are located in the hot path of the text processing pipeline (`UtteranceBasedMerger.ts`), which frequently normalizes incoming sets of words and evaluates sentence boundaries. Chained array methods allocate temporary arrays that must later be garbage-collected, contributing to unnecessary GC churn. Furthermore, using the spread operator (`...`) combined with `map()` to calculate the max value can be slower and imposes a limit on array size, risking a stack overflow for extremely large arrays.

Benchmark metrics:

In local tests measuring `normalizeWords` against an array of 100 words:

In local tests measuring `getPendingEnd` against an array of 50 words:

How to verify

1. Run `bun test src/lib/transcription/UtteranceBasedMerger.test.ts` to ensure the specific `UtteranceBasedMerger` tests pass.
2. Run `bun test src` to run the full test suite and confirm no regressions exist.
3. Run `npm i -g typescript && tsc --noEmit` to confirm no new TypeScript errors were introduced.

PR created automatically by Jules for task 1960043804257949492 started by @ysdede
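The `Math.max(...array.map(...))` replacement can be sketched roughly as follows. The `TimedWord` interface and both function names are hypothetical; only the `end_time` field and the spread-versus-loop contrast come from the description above.

```typescript
// Hypothetical minimal shape: only `end_time` is taken from the PR text.
interface TimedWord {
  end_time: number;
}

// Spread form: builds a temporary array, then passes every element as a
// separate call argument, which is subject to an engine-dependent limit.
function maxEndTimeSpread(words: TimedWord[]): number {
  return Math.max(...words.map((w) => w.end_time));
}

// Loop form: no temporary array and no argument-count limit.
// Returns -Infinity for an empty array, matching Math.max() with no arguments.
// Note: unlike Math.max, this loop skips NaN values instead of returning NaN.
function maxEndTimeLoop(words: TimedWord[]): number {
  let max = -Infinity;
  for (let i = 0; i < words.length; i++) {
    const end = words[i].end_time;
    if (end > max) max = end;
  }
  return max;
}
```

For inputs without `NaN` timestamps the two forms should agree; whether the loop is measurably faster depends on the engine and the array sizes involved.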
Summary by Sourcery
Optimize hot-path word normalization and sentence boundary processing for better runtime performance in UtteranceBasedMerger.
Enhancements: