ci: parallelize merge-tree mocha tests (skip in perf mode)#26657
ci: parallelize merge-tree mocha tests (skip in perf mode)#26657frankmueller-msft wants to merge 2 commits intomicrosoft:mainfrom
Conversation
Re-applies the parallelization from microsoft#26624 which was reverted in microsoft#26653 because it broke the Performance Benchmarks pipeline. The fix: make parallel execution conditional on not running in perf mode. Mocha serializes Hook objects across worker processes in parallel mode, which strips methods like hook.error() that the benchmark reporter relies on. Perf tests also need sequential execution for measurement quality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Re-introduces Mocha parallelization for merge-tree tests in CI while disabling it for perf mode to avoid benchmark reporter issues, and adjusts farm test timeouts for parallel execution stability.
Changes:
- Conditionally enables Mocha
parallelmode withjobs: 4when not running with--perfMode. - Increases timeouts for
applyStashedOpFarmandreconnectFarmtest suites.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/dds/merge-tree/.mocharc.cjs | Enables conditional Mocha parallelization and sets a fixed job count for CI. |
| packages/dds/merge-tree/src/test/client.applyStashedOpFarm.spec.ts | Increases suite timeout to accommodate parallel execution overhead. |
| packages/dds/merge-tree/src/test/client.reconnectFarm.spec.ts | Increases suite timeout to accommodate parallel execution overhead. |
| // Matches the timeout convention used by other farm tests (conflictFarm, obliterateFarm, | ||
| // rollbackFarm). Tests complete in ~15-20s normally, but need headroom under parallel execution. | ||
| }).timeout(30 * 10000); |
There was a problem hiding this comment.
30 * 10000 sets the timeout to 300,000ms (5 minutes), which doesn’t align with the comment’s stated need for modest headroom (tests ~15–20s). This looks like a multiplier typo (likely meant 30 * 1000, or another explicit ms value). Consider setting an intentional timeout (e.g., 60 * 1000) and/or extracting a named constant to avoid unit mistakes.
| // Matches the timeout convention used by other farm tests (conflictFarm, obliterateFarm, | ||
| // rollbackFarm). Tests complete in ~15-20s normally, but need headroom under parallel execution. | ||
| }).timeout(30 * 10000); |
There was a problem hiding this comment.
30 * 10000 sets a 5 minute timeout, which is much larger than the described headroom and likely unintentional. Use an explicit intended value in milliseconds (and consider a shared constant with other farm tests) to prevent silent CI slowdowns and masking hung tests.
| // Parallelize tests to speed up CI, but not in perf mode where parallel execution | ||
| // breaks the benchmark reporter and degrades measurement quality. | ||
| if (!process.argv.includes("--perfMode")) { | ||
| config.parallel = true; | ||
| // Keep job count limited to avoid excessive memory use and thread overhead in CI. | ||
| // 4 jobs was measured to give most of the speed up while not slowing down CI. | ||
| config.jobs = 4; | ||
| } |
There was a problem hiding this comment.
process.argv.includes(\"--perfMode\") only matches the exact token; it won’t detect common variants like --perfMode=true / --perfMode=1. If those forms can occur in your pipelines/scripts, parallel mode would be enabled unexpectedly. Consider checking for any arg that equals --perfMode or starts with --perfMode= (or using a small arg parser) to make the gating robust.
|
Addressing Copilot's review:
|
| config.parallel = true; | ||
| // Keep job count limited to avoid excessive memory use and thread overhead in CI. | ||
| // 4 jobs was measured to give most of the speed up while not slowing down CI. | ||
| config.jobs = 4; |
There was a problem hiding this comment.
On the CI test run, there was a timeout of an unrelated test: https://dev.azure.com/fluidframework/public/_build/results?buildId=382483&view=ms.vss-test-web.build-test-results-tab&runId=7473433&resultId=100000&paneView=debug
Its possible you just got unlucky, but this suggests we might be increasing the risk of a timeout by overloading the CPU.
Since with this change the unit tests on CI are no longer the long poll, we are getting more speed up that we need at the moment, so it might be safer to go with a smaller thread count here and still get all or most of the benefit.
Maybe try 2 for a conservative starting point, then run the CI test pipeline a few times and see if it runs without any issues.
Failing tests indicate this likely makes test run more flakey.
Summary
Re-applies the parallelization from #26624 (reverted in #26653) with a fix for the Performance Benchmarks pipeline failure.
parallel: truewithjobs: 4for merge-tree mocha tests, except when running in perf mode (--perfMode)hook.error()), which the benchmark reporter depends on. Sequential execution is also needed for measurement quality.applyStashedOpFarmandreconnectFarmto match other farm testsChanges
packages/dds/merge-tree/.mocharc.cjs— conditional parallel configpackages/dds/merge-tree/src/test/client.applyStashedOpFarm.spec.ts— timeout bumppackages/dds/merge-tree/src/test/client.reconnectFarm.spec.ts— timeout bumpVerification
Without fix (parallel enabled unconditionally) — perf tests fail with the reported error:
With fix (
--perfModeguard) — both pass:npm run perf— all benchmarks pass (sequential mode, benchmark reporter works)npm run test:mocha— 1555 passing, 2 pending (parallel mode, ~2 min)Test plan
hook.error is not a functionfailure without the fixnpm run perfpasses with the fix (sequential mode)npm run test:mochapasses with the fix (parallel mode)🤖 Generated with Claude Code