Skip to content

tests: deflake TestWatchFailed#10943

Merged
ti-chi-bot[bot] merged 1 commit into
tikv:masterfrom
rleungx:fix-10627-testwatchfailed-flake
Jun 29, 2026
Merged

tests: deflake TestWatchFailed#10943
ti-chi-bot[bot] merged 1 commit into
tikv:masterfrom
rleungx:fix-10627-testwatchfailed-flake

Conversation

@rleungx

@rleungx rleungx commented Jun 25, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

Issue Number: Close #10627

TestTSOKeyspaceGroupManagerSuite/TestWatchFailed is flaky. The test waits until a keyspace group has a serving primary, but then directly queries specific TSO node addresses. A queried TSO node may not have loaded the updated keyspace group metadata yet, so FindGroupByKeyspaceID can still return the default keyspace group.

What is changed and how does it work?

tests: deflake TestWatchFailed

Wait for the specific TSO nodes queried by TestWatchFailed to load the expected keyspace group metadata before asserting the FindGroupByKeyspaceID response.

Check List

Tests

  • Integration test
    • make -C tests/integrations gotest GOTEST_ARGS='-tags without_dashboard ./mcs/tso -run TestTSOKeyspaceGroupManagerSuite/TestWatchFailed -count=1'

Code changes

  • None

Side effects

  • None

Related changes

  • None

Release note

None.

Summary by CodeRabbit

  • Tests
    • Improved integration test coverage for keyspace-group loading and watch behavior.
    • Added more precise server-level waiting and verification to better detect readiness and revision updates.
    • Strengthened checks around group visibility after splits to reduce flaky test outcomes.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@rleungx, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 minute and 37 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6ccc7c79-5b78-4db3-bd52-a7da562eb38a

📥 Commits

Reviewing files that changed from the base of the PR and between 4550084 and 40235e4.

📒 Files selected for processing (1)
  • tests/integrations/mcs/tso/keyspace_group_manager_test.go
📝 Walkthrough

Walkthrough

TestWatchFailed now queries TSO nodes through a shared per-node FindGroupByKeyspaceID helper, tracks modRevision across lookups, and waits for keyspace groups to load on specific servers before validating split behavior.

Changes

TSO keyspace-group readiness test

Layer / File(s) Summary
Per-node lookup and readiness waits
tests/integrations/mcs/tso/keyspace_group_manager_test.go
Adds a shared per-node FindGroupByKeyspaceID helper, reuses it in checkFn, tracks modRevision, and waits for the initial and split keyspace groups to load on selected TSO servers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

lgtm, approved, ok-to-test

Suggested reviewers

  • lhy1024
  • okJiang

Poem

🐇 I hopped through nodes both near and far,
To watch each group where TSO stars are.
A revision ticked, then stayed in view,
While loaded groups said, “Yes, we’re true!”
✨ Hop-hop, the split test sings anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately describes the main change: deflaking TestWatchFailed in tests.
Description check ✅ Passed All required template sections are present, including issue number, change summary, checklist, and release note.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@rleungx rleungx force-pushed the fix-10627-testwatchfailed-flake branch from 6c8a3dd to 4550084 Compare June 25, 2026 08:48
@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/needs-linked-issue do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 25, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integrations/mcs/tso/keyspace_group_manager_test.go`:
- Around line 191-194: The stale-revision assertion path in
findGroupByKeyspaceID handling can panic because it dereferences resp before
confirming the RPC succeeded. Update the error branch to guard both err and resp
before accessing resp.GetHeader().GetError(), and assert the transport error
separately from the expected header error so the test handles transient RPC
failures without crashing.
- Around line 150-165: The RPC helper in findGroupByKeyspaceID is using
suite.ctx directly, which leaves each FindGroupByKeyspaceID call unbounded
during polling. Update this helper to create a per-call context with timeout (or
deadline) before invoking tsopb.NewTSOClient(conn).FindGroupByKeyspaceID, and
ensure the cancel function is deferred so each retry attempt is independently
bounded. Keep the change localized to findGroupByKeyspaceID and preserve the
existing request construction and service-discovery connection logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96f8c504-8e12-4522-8f3f-619feae492de

📥 Commits

Reviewing files that changed from the base of the PR and between 2e12f48 and 4550084.

📒 Files selected for processing (1)
  • tests/integrations/mcs/tso/keyspace_group_manager_test.go

Comment thread tests/integrations/mcs/tso/keyspace_group_manager_test.go
Comment thread tests/integrations/mcs/tso/keyspace_group_manager_test.go
Signed-off-by: Ryan Leung <rleungx@gmail.com>
@rleungx rleungx force-pushed the fix-10627-testwatchfailed-flake branch from 4550084 to 40235e4 Compare June 25, 2026 09:15
@rleungx rleungx requested review from bufferflies and lhy1024 June 25, 2026 09:19
@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.18%. Comparing base (4335bc3) to head (40235e4).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10943      +/-   ##
==========================================
- Coverage   79.26%   79.18%   -0.09%     
==========================================
  Files         540      541       +1     
  Lines       74954    75035      +81     
==========================================
+ Hits        59412    59414       +2     
- Misses      11371    11425      +54     
- Partials     4171     4196      +25     
Flag Coverage Δ
unittests 79.18% <ø> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rleungx

rleungx commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

/retest

2 similar comments
@rleungx

rleungx commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

/retest

@rleungx

rleungx commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

/retest

@ti-chi-bot ti-chi-bot Bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jun 26, 2026
@ti-chi-bot ti-chi-bot Bot added the lgtm label Jun 29, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies, lhy1024

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [bufferflies,lhy1024]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 29, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

[LGTM Timeline notifier]

Timeline:

  • 2026-06-26 03:10:51.894432001 +0000 UTC m=+262735.557657514: ☑️ agreed by lhy1024.
  • 2026-06-29 07:26:10.147102284 +0000 UTC m=+24311.847481697: ☑️ agreed by bufferflies.

@rleungx

rleungx commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@rleungx

rleungx commented Jun 29, 2026

Copy link
Copy Markdown
Member Author

/retest

@ti-chi-bot ti-chi-bot Bot merged commit 0d6dafc into tikv:master Jun 29, 2026
32 checks passed
@rleungx rleungx deleted the fix-10627-testwatchfailed-flake branch June 29, 2026 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestWatchFailed is flaky

3 participants