chore: ratchet test262 baseline after async/promise cluster fixes #469

Merged: dowdiness merged 1 commit into main from chore/update-test262-baseline on Jun 26, 2026

@dowdiness (Owner)

Summary

Test plan

  • Numbers verified from CI run 28219417792 (test262-summary.json combined report)
  • Buffer is ~0.3% below actuals — blocks regressions, allows for minor flakiness

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Warning

Review limit reached

@dowdiness, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 19 minutes and 51 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01212a7c-a07b-4c78-8c6d-e8beca8d6953

📥 Commits

Reviewing files that changed from the base of the PR and between e79255c and 4647c01.

📒 Files selected for processing (1)
  • test262-baseline.json

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d5b5b4b44

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread test262-baseline.json
  },
  "combined": {
-   "passed_min": 45990
+   "passed_min": 53550

P2: Raise the combined baseline above the per-mode floor

With these new per-mode minima, any run that passes both mode gates must have at least 27,650 + 26,900 = 54,550 passes, but this combined threshold is 53,550. The regression workflow checks the mode thresholds first and then compares total_passed to baseline["combined"]["passed_min"] (.github/workflows/test262.yml lines 489-514), so the combined gate can no longer fail independently and will report a +1,000 combined delta even when both modes only meet their minima. If the combined row is intended to guard aggregate regressions, set it to at least 54,550 or remove the redundant combined baseline.
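
The arithmetic behind this comment can be reproduced with a short sketch. It is not the workflow's code: it assumes `total_passed` is the sum of the two per-mode pass counts, and the keys `non_strict` and `strict` are hypothetical names for the per-mode rows (only `combined` and `passed_min` appear in the text above).

```python
# Sketch only: "non_strict" and "strict" are assumed key names, not taken
# from the real test262-baseline.json.
BASELINE = {
    "non_strict": {"passed_min": 27650},
    "strict": {"passed_min": 26900},
    "combined": {"passed_min": 53550},
}


def failing_gates(baseline, non_strict_passed, strict_passed):
    """Check the mode thresholds first, then the combined threshold."""
    failures = []
    if non_strict_passed < baseline["non_strict"]["passed_min"]:
        failures.append("non_strict")
    if strict_passed < baseline["strict"]["passed_min"]:
        failures.append("strict")
    # Assumption: the combined total is the sum of both modes.
    total_passed = non_strict_passed + strict_passed
    if total_passed < baseline["combined"]["passed_min"]:
        failures.append("combined")
    return failures


def per_mode_floor(baseline):
    """Smallest total any run can have while passing both mode gates."""
    return baseline["non_strict"]["passed_min"] + baseline["strict"]["passed_min"]


def combined_gate_is_redundant(baseline):
    """True when passing both mode gates already implies passing the combined gate."""
    return baseline["combined"]["passed_min"] <= per_mode_floor(baseline)


print(per_mode_floor(BASELINE))               # 54550
print(combined_gate_is_redundant(BASELINE))   # True
print(failing_gates(BASELINE, 27650, 26900))  # []
```

Under that assumption, a run that only just meets both mode minima still clears the combined threshold by 1,000, which is the delta the comment refers to. The combined gate can fail on its own only if its threshold is strictly greater than the per-mode floor.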

Non-strict: 23520 → 27650 (+4130), Strict: 22450 → 26900 (+4450),
Combined: 45990 → 53550 (+7560). Reflects fixes from PRs #468, #466,
#467, #463, #462 that together recovered ~7700 test262 cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dowdiness force-pushed the chore/update-test262-baseline branch from 4d5b5b4 to 4647c01 on June 26, 2026 06:53
@dowdiness merged commit eced381 into main on Jun 26, 2026
13 checks passed
@dowdiness deleted the chore/update-test262-baseline branch on June 26, 2026 07:12
dowdiness added a commit that referenced this pull request Jun 26, 2026
- scripts/set-baseline.py: reads passed counts from the latest successful
  main-branch CI run (never PR-branch) and subtracts a configurable buffer
  (default 100) to compute passed_min. Replaces the error-prone manual step
  that caused the baseline miscalibration in PR #469 (strict set to 26,900
  based on a PR-branch outlier; main showed ~25,977).
- AGENTS.md: two new Test262 Tool Boundaries rules — use set-baseline.py
  for baseline updates; verify CI claims in compacted summaries from the
  raw log before investigating.
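
The buffer rule described in the first bullet can be sketched as follows. This is an illustration of the calculation only, not the contents of scripts/set-baseline.py: the function name, the input shape, and the clamp at zero are assumptions, and fetching counts from CI is left out.

```python
def compute_passed_min(passed_counts, buffer=100):
    """Derive passed_min per mode by subtracting a fixed buffer.

    passed_counts maps a mode name to the passed count observed on main.
    The clamp at zero is an assumption added for safety in this sketch.
    """
    return {
        mode: {"passed_min": max(0, passed - buffer)}
        for mode, passed in passed_counts.items()
    }


# Using the approximate strict count that main showed (~25,977):
print(compute_passed_min({"strict": 25977}))
# {'strict': {'passed_min': 25877}}
```

With the default buffer, a strict count of 25,977 on main gives a threshold of 25,877, well below the 26,900 that was set by hand in PR #469.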

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dowdiness added a commit that referenced this pull request Jun 26, 2026
* chore: add set-baseline.py + AGENTS.md rules for baseline calibration

* chore: fix set-baseline.py portability and AGENTS.md trigger condition

- Derive repo from gh repo view instead of hardcoding dowdiness/js_engine
- Remove --pattern flag from gh run download (broke file lookup on some runs)
- Add "when to run" condition to AGENTS.md rule (after batches recovering >100 tests)
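
One way to derive the repository from `gh repo view`, as the first bullet describes, is sketched below. It is a guess at the approach, not the script's actual code: the function name and the injectable `run` parameter are hypothetical, and the call assumes the GitHub CLI is installed and authenticated.

```python
import subprocess


def current_repo(run=subprocess.run):
    """Return "owner/name" for the repository in the current directory.

    `run` is injectable so the function can be exercised without the
    GitHub CLI; by default it shells out to `gh`.
    """
    result = run(
        ["gh", "repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Resolving the name at run time lets the same script work from a fork or a renamed repository without editing a hardcoded value.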

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>