chore: ratchet test262 baseline after async/promise cluster fixes by dowdiness · Pull Request #469 · dowdiness/js_engine

dowdiness · 2026-06-26T06:28:26Z

Summary

Raises passed_min floors to reflect test262 improvements from PRs #462–#468 (async/Promise cluster fixes), from fix(array): respect deletion of Array.prototype[Symbol.iterator] (#462) through fix(async): parameter TDZ, sloppy this, arrow arguments, mapped arguments (#468)
Non-strict: 23520 → 27650 (actual: 27727, +4207 from PRs)
Strict: 22450 → 26900 (actual: 26971, +3521 from PRs)
Combined: 45990 → 53550 (actual: 53698, +7708 from PRs)
Buffer is ~150 combined to allow for test runner variance while blocking any meaningful regression

Test plan

Numbers verified from CI run 28219417792 (test262-summary.json combined report)
Buffer is ~0.3% below actuals — blocks regressions, allows for minor flakiness
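The buffer figures can be checked by subtracting each new passed_min floor from the actual pass count quoted for the same row. The sketch below uses only the numbers stated in this PR. The `FIGURES` layout and the `buffers` helper are illustrative and do not come from the repository.

```python
# (new passed_min floor, actual passes), copied from the PR summary.
FIGURES = {
    "non_strict": (27650, 27727),
    "strict": (26900, 26971),
    "combined": (53550, 53698),
}


def buffers(figures):
    """Return {row: (headroom in tests, headroom as % of actual)}."""
    result = {}
    for row, (floor, actual) in figures.items():
        headroom = actual - floor
        result[row] = (headroom, 100.0 * headroom / actual)
    return result


for row, (headroom, pct) in buffers(FIGURES).items():
    print(f"{row}: {headroom} tests below actual ({pct:.2f}%)")
```

With these inputs the headroom is 77, 71, and 148 tests. That matches the "~150 combined" buffer, and each row sits a little under 0.3% below its actual count.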

🤖 Generated with Claude Code

coderabbitai · 2026-06-26T06:30:17Z

Warning

Review limit reached

@dowdiness, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 19 minutes and 51 seconds.

📥 Commits

Reviewing files that changed from the base of the PR and between e79255c and 4647c01.

📒 Files selected for processing (1)

test262-baseline.json


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d5b5b4b44


chatgpt-codex-connector · 2026-06-26T06:32:40Z

```diff
   },
   "combined": {
-    "passed_min": 45990
+    "passed_min": 53550
```


Raise the combined baseline above the per-mode floor

With these new per-mode minima, any run that passes both mode gates must have at least 27,650 + 26,900 = 54,550 passes, but this combined threshold is 53,550. The regression workflow checks the mode thresholds first and then compares total_passed to baseline["combined"]["passed_min"] (.github/workflows/test262.yml lines 489-514), so the combined gate can no longer fail independently and will report a +1,000 combined delta even when both modes only meet their minima. If the combined row is intended to guard aggregate regressions, set it to at least 54,550 or remove the redundant combined baseline.
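The redundancy described above can be reproduced with a small model of the gate. This is a sketch, not the logic in .github/workflows/test262.yml. The `check_regression` function, the per-mode keys `non_strict` and `strict`, and the `passed` dictionary are hypothetical. Only the `combined.passed_min` field and the check order (modes first, then the total) come from the review comment.

```python
def check_regression(baseline, passed):
    """Hypothetical gate: per-mode floors first, then the combined total.

    Returns the name of the first gate that fails, or None if all pass.
    """
    for mode in ("non_strict", "strict"):
        if passed[mode] < baseline[mode]["passed_min"]:
            return mode
    total_passed = passed["non_strict"] + passed["strict"]
    if total_passed < baseline["combined"]["passed_min"]:
        return "combined"
    return None


baseline = {
    "non_strict": {"passed_min": 27650},
    "strict": {"passed_min": 26900},
    "combined": {"passed_min": 53550},
}

# The weakest run that clears both mode gates still totals 54,550 passes,
# so it clears the 53,550 combined floor with room to spare.
weakest = {"non_strict": 27650, "strict": 26900}
print(check_regression(baseline, weakest))  # None
slack = sum(weakest.values()) - baseline["combined"]["passed_min"]
print(slack)  # 1000
```

Any run that reaches the combined check has already met both mode floors, so its total is at least their sum. The combined check can therefore reject a run only if its passed_min is greater than that sum (54,550 here).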


Non-strict: 23520 → 27650 (+4130)
Strict: 22450 → 26900 (+4450)
Combined: 45990 → 53550 (+7560)

Reflects fixes from PRs #468, #466, #467, #463, #462 that together recovered ~7700 test262 cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


* chore: add set-baseline.py + AGENTS.md rules for baseline calibration

- scripts/set-baseline.py: reads passed counts from the latest successful main-branch CI run (never PR-branch) and subtracts a configurable buffer (default 100) to compute passed_min. Replaces the error-prone manual step that caused the baseline miscalibration in PR #469 (strict set to 26,900 based on a PR-branch outlier; main showed ~25,977).
- AGENTS.md: two new Test262 Tool Boundaries rules: use set-baseline.py for baseline updates; verify CI claims in compacted summaries from the raw log before investigating.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: fix set-baseline.py portability and AGENTS.md trigger condition

- Derive repo from gh repo view instead of hardcoding dowdiness/js_engine
- Remove --pattern flag from gh run download (broke file lookup on some runs)
- Add "when to run" condition to AGENTS.md rule (after batches recovering >100 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
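The core arithmetic described for set-baseline.py (take pass counts from a main-branch run and subtract a buffer, default 100) might look roughly like the following. This is a hypothetical sketch. The function name, the mode keys, and the shape of the `summary` dictionary are assumptions, and the real test262-summary.json layout may differ. The part of the script that finds the run and downloads its artifact through gh is omitted.

```python
import json

DEFAULT_BUFFER = 100  # default buffer stated in the commit message


def compute_baseline(summary, buffer=DEFAULT_BUFFER):
    """Turn observed pass counts into passed_min floors.

    `summary` is assumed to map each mode name to its observed pass count;
    the real summary file produced by CI may be structured differently.
    """
    if buffer < 0:
        raise ValueError("buffer must be non-negative")
    return {
        mode: {"passed_min": max(0, passed - buffer)}
        for mode, passed in summary.items()
    }


if __name__ == "__main__":
    # Sample input only: these are the counts quoted in the PR #469 summary,
    # whereas the script is meant to read them from a main-branch CI run.
    observed = {"non_strict": 27727, "strict": 26971, "combined": 53698}
    print(json.dumps(compute_baseline(observed), indent=2))
```

Subtracting a flat buffer from every row keeps the script simple. The floors are only as trustworthy as the run they come from, which is why the commit restricts the source to main-branch runs.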

chatgpt-codex-connector Bot reviewed Jun 26, 2026


dowdiness force-pushed the chore/update-test262-baseline branch from 4d5b5b4 to 4647c01 Compare June 26, 2026 06:53

dowdiness merged commit eced381 into main Jun 26, 2026
13 checks passed

dowdiness deleted the chore/update-test262-baseline branch June 26, 2026 07:12

dowdiness mentioned this pull request Jun 26, 2026

chore: set-baseline.py script + AGENTS.md calibration rules #472

Merged

dowdiness merged 1 commit into main from chore/update-test262-baseline