35 changes: 35 additions & 0 deletions merge-queue/administration/metrics.md
@@ -105,6 +105,41 @@ The time in queue can be displayed as different statistical measures. You can sh
| P95 | The value below 95% of the time in queue falls. |
| P99 | The value below 99% of the time in queue falls. |

### Testing duration

Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue -- from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
Contributor


Terminology clarification. The phrase "the test cycle reaches a final state (merged, failed, or canceled)" mixes terminology: elsewhere on this page the Conclusion count table (line 88) uses Pass / Failure / Cancel categories, where "Merged by Trunk" is a reason within Pass. Consider matching that vocabulary, e.g. "…reaches a final state (passed, failed, or canceled)", or align with the **Cycle ended in** filter values you describe on line 125 (Merged / Failed). Whichever you pick, the same three terms should ideally show up in both places.


This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.

{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}

The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
Contributor


Casing inconsistency. **Testing Metrics** is title-cased here, but the surrounding section names on this page are sentence case (e.g., "Time in queue", "Conclusion count", "Drill down into metrics"). If "Testing Metrics" is a literal UI label, consider quoting it or pointing that out so it doesn't read like a style slip; otherwise lowercase it to "Testing metrics" for consistency.


#### Filters

Two filters let you narrow the data:

* **Outcome** -- Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
Contributor


Vague "and others". The sentence "Options include Passed, Failed, and others." leaves the reader wondering what "others" are. If the dropdown has a fixed, enumerable set of options (e.g., Passed / Failed / Canceled / Timed out), list them. If the set is open-ended or evolving, say so explicitly (e.g., "and additional outcomes such as Canceled and Timed out") so the reader knows whether to expect to discover more.

* **Cycle ended in** -- Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.
Comment on lines +110 to +125
Contributor


Style: em-dash consistency. The rest of this file uses real em-dashes (`—`) for parenthetical dashes (see lines 149, 151–154, 161, 173–175), but this new section uses double hyphens (`--`) in lines 110, 115, 124, and 125. GitBook does not auto-convert `--` to `—`, so these will render as literal double hyphens, which looks off next to the rest of the page.

Recommend swapping each `--` for `—` (em-dash). E.g.:

Suggested change
Testing duration shows how long each PR spends in the **TESTING** phase of the merge queue — from when testing begins to when the test cycle reaches a final state (merged, failed, or canceled).
This is distinct from [Time in queue](#time-in-queue), which measures total time from queue entry to exit. A PR that waits before testing starts will have a longer time in queue but the same testing duration. Use this chart to understand CI performance specifically, separate from queue wait time.
{% hint style="info" %}
Each data point represents one TESTING-to-final-state transition. A PR that is kicked back to PENDING and re-enters testing (for example, due to a queue restart) can appear more than once.
{% endhint %}
The chart appears in a **Testing Metrics** section below the queue metrics charts and uses the same time range and granularity controls.
#### Filters
Two filters let you narrow the data:
* **Outcome** — Filter by how the test cycle ended. Options include Passed, Failed, and others. Select **All Outcomes** to see the full distribution.
* **Cycle ended in** — Filter by the final disposition of the PR. Select **Merged**, **Failed**, or **All Cycle Ended In** to see the full set.


Use these together to isolate, for example, only the testing durations of PRs that ultimately merged (outcome: Passed, cycle ended in: Merged), giving you a clean baseline for your CI speed without noise from canceled or failed runs.

#### Statistical measures

Testing duration displays the same statistical measures as Time in queue. Use the **+ Add** button to show or hide them.

| Measure | Explanation |
| ------- | ----------------------------------------------------------- |
| Average | Average testing duration during the time bucket |
| Minimum | The shortest testing duration in the time bucket |
| Maximum | The longest testing duration in the time bucket |
| Sum | The total of all testing durations added together |
| P50 | The value below which 50% of testing durations fall |
| P95 | The value below which 95% of testing durations fall |
| P99 | The value below which 99% of testing durations fall |
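
To make the measures in the table concrete, here is a minimal sketch that computes them for a single time bucket. The sample durations are invented, and the nearest-rank percentile rule is an assumption chosen for simplicity; the chart may use a different percentile method.

```python
import math
from statistics import mean


def nearest_rank_percentile(sorted_values, pct):
    """Smallest value with at least pct% of the values at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_bucket(durations_sec):
    """Compute the table's measures for one time bucket of durations."""
    if not durations_sec:
        raise ValueError("bucket has no testing durations")
    ordered = sorted(durations_sec)
    return {
        "average": mean(ordered),
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "sum": sum(ordered),
        "p50": nearest_rank_percentile(ordered, 50),
        "p95": nearest_rank_percentile(ordered, 95),
        "p99": nearest_rank_percentile(ordered, 99),
    }


# Invented testing durations (in seconds) for one bucket.
bucket = [300, 420, 480, 510, 600, 660, 720, 900, 1200, 2400]
summary = summarize_bucket(bucket)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these sample values, the one 2400-second run pulls the average (819) well above the P50 (600), which is why the percentile measures are often a better read on typical CI speed than the average.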
Comment on lines +133 to +141
Contributor


Consistency with the Time in queue table. Compared to the existing table at lines 98–106, this new table differs in two small ways:

  1. Trailing punctuation. The Time in queue rows end with periods; these rows do not.
  2. "Average" row wording. The Time in queue version reads "Average of all time in queue during the time bucket", but here it's "Average testing duration during the time bucket". The parallel phrasing reads better when the two tables sit on the same page.

Note that the new percentile rows actually read more clearly than the existing ones ("The value below which 50% of testing durations fall" vs. the older "The value below 50% of the time in queue falls.", which is grammatically awkward). Up to you whether to (a) align to the existing style for now or (b) fix the older table to match the new, clearer wording in a follow-up.

Minimal alignment to existing style:

Suggested change
| Measure | Explanation |
| ------- | --------------------------------------------------------- |
| Average | Average of all testing durations during the time bucket. |
| Minimum | The shortest testing duration in the time bucket. |
| Maximum | The longest testing duration in the time bucket. |
| Sum | The total of all testing durations added together. |
| P50 | The value below which 50% of testing durations fall. |
| P95 | The value below which 95% of testing durations fall. |
| P99 | The value below which 99% of testing durations fall. |


### Drill down into metrics

From the **Conclusion count** and **Time in queue** charts, you can drill into any point or window on the graph to see the exact pull requests that made up those numbers.