feat(core): multi-sample runs + variance, folded into confidence by MerlijnW70 · Pull Request #7 · MerlijnW70/cu-profiler

MerlijnW70 · 2026-06-21T13:14:34Z

Implements multi-sample runs + variance (roadmap item 2). It builds on #6's groundwork only conceptually; the change itself is independent.

What

Wires the long-reserved Scenario.samples field (plus a --samples CLI override) so a scenario can be measured N times. The per-sample total_cu values form a SampleStats distribution (count / min / median / max / variance / std-dev / CV) attached to the measurement, and the coefficient of variation folds into the confidence score — implementing the spec §12 "sample variance" factor that was listed-but-unimplemented.
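As an illustration of the statistics described above, here is a minimal sketch of how a `SampleStats::from_samples` constructor could compute the listed fields. The field names, the integer and float types, and the choice of population variance (dividing by N rather than N-1) are assumptions for illustration and are not taken from the PR's code.

```rust
/// Hypothetical shape of the per-scenario sample distribution.
/// Field names and types are assumptions, not the crate's real definition.
#[derive(Debug, Clone, PartialEq)]
pub struct SampleStats {
    pub count: usize,
    pub min: u64,
    pub median: f64,
    pub max: u64,
    pub variance: f64,
    pub std_dev: f64,
    /// Coefficient of variation: std_dev / mean (0.0 when the mean is 0).
    pub cv: f64,
}

impl SampleStats {
    /// Builds the distribution from per-sample `total_cu` values.
    /// Returns `None` when no samples were collected.
    pub fn from_samples(samples: &[u64]) -> Option<Self> {
        if samples.is_empty() {
            return None;
        }
        let mut sorted = samples.to_vec();
        sorted.sort_unstable();
        let n = sorted.len();

        let mean = sorted.iter().map(|&x| x as f64).sum::<f64>() / n as f64;
        // Population variance (divide by n). This is an assumption; the real
        // implementation may use the sample variance (n - 1) instead.
        let variance = sorted
            .iter()
            .map(|&x| {
                let d = x as f64 - mean;
                d * d
            })
            .sum::<f64>()
            / n as f64;
        let std_dev = variance.sqrt();

        // Median: middle element, or the mean of the two middle elements.
        let median = if n % 2 == 1 {
            sorted[n / 2] as f64
        } else {
            (sorted[n / 2 - 1] as f64 + sorted[n / 2] as f64) / 2.0
        };

        let cv = if mean == 0.0 { 0.0 } else { std_dev / mean };

        Some(Self {
            count: n,
            min: sorted[0],
            median,
            max: sorted[n - 1],
            variance,
            std_dev,
            cv,
        })
    }
}
```

Under these assumptions, three samples of 90, 100 and 110 CU give a mean of 100, a variance of 200/3 and a CV of roughly 8.2%, while identical samples give a CV of exactly 0.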

Honestly gated (no fake precision)

The shipping CLI uses the deterministic recorded backend, where running N times yields byte-identical results. New ExecutionBackend::is_deterministic() (true for recorded) makes the profiler run a deterministic backend exactly once — it never fabricates a run-to-run spread it did not observe. Multi-sampling therefore activates only for non-deterministic execution backends (e.g. Mollusk, once driven from the CLI). This is the design subtlety the research flagged, handled explicitly.
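The gating described above can be sketched as follows. Only `ExecutionBackend::is_deterministic()` is named in this PR; the `run` method, the `Recorded` and `Varying` types, and the `collect_samples` helper are hypothetical stand-ins that show the idea, not the crate's actual API.

```rust
/// Simplified stand-in for the execution backend trait. Only
/// `is_deterministic` is named in the PR; `run` is a hypothetical method
/// that returns the `total_cu` of one execution.
pub trait ExecutionBackend {
    fn is_deterministic(&self) -> bool;
    fn run(&mut self) -> u64;
}

/// Hypothetical recorded backend: replays the same measurement every time.
pub struct Recorded {
    pub total_cu: u64,
    pub runs: usize,
}

impl ExecutionBackend for Recorded {
    fn is_deterministic(&self) -> bool {
        true
    }
    fn run(&mut self) -> u64 {
        self.runs += 1;
        self.total_cu
    }
}

/// Hypothetical non-deterministic backend whose result drifts per run.
pub struct Varying {
    pub next_cu: u64,
    pub runs: usize,
}

impl ExecutionBackend for Varying {
    fn is_deterministic(&self) -> bool {
        false
    }
    fn run(&mut self) -> u64 {
        self.runs += 1;
        let cu = self.next_cu;
        self.next_cu += 10;
        cu
    }
}

/// Runs a deterministic backend exactly once regardless of the requested
/// sample count; otherwise runs it `requested` times (at least once).
pub fn collect_samples<B: ExecutionBackend>(backend: &mut B, requested: usize) -> Vec<u64> {
    let n = if backend.is_deterministic() {
        1
    } else {
        requested.max(1)
    };
    (0..n).map(|_| backend.run()).collect()
}
```

With this shape, requesting five samples from the recorded stand-in still yields a single measurement, so no spread is ever computed from repeated identical results.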

Details

SampleStats is an Option with skip_serializing_if, so it is omitted when absent and existing single-sample JSON reports stay byte-identical (no golden churn).
Confidence demotion: CV ≥2% → Medium, ≥10% → Low, each with a reason.
samples is now settable from [scenario.<name>] samples = N and overridable with --samples.
Pure stats (SampleStats::from_samples) and the confidence fold are unit-tested; a VaryingBackend proves end-to-end sampling + variance + demotion, and a test asserts the recorded backend ignores samples.
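The confidence demotion rule above (CV ≥2% → Medium, ≥10% → Low, each with a reason) could look roughly like the sketch below. The `Confidence` enum, its `High` variant, the `fold_variance` function and the wording of the reason string are assumptions for illustration. The sketch also assumes the fold only ever lowers confidence and never raises it.

```rust
/// Hypothetical confidence levels, ordered so that Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

/// Folds the coefficient of variation (as a fraction, e.g. 0.05 for 5%)
/// into an existing confidence level. Returns the possibly demoted level
/// and, when a threshold was crossed, a human-readable reason.
pub fn fold_variance(current: Confidence, cv: Option<f64>) -> (Confidence, Option<String>) {
    // No SampleStats (single-sample or deterministic run): nothing to fold.
    let cv = match cv {
        Some(cv) => cv,
        None => return (current, None),
    };

    // Thresholds from the PR description: >=10% caps at Low, >=2% at Medium.
    let (cap, threshold_pct) = if cv >= 0.10 {
        (Confidence::Low, 10)
    } else if cv >= 0.02 {
        (Confidence::Medium, 2)
    } else {
        return (current, None);
    };

    // Demote only: take the lower of the current level and the cap.
    let folded = current.min(cap);
    let reason = format!(
        "sample CV {:.1}% is at or above the {}% threshold",
        cv * 100.0,
        threshold_pct
    );
    (folded, Some(reason))
}
```

For example, under these assumptions a High measurement with a 5% CV becomes Medium, one with a 10% CV becomes Low, and a measurement that is already Low stays Low.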

Docs

Updated reference §12 (variance factor) and §15 (--samples is now real, which fixes that doc drift), plus the CHANGELOG.

Local gate: grade A (core 67 tests).

🤖 Generated with Claude Code

Wire the long-reserved `Scenario.samples` field (plus a `--samples` override) so a scenario can be measured N times on a non-deterministic backend. The per-sample `total_cu` values become a `SampleStats` distribution (count/min/median/max/variance/std-dev/CV) on the measurement, and the coefficient of variation folds into the confidence score (CV >=2% -> Medium, >=10% -> Low), implementing the spec §12 "sample variance" factor.

Crucially this is honestly gated: `ExecutionBackend::is_deterministic()` (true for the recorded backend) makes the profiler run a deterministic backend exactly once, so it never fabricates a run-to-run spread it did not observe.

The field is now settable from `[scenario.x] samples` and overridable with `--samples`. `SampleStats` serializes only when present (Option, skip-if-none), so existing single-sample JSON reports are byte-identical.

Docs: reference §12/§15 updated; CHANGELOG. Gate: grade A.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MerlijnW70 force-pushed the feat/multi-sample-variance branch from cfc9da8 to 16dd8a0 Compare June 21, 2026 14:13

MerlijnW70 merged commit 6a581e3 into main Jun 21, 2026
4 checks passed

MerlijnW70 deleted the feat/multi-sample-variance branch June 21, 2026 14:14

MerlijnW70 mentioned this pull request Jun 21, 2026

chore(release): v0.2.0 #11

Merged

MerlijnW70 merged 1 commit into main from feat/multi-sample-variance
