Canonical reference for the design and scope of
cu-profiler. This document is the source of truth for what we are building and why. Implementation details live in the code; this document defines the contract.
cu-profiler is a 100% Rust library and CLI tool for Solana programs that
measures, explains, compares, and enforces compute-unit (CU) consumption in CI.
It is not a simple log parser. It is a Solana-native compute intelligence toolkit.
It helps Solana developers answer:
- How many compute units does my program use?
- Which instruction is the most expensive?
- Which CPIs cause the most compute?
- Which scope/module/function caused a regression?
- Which scenarios sit close to their CU budget?
- Which PR increases CU consumption?
- Should CI fail because compute is too high?
- Is the measurement reliable enough to act on?
- Where is the biggest optimization opportunity?
End goal: cu-profiler becomes the standard compute regression suite for Solana programs.
Think gas snapshots for Solana, but broader: scenarios, CPI call trees, scope markers, baselines, budget policies, JSON/JUnit/Markdown output, and CI exit codes.
The product as a whole is:
Solana compute observability
+ regression testing
+ budget enforcement
+ scenario intelligence
+ CPI attribution
+ CI-native reporting
Built entirely in Rust. No TypeScript, JavaScript, Python, shell-heavy architecture, or external runtime dependency for core functionality.
The codebase must be:
- idiomatic Rust;
- async-ready;
- modular;
- library-first;
- CLI as a thin layer over the core;
- well testable;
- feature-flag driven;
- backed by stable report schemas;
- equipped with professional error handling;
- easily extensible;
- CI-friendly;
- suitable for public open-source release.
Use clear domain-driven modules. Avoid monolithic files.
The project is a Rust workspace with at least these crates:
cu-profiler/
├── crates/
│ ├── cu-profiler-core/
│ ├── cu-profiler-cli/
│ ├── cu-profiler-instrumentation/
│ └── cu-profiler-report/
├── examples/
├── tests/
├── docs/
├── .github/
└── Cargo.toml
Contains all core logic:
- scenario model;
- profiler engine;
- simulation abstraction;
- log parser;
- CPI parser;
- scope attribution;
- budget policy engine;
- baseline comparison;
- confidence scoring;
- diagnostic engine;
- core data types;
- error types.
This crate must not depend on CLI-specific code.
CLI interface only. Uses clap.
Commands:
cu-profiler init
cu-profiler run
cu-profiler compare
cu-profiler baseline save
cu-profiler baseline approve
cu-profiler ci
cu-profiler explain
cu-profiler inspect
The CLI must use library calls from cu-profiler-core.
Lightweight helpers/macros for Solana programs:
- mark scopes;
- log begin/end markers;
- optionally log compute snapshots;
- feature-gated instrumentation.
Instrumentation must be optional, and the overhead it incurs must be stated clearly. The profiler must be able to report that overhead.
Output formats:
- table;
- JSON;
- Markdown;
- JUnit XML;
- HTML (later).
This crate must keep raw data and rendering separated.
Use Solana local simulation as the primary v1 backend.
The core abstracts over execution backends:
ExecutionBackend
├── ProgramTestBackend
├── BanksClientBackend
├── RpcSimulationBackend (later)
└── RecordedLogsBackend (for tests)
For v1:
- implement ProgramTestBackend;
- implement BanksClientBackend as a wrapper/abstraction;
- design interfaces for RpcSimulationBackend up front, but it need not fully work yet.
Use solana-program-test for local simulation. The tool must be able to analyze
transaction simulation results and logs.
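The backend abstraction above could be sketched as a small trait. This is an illustrative shape only: `SimulationOutcome`, `BackendError`, and the method names are assumptions, the trait is synchronous for brevity (the real one may be async), and only the recorded-logs backend is shown because it needs no Solana dependency.

```rust
use std::collections::HashMap;
use std::fmt;

/// What a backend hands back after simulating one scenario (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct SimulationOutcome {
    /// Raw program logs, in emission order.
    pub logs: Vec<String>,
    /// Total compute units consumed, if the backend reports it.
    pub units_consumed: Option<u64>,
    /// Whether the simulated transaction succeeded.
    pub success: bool,
}

#[derive(Debug, PartialEq)]
pub enum BackendError {
    UnknownScenario(String),
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendError::UnknownScenario(name) => {
                write!(f, "no recorded logs for scenario `{}`", name)
            }
        }
    }
}

impl std::error::Error for BackendError {}

/// Common interface for ProgramTestBackend, BanksClientBackend, and so on.
pub trait ExecutionBackend {
    fn name(&self) -> &'static str;
    fn simulate(&self, scenario: &str) -> Result<SimulationOutcome, BackendError>;
}

/// Deterministic backend that replays stored outcomes; intended for tests.
#[derive(Default)]
pub struct RecordedLogsBackend {
    recordings: HashMap<String, SimulationOutcome>,
}

impl RecordedLogsBackend {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn record(&mut self, scenario: &str, outcome: SimulationOutcome) {
        self.recordings.insert(scenario.to_string(), outcome);
    }
}

impl ExecutionBackend for RecordedLogsBackend {
    fn name(&self) -> &'static str {
        "recorded-logs"
    }

    fn simulate(&self, scenario: &str) -> Result<SimulationOutcome, BackendError> {
        self.recordings
            .get(scenario)
            .cloned()
            .ok_or_else(|| BackendError::UnknownScenario(scenario.to_string()))
    }
}
```

Keeping the profiler engine generic over such a trait would let parser and report tests run against recorded logs while the program-test backend is still a skeleton.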
Measure at minimum:
- total compute units per scenario;
- compute units per transaction;
- compute units per instruction;
- compute units per program invocation;
- CPI count;
- CPI depth;
- Compute Budget requested limit;
- actual consumed;
- budget margin;
- over-requested compute;
- failed simulation path;
- logs;
- parsing warnings;
- confidence score.
Scope-level attribution may only be done via explicit markers or reliable log structures. Make no false claims about automatic source-line profiling.
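Two of the derived metrics in the list above are plain arithmetic. The sketch below assumes that "budget margin" means the unused share of a budget and that "over-requested compute" means the requested limit minus actual consumption; both function names are hypothetical.

```rust
/// Percentage of `budget` left unused; negative when the budget is exceeded.
/// Returns `None` for a zero budget to avoid dividing by zero.
pub fn budget_margin_pct(budget: u64, consumed: u64) -> Option<f64> {
    if budget == 0 {
        return None;
    }
    Some((budget as f64 - consumed as f64) / budget as f64 * 100.0)
}

/// Units requested through the Compute Budget program but never consumed.
/// Saturates at zero when consumption exceeds the requested limit.
pub fn over_requested_units(requested_limit: u64, consumed: u64) -> u64 {
    requested_limit.saturating_sub(consumed)
}
```

For a scenario that consumes 96,812 CU against a 100,000 CU budget, the margin under this definition is about 3.2%.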
Introduce a first-class Scenario. A scenario is not just a test — it is a
reproducible compute benchmark.
A scenario conceptually contains:
Scenario
- name
- description
- tags
- criticality
- owner
- instruction builder
- account fixtures
- expected result
- budget policy
- sample count
- metadata
Support scenarios such as:
swap/happy_path
swap/large_pool
swap/referral_enabled
swap/token_2022
swap/failure_invalid_owner
initialize_pool/minimal
initialize_pool/full_config
liquidate/max_position
liquidate/stale_oracle_failure
Also measure failure paths. A failed instruction that consumes a lot of CU is relevant for both performance and security.
Budget policies are first-class. Support at minimum:
absolute max CU
warning threshold
max regression percentage
max regression units
minimum margin percentage
max CPI count
max CPI depth
max unattributed CU percentage
max instrumentation overhead warning
Each policy produces a structured result:
PolicyResult
- policy_id
- status: pass | warn | fail
- severity
- actual
- expected
- message
- remediation hint
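As an illustration of how one policy could produce such a result, here is a minimal sketch of the absolute-budget check with a warning threshold. The field set is trimmed (no separate severity), the function name is hypothetical, and treating "exactly at budget" as a warning rather than a failure is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pass,
    Warn,
    Fail,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PolicyResult {
    pub policy_id: &'static str,
    pub status: Status,
    pub actual: u64,
    pub expected: u64,
    pub message: String,
    pub remediation_hint: Option<&'static str>,
}

/// Evaluates the "absolute max CU" policy together with its warning threshold.
/// `warn_at_budget_pct` is the percentage of the budget at which WARN starts.
pub fn check_absolute_budget(actual: u64, budget: u64, warn_at_budget_pct: u64) -> PolicyResult {
    let warn_at = budget.saturating_mul(warn_at_budget_pct) / 100;
    let status = if actual > budget {
        Status::Fail
    } else if actual >= warn_at {
        Status::Warn
    } else {
        Status::Pass
    };
    let remediation_hint = match status {
        Status::Pass => None,
        // Hint text is illustrative only.
        _ => Some("inspect the CPI call tree for the most expensive invocation"),
    };
    PolicyResult {
        policy_id: "absolute_max_cu",
        status,
        actual,
        expected: budget,
        message: format!("{} CU used of {} CU budget", actual, budget),
        remediation_hint,
    }
}
```

With a 100,000 CU budget and a 90% warning threshold, 96,812 CU yields WARN, which matches the kind of status shown in the table report.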
CI must be able to fail on:
- absolute budget exceeded;
- regression threshold exceeded;
- stale baseline;
- scenario failed;
- low confidence (when strict mode is on).
Baselines are essential. Implement:
cu-profiler baseline save
cu-profiler compare
cu-profiler baseline approve
A baseline stores not only CU but also fingerprint metadata. A baseline record contains:
scenario name
actual units
budget
timestamp
git commit (if available)
program binary hash
scenario hash
fixture hash
config hash
solana crate versions (if available)
profiler version
instrumentation mode
confidence score
When fingerprints do not match, the tool must be able to say:
Baseline is stale because fixture hash changed.
Comparison confidence: Low.
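The staleness check can be sketched as a field-by-field fingerprint comparison that names what changed. The `Fingerprint` struct below carries only four of the hashes from the baseline record, and `stale_reasons` is a hypothetical helper, not a confirmed API.

```rust
/// Subset of the baseline fingerprint used for staleness detection.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Fingerprint {
    pub program_binary_hash: String,
    pub scenario_hash: String,
    pub fixture_hash: String,
    pub config_hash: String,
}

/// Returns one human-readable reason per mismatching field.
/// An empty vector means the baseline fingerprint still matches.
pub fn stale_reasons(baseline: &Fingerprint, current: &Fingerprint) -> Vec<String> {
    let pairs = [
        ("program binary hash", &baseline.program_binary_hash, &current.program_binary_hash),
        ("scenario hash", &baseline.scenario_hash, &current.scenario_hash),
        ("fixture hash", &baseline.fixture_hash, &current.fixture_hash),
        ("config hash", &baseline.config_hash, &current.config_hash),
    ];
    let mut reasons = Vec::new();
    for (label, old, new) in pairs {
        if old != new {
            reasons.push(format!("Baseline is stale because {} changed.", label));
        }
    }
    reasons
}
```

A non-empty result would then feed both the stale-baseline exit path and the confidence score.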
A robust parser for Solana logs. It must recognize at minimum:
Program <id> invoke [depth]
Program <id> consumed X of Y compute units
Program <id> success
Program <id> failed
ComputeBudget instructions
custom CU_PROFILER_BEGIN markers
custom CU_PROFILER_END markers
custom CU_PROFILER_POINT markers
It must build a call tree:
root transaction
└── user_program::swap
├── spl_token::transfer
├── associated_token_account::create
│ ├── system_program::create_account
│ └── spl_token::initialize_account
└── oracle_program::read_price
Each node contains:
program id
program label (if known)
invoke depth
units consumed (if available)
children
logs
status
The parser must be tolerant:
- incomplete logs must not panic;
- preserve unknown lines as raw logs;
- collect parser warnings;
- lower confidence on inconsistencies.
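A minimal sketch of the tolerant parser and call-tree builder described above, using only the standard library. Type and function names are assumptions. Marker and ComputeBudget handling is omitted, program IDs on `consumed`/`success` lines are not cross-checked against the open frame, and unknown lines are kept as raw logs on the current node (lines outside any invocation are simply skipped here).

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum LogEvent {
    Invoke { program_id: String, depth: usize },
    Consumed { program_id: String, used: u64, limit: u64 },
    Success { program_id: String },
    Failed { program_id: String },
    Raw(String),
}

/// Classifies one log line; anything unrecognized becomes `Raw`, never a panic.
pub fn parse_line(line: &str) -> LogEvent {
    let raw = || LogEvent::Raw(line.to_string());
    let Some(rest) = line.strip_prefix("Program ") else { return raw() };
    let parts: Vec<&str> = rest.split_whitespace().collect();
    // "Program log:" and "Program data:" are program output, not runtime events.
    if parts.first().map_or(true, |id| id.ends_with(':')) {
        return raw();
    }
    match parts.as_slice() {
        [id, "invoke", depth] => {
            let depth = depth.strip_prefix('[').and_then(|d| d.strip_suffix(']'));
            match depth.and_then(|d| d.parse::<usize>().ok()) {
                Some(depth) => LogEvent::Invoke { program_id: id.to_string(), depth },
                None => raw(),
            }
        }
        [id, "consumed", used, "of", limit, "compute", "units"] => {
            match (used.parse::<u64>(), limit.parse::<u64>()) {
                (Ok(used), Ok(limit)) => {
                    LogEvent::Consumed { program_id: id.to_string(), used, limit }
                }
                _ => raw(),
            }
        }
        [id, "success"] => LogEvent::Success { program_id: id.to_string() },
        [id, "failed" | "failed:", ..] => LogEvent::Failed { program_id: id.to_string() },
        _ => raw(),
    }
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct CallNode {
    pub program_id: String,
    pub depth: usize,
    pub units_consumed: Option<u64>,
    pub succeeded: Option<bool>,
    pub logs: Vec<String>,
    pub children: Vec<CallNode>,
}

/// Attaches a finished node to its parent frame, or to the roots if none is open.
fn attach(stack: &mut Vec<CallNode>, roots: &mut Vec<CallNode>, node: CallNode) {
    match stack.last_mut() {
        Some(parent) => parent.children.push(node),
        None => roots.push(node),
    }
}

/// Builds the call tree in one linear pass; returns (roots, parser warnings).
pub fn build_call_tree(lines: &[&str]) -> (Vec<CallNode>, Vec<String>) {
    let (mut roots, mut stack, mut warnings) = (Vec::new(), Vec::new(), Vec::new());
    for (index, line) in lines.iter().enumerate() {
        match parse_line(line) {
            LogEvent::Invoke { program_id, depth } => {
                stack.push(CallNode { program_id, depth, ..Default::default() });
            }
            LogEvent::Consumed { used, .. } => {
                if let Some(node) = stack.last_mut() {
                    node.units_consumed = Some(used);
                }
            }
            event @ (LogEvent::Success { .. } | LogEvent::Failed { .. }) => match stack.pop() {
                Some(mut node) => {
                    node.succeeded = Some(matches!(event, LogEvent::Success { .. }));
                    attach(&mut stack, &mut roots, node);
                }
                None => warnings.push(format!("line {}: exit without matching invoke", index)),
            },
            LogEvent::Raw(text) => {
                if let Some(node) = stack.last_mut() {
                    node.logs.push(text);
                }
            }
        }
    }
    // Truncated logs: keep whatever was collected and record a warning per open frame.
    while let Some(node) = stack.pop() {
        warnings.push(format!("invoke of {} never closed", node.program_id));
        attach(&mut stack, &mut roots, node);
    }
    (roots, warnings)
}
```

The warnings vector is what a confidence scorer would consume; an empty vector corresponds to "parser consistency" holding for that transaction.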
A registry for known program IDs. Support at minimum:
System Program
Compute Budget Program
SPL Token
Token-2022
Associated Token Account Program
Memo Program
The registry is extensible via config. For unknown programs:
Unknown Program <pubkey>
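A registry of this kind can be as small as a map with a fallback label. The sketch below is an assumption about shape, not the project's API; the built-in list is deliberately partial (the remaining programs from the list above would be added the same way), and `insert` stands in for labels loaded from the `[program_labels]` config table.

```rust
use std::collections::HashMap;

/// Maps base58 program IDs to human-readable labels.
pub struct ProgramRegistry {
    labels: HashMap<String, String>,
}

impl ProgramRegistry {
    /// Registry pre-populated with a few well-known programs (partial list).
    pub fn with_builtin() -> Self {
        let builtin = [
            ("11111111111111111111111111111111", "System Program"),
            ("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA", "SPL Token"),
            ("TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb", "Token-2022"),
            ("ATokenGPvbdGVxr1b2hvZbsiqW5xWH25efTNsLJA8knL", "Associated Token Account Program"),
        ];
        let labels = builtin
            .iter()
            .map(|(id, label)| (id.to_string(), label.to_string()))
            .collect();
        Self { labels }
    }

    /// Adds or overrides a label, for example from user configuration.
    pub fn insert(&mut self, program_id: &str, label: &str) {
        self.labels.insert(program_id.to_string(), label.to_string());
    }

    /// Returns the known label, or the documented fallback for unknown programs.
    pub fn label_for(&self, program_id: &str) -> String {
        match self.labels.get(program_id) {
            Some(label) => label.clone(),
            None => format!("Unknown Program {}", program_id),
        }
    }
}
```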
Explicit profiler markers, conceptual log form:
CU_PROFILER_BEGIN name=swap::validate_accounts
CU_PROFILER_POINT name=after_validation
CU_PROFILER_END name=swap::validate_accounts
The parser links scopes to logs and compute deltas where possible. Each scope result:
ScopeResult
- name
- parent
- units_estimated
- percentage_of_total
- attribution_method
- confidence
- warnings
Rules:
- balanced markers give higher confidence;
- unbalanced markers produce a parser warning;
- many markers raise the instrumentation overhead warning;
- scope-level CU is an estimate unless derived directly from reliable logs.
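The marker rules above can be illustrated with a small parser plus a balance check. This sketch assumes markers arrive inside ordinary `Program log: ` lines (as `msg!` output does) and that each marker carries exactly one `name=` field; `Marker`, `parse_marker`, and `check_balance` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Marker<'a> {
    Begin(&'a str),
    End(&'a str),
    Point(&'a str),
}

/// Recognizes a profiler marker in a log line; returns `None` for anything else.
pub fn parse_marker(line: &str) -> Option<Marker<'_>> {
    let body = line.strip_prefix("Program log: ").unwrap_or(line);
    let (kind, rest) = body.split_once(' ')?;
    let name = rest.trim().strip_prefix("name=")?;
    match kind {
        "CU_PROFILER_BEGIN" => Some(Marker::Begin(name)),
        "CU_PROFILER_END" => Some(Marker::End(name)),
        "CU_PROFILER_POINT" => Some(Marker::Point(name)),
        _ => None,
    }
}

/// Checks that BEGIN/END markers nest correctly; returns parser warnings.
pub fn check_balance(lines: &[&str]) -> Vec<String> {
    let mut open: Vec<&str> = Vec::new();
    let mut warnings = Vec::new();
    for &line in lines {
        match parse_marker(line) {
            Some(Marker::Begin(name)) => open.push(name),
            Some(Marker::End(name)) => match open.pop() {
                Some(top) if top == name => {}
                Some(top) => {
                    warnings.push(format!("END {} closes unmatched BEGIN {}", name, top))
                }
                None => warnings.push(format!("END {} without BEGIN", name)),
            },
            // POINT markers and non-marker lines do not affect nesting.
            _ => {}
        }
    }
    for name in open {
        warnings.push(format!("BEGIN {} never closed", name));
    }
    warnings
}
```

An empty warning list corresponds to the "balanced markers" case; any warning would lower the confidence of the affected scope results.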
Every measurement carries a confidence score:
High | Medium | Low | Unknown
Factors:
simulation success
logs complete
parser consistency
baseline fingerprint match
sample variance
instrumentation overhead
unattributed CU percentage
scope marker quality
runtime/version metadata availability
Sample variance folds in when a scenario is multi-sampled (samples > 1) on a
non-deterministic backend: a high coefficient of variation across the runs demotes
confidence (≥2% → Medium, ≥10% → Low) with a reason. The deterministic recorded
backend ignores samples, so it never reports variance it did not observe.
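The variance rule can be written down directly. The sketch below uses the thresholds stated above (coefficient of variation of at least 2% gives Medium, at least 10% gives Low); using the population standard deviation, and treating fewer than two samples as "no variance observed", are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    High,
    Medium,
    Low,
    Unknown,
}

/// Coefficient of variation (std dev / mean) of the sampled CU values.
/// Returns `None` when fewer than two samples exist or the mean is zero.
pub fn coefficient_of_variation(samples: &[u64]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().map(|&s| s as f64).sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    // Population variance; an assumption, the sample variance would also be defensible.
    let variance = samples.iter().map(|&s| (s as f64 - mean).powi(2)).sum::<f64>() / n;
    Some(variance.sqrt() / mean)
}

/// Confidence ceiling implied by sample variance, with the reason if demoted.
pub fn variance_confidence(samples: &[u64]) -> (Confidence, Option<String>) {
    match coefficient_of_variation(samples) {
        Some(cv) if cv >= 0.10 => (
            Confidence::Low,
            Some(format!("sample variance {:.1}% (>= 10%)", cv * 100.0)),
        ),
        Some(cv) if cv >= 0.02 => (
            Confidence::Medium,
            Some(format!("sample variance {:.1}% (>= 2%)", cv * 100.0)),
        ),
        // No variance observed, or variance below the demotion threshold.
        _ => (Confidence::High, None),
    }
}
```

The returned value is a ceiling: the overall score would be the lowest level across all factors, with every returned reason included in the report.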
Always report why confidence is not High. Example:
Confidence: Medium
Reasons:
- 22% unattributed CU
- 14 scope markers detected
- baseline matched
- logs parsed successfully
Detects anti-patterns. At minimum:
near budget limit
absolute budget exceeded
regression exceeded
expensive failure path
late validation
CPI explosion
high CPI share
high CPI depth
event/log bloat
over-requested compute budget
high unattributed CU
stale baseline
low confidence measurement
Each diagnostic:
Diagnostic
- id
- title
- severity
- scenario
- evidence
- recommendation
Recommendations must be Solana-specific, not generic. Examples:
Move cheap validation before CPI.
Reduce event emission in hot path.
Add scope markers around account validation and math.
Check duplicate ATA creation.
Consider zero-copy for large account deserialization.
Lower requested compute limit if consistently over-requested.
Keep raw data and rendering separated. Support multiple outputs.
Scenario          Actual CU    Budget     Delta    Status
swap_exact_in        96,812   100,000     +6.1%    WARN
initialize_pool      78,902    80,000    +10.6%    FAIL
close_position       38,991    45,000     -2.1%    PASS
Stable schema via serde. Must contain all details:
summary
scenarios
budgets
baseline_comparison
call_trees
scopes
diagnostics
confidence
metadata
For GitHub PR comments. cu-profiler run/ci --format markdown renders the report as
Markdown; cu-profiler comment (see §15) delivers it as a sticky PR comment —
one comment per pull request, created once and updated in place on every later run,
identified by a hidden <!-- cu-profiler-report --> marker. (The "GitHub PR comments"
capability listed under §35 is delivered here.)
For CI test dashboards. A scenario that fails budget must map to a failed test case.
Keep the report layer extensible so HTML/flamegraphs can be added later.
The CLI must feel professional.
Generates:
cu-profiler.toml
.cu/baseline.json (if desired)
.github/workflows/cu-profiler.yml (if flag)
examples/scenarios.rs
bench.toml (starter plan for the turnkey real-CU path)
Runs scenarios and shows the report. Flags:
--config
--format table|json|markdown|junit
--output
--scenario
--tag
--samples
--strict
--fail-on-budget
--fail-on-regression
--fail-on-low-confidence
Compares the current run against the baseline.
Optimized CI mode:
- deterministic output;
- clear exit codes;
- JSON/Markdown artifacts;
- no interactive prompts.
Posts the Markdown report as a sticky pull-request comment (§14.3). Flags:
--input post a pre-rendered report.md instead of re-rendering from config
--pr PR number (defaults to the GitHub Actions event, then refs/pull/<n>/merge)
--repo owner/repo (defaults to $GITHUB_REPOSITORY)
--marker hidden marker identifying the sticky comment (default: cu-profiler-report)
--dry-run render and print the comment body without contacting GitHub
Auth is the $GITHUB_TOKEN env var (never a flag); the workflow needs
permissions: pull-requests: write. On a non-PR build (e.g. push) the command
no-ops. Requires the remote feature (on by default), like import --signature.
cu-profiler init --workflow scaffolds a workflow that renders and posts the comment.
Diagnoses a single scenario.
Reads an existing report and shows analysis without re-simulating.
Turnkey real-CU path. Reads a declarative bench plan (bench.toml), validates it,
optionally builds the program, and — with --program-name — measures real compute
units by delegating to the Linux cu-profiler-bench executor (see below). Flags:
--fixtures bench plan file (default bench.toml)
--program-name .so stem (loaded from $SBF_OUT_DIR); with it, bench measures
--build run `cargo build-sbf` first
--manifest-path directory to build in (default .)
A bench.toml declares the instruction(s) to execute as data:
[[instruction]]
scenario = "swap_exact_in"
program_id = "SwapPRogram1111111111111111111111111111"
data = "01ab" # hex instruction data
[[instruction.account]]
pubkey = "11111111111111111111111111111111"
signer = true
writable = true
lamports = 1000000
Real measurement is performed by cu-profiler-bench, a Linux-only executor built
from the cu-profiler-mollusk crate (it links the Solana/Mollusk stack, which is not
buildable on every host). cu-profiler bench --program-name <p> looks it up on PATH and runs it as
a runtime sibling, not a build dependency, so the main CLI stays Solana-free. Without
the executor installed, bench validates the plan and fails with the exact command to
run; without --program-name, it validates the plan only.
Stable and documented:
0 = success
1 = budget or regression failure
2 = configuration error
3 = simulation failure
4 = stale or missing baseline
5 = parser/report error
6 = low confidence in strict mode
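One way to keep these codes stable is to pin them in a single enum with explicit discriminants. The enum and variant names below are hypothetical; only the numeric values come from the table above.

```rust
use std::process::ExitCode;

/// Process exit statuses; the numeric values are part of the public contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum CuExit {
    Success = 0,
    BudgetOrRegressionFailure = 1,
    ConfigurationError = 2,
    SimulationFailure = 3,
    StaleOrMissingBaseline = 4,
    ParserOrReportError = 5,
    LowConfidenceStrict = 6,
}

impl CuExit {
    /// Numeric code as seen by the shell or CI runner.
    pub fn code(self) -> u8 {
        self as u8
    }
}

impl From<CuExit> for ExitCode {
    fn from(status: CuExit) -> Self {
        ExitCode::from(status.code())
    }
}
```

A binary whose `main` returns `ExitCode` can then end with `CuExit::StaleOrMissingBaseline.into()`, so no command hard-codes a number.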
Uses cu-profiler.toml. Parse strictly, but give clear error messages.
[project]
name = "my-solana-program"
program_id = "..."
mode = "program-test"
[defaults]
warn_at_budget_pct = 90
max_regression_pct = 5
fail_on_budget = true
fail_on_regression = true
fail_on_stale_baseline = false
[output]
default_format = "table"
json_path = "target/cu-profiler/report.json"
markdown_path = "target/cu-profiler/report.md"
junit_path = "target/cu-profiler/junit.xml"
[program_labels]
"11111111111111111111111111111111" = "System Program"
[scenario.swap_exact_in]
budget = 100000
warn_at_budget_pct = 90
max_regression_pct = 5
critical = true
tags = ["swap", "hot-path", "user-facing"]
[scenario.initialize_pool]
budget = 80000
max_regression_pct = 3
critical = true
tags = ["admin", "setup"]
Unit tests for: log parsing, call tree reconstruction, budget policies, baseline comparison, confidence scoring, report serialization, config parsing, diagnostics.
Golden tests with fixture logs and expected JSON output:
tests/fixtures/logs/spl_token_transfer.log
tests/fixtures/expected/spl_token_transfer.report.json
Integration tests with solana-program-test — at least one small test program
or mock scenario.
Snapshot tests for table/markdown output.
Fuzz/property tests (later) for parser robustness.
Use thiserror for typed errors. No unwrap() or expect() in library code
(except in tests). Errors must be useful.
Bad:
Parse failed
Good:
Failed to parse compute unit line at log index 42:
"Program xyz consumed abc compute units"
Reason: expected integer after "consumed"
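A typed error that renders the "good" message above might look like the following. To stay dependency-free the sketch implements `Display` by hand; with thiserror the same text would come from an `#[error(...)]` attribute. `ComputeLineError` and `parse_consumed` are hypothetical names.

```rust
use std::fmt;

/// Error for a malformed "consumed X of Y compute units" line.
#[derive(Debug, PartialEq)]
pub struct ComputeLineError {
    pub log_index: usize,
    pub line: String,
    pub reason: &'static str,
}

impl fmt::Display for ComputeLineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Failed to parse compute unit line at log index {}:\n\"{}\"\nReason: {}",
            self.log_index, self.line, self.reason
        )
    }
}

impl std::error::Error for ComputeLineError {}

/// Extracts the consumed-units value, reporting where and why parsing failed.
pub fn parse_consumed(log_index: usize, line: &str) -> Result<u64, ComputeLineError> {
    let err = |reason: &'static str| ComputeLineError {
        log_index,
        line: line.to_string(),
        reason,
    };
    let (_, after) = line
        .split_once(" consumed ")
        .ok_or_else(|| err("missing \"consumed\" keyword"))?;
    // The token right after "consumed" must be the unit count.
    let token = after.split_whitespace().next().unwrap_or("");
    token
        .parse::<u64>()
        .map_err(|_| err("expected integer after \"consumed\""))
}
```

In the tolerant parser such an error would typically be downgraded to a parser warning rather than aborting the run.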
Use tracing. CLI flags:
-v
-vv
--quiet
--trace
The core may emit tracing events but must not pollute stdout.
The profiler itself must be efficient:
- the parser must run linearly over logs;
- avoid unnecessary clones;
- use Arc<str> or owned strings where sensible;
- report structs must be serializable;
- large logs must not be duplicated everywhere;
- raw logs optional in JSON via config.
default = ["json", "table"]
json
table
markdown
junit
program-test
anchor
instrumentation
ci
html (later)
Crates must remain compilable with minimal features where logical.
Optional, via feature flag. For v1 it need not be complete, but design for:
IDL parsing
instruction name mapping
account name mapping
Anchor event parsing
Anchor error mapping
Anchor constraint overhead hints
- native Solana must remain first-class;
- Anchor must not be a hard dependency in the core default.
Maintain docs from the start. At minimum:
README.md
docs/architecture.md
docs/scenarios.md
docs/baselines.md
docs/ci.md
docs/instrumentation.md
docs/report-schema.md
docs/config.md
README must contain: what it does, quickstart, example output, CI example, limitations, confidence explanation.
Be explicit about limitations:
Function-level attribution requires explicit markers.
Program-test results may differ from mainnet runtime conditions.
Instrumentation adds overhead.
Baselines are only valid when fingerprints match.
name: CU Profiler
on:
pull_request:
push:
branches: [main]
jobs:
cu-profiler:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write # for the sticky PR comment
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- run: cargo build --workspace
- run: cargo test --workspace
- run: cargo run -p cu-profiler -- ci --config cu-profiler.toml --format markdown --output target/cu-profiler/report.md
- if: ${{ always() && github.event_name == 'pull_request' }}
run: cargo run -p cu-profiler -- comment --input target/cu-profiler/report.md
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- uses: actions/upload-artifact@v4
if: ${{ always() }}
with:
name: cu-profiler-report
path: target/cu-profiler/
Treat audit diagnostics seriously. Detect:
expensive failure path
validation after CPI
near-limit critical instruction
high writable account count (later)
high CPI depth
stale baseline
missing critical scenario
Leave room for later:
compute-aware fuzzing
worst-case CU search
mainnet transaction log import
account snapshot realism
Explicit non-goals for v1:
perfect source-line profiling
automatic Rust AST attribution
remote mainnet observability dashboard
hosted SaaS
AI-generated unsafe optimization patches
full Anchor internals
custom Solana runtime fork
Focus on reliability, good architecture, and strong data.
A working first version with:
workspace setup
core crate
CLI crate
report crate
instrumentation crate
config parser
scenario model
program-test backend skeleton
log parser
CPI call tree parser
scope marker parser
budget engine
baseline engine
confidence scoring
diagnostic engine
table output
JSON output
Markdown output
JUnit output skeleton
tests
fixtures
docs
GitHub Actions example
For anything not yet fully implemented, use clear traits, structs, TODOs, and typed errors. Avoid half-baked code.
library-first
CLI-thin-wrapper
typed domain model
structured reports
stable schema
no fake precision
explicit confidence
feature-gated integrations
deterministic CI behavior
clear errors
composable traits
minimal global state
no hidden side effects
crates/cu-profiler-core/src/
├── lib.rs
├── error.rs
├── scenario.rs
├── profiler.rs
├── backend/
│ ├── mod.rs
│ ├── program_test.rs
│ ├── banks_client.rs
│ └── recorded.rs
├── parser/
│ ├── mod.rs
│ ├── solana_logs.rs
│ ├── compute_budget.rs
│ ├── cpi_tree.rs
│ └── scope_markers.rs
├── budget/
│ ├── mod.rs
│ ├── policy.rs
│ └── result.rs
├── baseline/
│ ├── mod.rs
│ ├── fingerprint.rs
│ └── compare.rs
├── diagnostics/
│ ├── mod.rs
│ └── rules.rs
├── confidence.rs
├── program_registry.rs
└── metadata.rs
crates/cu-profiler-report/src/
├── lib.rs
├── model.rs
├── table.rs
├── json.rs
├── markdown.rs
└── junit.rs
crates/cu-profiler-cli/src/
├── main.rs
├── args.rs
├── commands/
│ ├── mod.rs
│ ├── init.rs
│ ├── run.rs
│ ├── compare.rs
│ ├── baseline.rs
│ ├── ci.rs
│ ├── explain.rs
│ └── inspect.rs
└── exit.rs
crates/cu-profiler-instrumentation/src/
├── lib.rs
├── markers.rs
└── macros.rs
Required:
cargo fmt
cargo clippy --workspace --all-targets --all-features
cargo test --workspace --all-features
no unwraps in library code
clear module boundaries
public API documented
examples compile
JSON schema stable
fixtures included
Use where useful:
serde
serde_json
serde_with
thiserror
anyhow (only in CLI or boundary layers)
clap
toml
tracing
prettytable-rs or comfy-table
quick-xml (for JUnit if needed)
insta (for snapshots if appropriate)
Pin one compatible Solana/Agave crate family deliberately and prevent dependency mismatch.
Order of work:
- Create workspace and crates.
- Design core domain types.
- Design report model.
- Build config parser.
- Build recorded-logs backend for parser tests.
- Build Solana log parser.
- Build CPI call tree.
- Build scope marker parser.
- Build budget policy engine.
- Build baseline comparison.
- Build confidence scoring.
- Build diagnostics.
- Build table/JSON/Markdown output.
- Wire CLI commands to core.
- Build program-test backend skeleton.
- Add tests and fixtures.
- Add docs and GitHub Actions example.
Start with recorded log fixtures so parser, reports, and CI logic can be developed stably without immediately depending on complex Solana integration tests.
The first version is successful if a Solana team can:
1. run cu-profiler init
2. define scenarios
3. run cu-profiler run
4. see a table report
5. get a JSON export
6. save a baseline
7. compare a PR against the baseline
8. fail CI on regression
9. see which CPI/scope consumes the most CU
10. understand how reliable the measurement is
The tool must be honest about limitations from day one and make no unreliable claims.
Professional and exact. Avoid hype in errors/output. Use clear language:
PASS | WARN | FAIL | UNKNOWN
Give actionable messages:
Scenario `swap/referral_enabled` exceeded regression policy.
Baseline: 91,204 CU
Current: 96,812 CU
Delta: +6.15%
Allowed: +5.00%
Recommended action: inspect CPI count and referral account validation path.
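The numbers in that message follow from a simple relative-delta calculation. A minimal sketch, with hypothetical helper names, that reproduces the figures above:

```rust
/// Relative change from baseline to current, in percent.
/// Returns `None` when the baseline is zero.
pub fn regression_pct(baseline: u64, current: u64) -> Option<f64> {
    if baseline == 0 {
        return None;
    }
    Some((current as f64 - baseline as f64) / baseline as f64 * 100.0)
}

/// Formats a CU count with thousands separators, e.g. 96812 -> "96,812".
pub fn group_thousands(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Renders the baseline/current/delta/allowed lines of a regression message.
pub fn regression_lines(baseline: u64, current: u64, allowed_pct: f64) -> Option<String> {
    let delta = regression_pct(baseline, current)?;
    Some(format!(
        "Baseline: {} CU\nCurrent: {} CU\nDelta: {:+.2}%\nAllowed: {:+.2}%",
        group_thousands(baseline),
        group_thousands(current),
        delta,
        allowed_pct
    ))
}
```

Here (96,812 - 91,204) / 91,204 is about 6.15%, which exceeds the allowed 5.00%, so the regression policy fails.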
Build cu-profiler as if it were a serious open-source infrastructure tool for
Solana teams, auditors, and CI pipelines.
Keep the code modular enough to later extend with:
Anchor IDL support
HTML reports
flamegraphs
RPC simulation
mainnet account snapshots
compute-aware fuzzing
GitHub PR comments
historical trends
hosted dashboards
Keep v1 reliable, clean, and professional.