Stop passing column lists on the command line. Freeze your domain knowledge into a versioned, validated, tamper-evident config.
brew install cmdrvl/tap/profile
Your dataset has 42 columns. 15 of them matter for this analysis. The key is loan_id. Float precision is 6 decimal places. Order doesn't matter. Where does this knowledge live? In a Slack thread? In someone's head? In a --exclude flag you'll forget next month?
profile captures all of it in a versioned YAML file that downstream report tools can consume. Draft one from a real CSV header, iterate until it's right, then freeze it — immutable, SHA-256 hashed, recorded in every lockfile and report that uses it. Change a column? New version. Full audit trail.
- Draft → freeze lifecycle: profile draft init reads a CSV header and generates a starting profile. Edit it. Lint it against real data. When it's right, profile freeze makes it immutable and content-addressed.
- Key intelligence: profile suggest-key ranks candidate key columns by uniqueness, null rate, and type. No guessing.
- One file, reusable scoping: rvl consumes frozen profiles today, and the same artifact is the intended scoping surface for shape, compare, and lock as those integrations settle. Declare your domain choices once.
- Schema drift detection: profile lint --against data.csv catches columns that disappeared, keys that aren't unique, and types that shifted.
# 1) Create a draft profile from a real dataset
profile draft init loan_tape.csv --out loan_tape.draft.yaml
# 2) Validate schema and lint against the dataset
profile validate loan_tape.draft.yaml
profile lint loan_tape.draft.yaml --against loan_tape.csv
# 3) Freeze to an immutable profile
profile freeze loan_tape.draft.yaml \
--family csv.loan_tape.core \
--version 0 \
--out profiles/csv.loan_tape.core.v0.yaml
# 4) Use the frozen profile with the current live downstream surface
rvl old.csv new.csv --profile profiles/csv.loan_tape.core.v0.yaml --json
rvl is profile-aware today. Check the shape and compare operator contracts before assuming equivalent --profile behavior in the current release line.
profile draft init loan_tape.csv --out loan_tape.draft.yaml
# writes: loan_tape.draft.yaml
profile lint loan_tape.draft.yaml --against loan_tape.csv
# exit 0 (or exit 1 with deterministic lint issues)
profile freeze loan_tape.draft.yaml \
--family csv.loan_tape.core \
--version 0 \
--out profiles/csv.loan_tape.core.v0.yaml
# writes: profiles/csv.loan_tape.core.v0.yaml
# frozen profile includes: profile_id, profile_family, profile_version, profile_sha256
A draft is cheap to iterate. A frozen profile is immutable and hashable for reproducible downstream analysis.
profile is a metadata tool that configures how report tools operate.
vacuum → hashbytes → lock
↑
profile → rvl
profile → shape / compare (planned or release-line dependent)
Profile doesn't sit in the stream pipeline (vacuum → hashbytes → lock). Instead, it produces configuration files that downstream tools can consume where the current runtime contract supports --profile. rvl is the live consumer today; other integrations are converging by tool/release line. Lock records which profiles were active in its profiles array.
| If you need... | Use |
|---|---|
| Enumerate files in a directory | vacuum |
| Compute content hashes | hash |
| Match files against templates | fingerprint |
| Pin artifacts into a lockfile | lock |
| Check structural comparability | shape |
| Explain numeric changes | rvl |
profile only answers: which columns matter, what's the key, and how should values be compared?
A profile is a YAML file with a defined schema:
profile_id: "csv.loan_tape.core.v0"
profile_version: 0
include_columns:
- loan_id
- current_balance
- note_rate
- maturity_date
- property_type
- occupancy
key: ["loan_id"]
equivalence:
  order: "order-invariant"
  float_decimals: 6
  trim_strings: true
| Field | Type | Description |
|---|---|---|
| profile_id | string | Unique identifier with version suffix |
| profile_version | integer | Monotonically increasing version number |
| include_columns | string[] | Columns to include in analysis (others ignored) |
| key | string[] | Column(s) used for row alignment/joining |
| equivalence.order | string | "order-invariant" or "order-sensitive" |
| equivalence.float_decimals | integer | Decimal places for float comparison |
| equivalence.trim_strings | boolean | Trim whitespace before string comparison |
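To make the three equivalence fields concrete, here is a minimal Python sketch of how a consumer might apply them. It assumes float_decimals means "round before comparing" and that order-invariant comparison can be modeled by sorting on the key; the function names normalize and rows_equivalent are hypothetical and are not part of profile or rvl, whose actual comparison logic may differ.

```python
def normalize(value, float_decimals=6, trim_strings=True):
    """Normalize one cell before comparison (illustrative assumption only)."""
    if isinstance(value, float):
        return round(value, float_decimals)
    if isinstance(value, str) and trim_strings:
        return value.strip()
    return value


def rows_equivalent(old_rows, new_rows, key, order="order-invariant", **rules):
    """Compare two lists of dict rows under profile-style equivalence rules."""
    def norm(row):
        return {col: normalize(val, **rules) for col, val in row.items()}

    old = [norm(row) for row in old_rows]
    new = [norm(row) for row in new_rows]
    if order == "order-invariant":
        # Row order is ignored: align both sides by the declared key columns.
        def by_key(row):
            return tuple(str(row[k]) for k in key)
        old, new = sorted(old, key=by_key), sorted(new, key=by_key)
    return old == new


old = [{"loan_id": "A1", "note_rate": 0.0425000001},
       {"loan_id": "B2 ", "note_rate": 0.05}]
new = [{"loan_id": "B2", "note_rate": 0.05},
       {"loan_id": "A1", "note_rate": 0.0425}]
print(rows_equivalent(old, new, key=["loan_id"]))
```

With the defaults from the example profile, the two datasets above compare as equivalent; switching order to "order-sensitive" makes the same comparison fail because the rows arrive in a different sequence.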
Once frozen, a profile is immutable:
profile_id: "csv.loan_tape.core.v0"
profile_version: 0
profile_sha256: "sha256:a1b2c3d4e5f6..."
frozen: true
# ... rest of profile
Any semantic change requires a new profile_version and a new profile_id.
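The tamper-evidence idea can be illustrated with a short Python sketch. This README does not document how profile canonicalizes content before hashing, so the canonical form used here (sorted-key JSON with the profile_sha256 field excluded) is purely an assumption, and content_hash and is_untampered are hypothetical helper names.

```python
import hashlib
import json


def content_hash(profile: dict) -> str:
    """Hash everything except the stored hash itself (assumed canonical form)."""
    body = {k: v for k, v in profile.items() if k != "profile_sha256"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(profile: dict) -> bool:
    """True when the recorded hash still matches the profile content."""
    return profile.get("profile_sha256") == content_hash(profile)


frozen = {
    "profile_id": "csv.loan_tape.core.v0",
    "profile_version": 0,
    "key": ["loan_id"],
    "frozen": True,
}
frozen["profile_sha256"] = content_hash(frozen)
print(is_untampered(frozen))

# Editing a frozen profile in place invalidates the recorded hash.
frozen["key"] = ["property_id"]
print(is_untampered(frozen))
```

Whatever the real canonicalization is, the principle is the same: any in-place edit changes the digest, which is why a semantic change has to ship as a new version.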
Generate a draft profile from a CSV header:
profile draft init loan_tape.csv --out loan_profile.yaml
Auto-populates include_columns from the header. You edit the draft to remove unwanted columns and set the key.
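As a rough picture of what drafting from a header involves, the Python sketch below reads the first CSV row and seeds include_columns with it. The function name draft_from_header is hypothetical, and leaving key empty for the user to fill in is an assumption about the draft's starting state, not documented behavior.

```python
import csv
import io


def draft_from_header(csv_text: str) -> dict:
    """Build a starting draft from the header row of a CSV document."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "include_columns": list(header),  # every column starts out included
        "key": [],  # assumed: chosen by the user after inspecting the data
    }


sample = "loan_id,current_balance,note_rate\nA1,100.0,0.04\n"
print(draft_from_header(sample))
```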
Rank candidate key columns by uniqueness, null rate, and deterministic order:
profile suggest-key loan_tape.csv
# loan_id: unique=100%, nulls=0%, type=string ← recommended
# property_id: unique=85%, nulls=0%, type=string
Validate a profile against a dataset:
profile lint loan_profile.yaml --against loan_tape.csv
Catches: missing columns, non-unique keys, type mismatches, schema drift.
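Two of those checks, missing columns and non-unique keys, are simple enough to sketch in Python. This is an illustration of the idea only: the lint function and its message strings are hypothetical and do not reproduce the tool's actual output, and type-mismatch detection is left out.

```python
import csv
import io
from collections import Counter


def lint(profile: dict, csv_text: str) -> list:
    """Return a list of issue strings; an empty list means no issues found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    issues = []

    # Check 1: every declared column must exist in the dataset header.
    for column in profile["include_columns"]:
        if column not in header:
            issues.append(f"missing column: {column}")

    # Check 2: the declared key must identify each row uniquely.
    key = profile["key"]
    if key and all(k in header for k in key):
        counts = Counter(tuple(row[k] for k in key) for row in reader)
        for value in sorted(v for v, n in counts.items() if n > 1):
            issues.append(f"duplicate key: {'/'.join(value)}")
    return issues


profile = {"include_columns": ["loan_id", "occupancy"], "key": ["loan_id"]}
data = "loan_id,current_balance\nA1,100\nA1,200\nB2,300\n"
for issue in lint(profile, data):
    print(issue)
```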
Surface structural statistics about a dataset:
profile stats loan_tape.csv
# rows: 1,247 | columns: 42 | nulls: 3.2% | key candidates: loan_id, property_id
Validate and mark a profile immutable with SHA-256 content hash:
profile freeze loan_profile.yaml \
--family csv.loan_tape.core \
--version 0 \
--out profiles/csv.loan_tape.core.v0.yaml
| Capability | profile | Manual column lists | Config files | SQL views |
|---|---|---|---|---|
| Versioned and frozen | Yes | No | No | No |
| Content hash (tamper-evident) | Yes | No | No | No |
| Validated against dataset | Yes (lint) | No | No | At query time |
| Key declaration | Yes | Ad-hoc | Ad-hoc | Yes |
| Normalization rules | Yes | No | No | No |
| Cross-tool (shape/rvl/compare) | Yes (rvl today; shape and compare deferred) | No | No | No |
| Draft from header | Yes (draft init) | Manual | Manual | Manual |
brew install cmdrvl/tap/profile
curl -fsSL https://raw.githubusercontent.com/cmdrvl/profile/main/scripts/install.sh | bash
cargo build --release
./target/release/profile --help
Current runtime support:
- rvl accepts --profile today for key derivation and column scoping
- shape exposes --profile / --profile-id flags, but its current operator contract still marks the check-scoping behavior as reserved/deferred
- compare remains the deferred exhaustive diff tool in the broader spine roadmap
# rvl — only explain changes in profile columns
rvl old.csv new.csv --profile loan_profile.yaml --json
lock records which profiles were active:
{
"profiles": [
{
"profile_id": "csv.loan_tape.core.v0",
"profile_version": 0,
"profile_sha256": "sha256:a1b2c3d4..."
}
]
}
| Flag | Behavior |
|---|---|
| --describe | Print operator.json and exit 0 before normal input validation |
| --schema | Print profile JSON Schema and exit 0 before normal input validation (deferred in v0.1) |
| --version | Print profile <semver> and exit 0 |
| --no-witness | Suppress witness ledger recording |
| Exit | Meaning | When |
|---|---|---|
| 0 | SUCCESS | Operation completed with no issues |
| 1 | ISSUES_FOUND | Lint/diff found issues or differences |
| 2 | REFUSAL | Invalid input, schema violation, parse/IO refusal, or CLI error |
E_INVALID_SCHEMA, E_MISSING_FIELD, E_BAD_VERSION, E_ALREADY_FROZEN, E_IO, E_CSV_PARSE, E_EMPTY, E_COLUMN_NOT_FOUND
With --json, refusals are emitted in the unified output envelope (outcome=REFUSAL, refusal detail in result). Without --json, refusals are human-readable errors on stderr with the refusal code.
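For scripts that wrap the CLI, the exit-code contract can be handled with a small Python sketch like the one below. The helper names classify and run_lint are hypothetical, and the "UNKNOWN" fallback for codes outside the documented 0/1/2 range is an assumption; only the profile lint invocation itself is taken from this README.

```python
import subprocess

# Exit codes as documented in the table above.
OUTCOMES = {0: "SUCCESS", 1: "ISSUES_FOUND", 2: "REFUSAL"}


def classify(exit_code: int) -> str:
    """Map a process exit code to its documented outcome name."""
    return OUTCOMES.get(exit_code, "UNKNOWN")  # assumed fallback


def run_lint(profile_path: str, data_path: str) -> str:
    """Run `profile lint` and report the outcome (requires profile on PATH)."""
    done = subprocess.run(["profile", "lint", profile_path, "--against", data_path])
    return classify(done.returncode)


print(classify(1))
```

Treating 1 and 2 differently matters in CI: ISSUES_FOUND means the data drifted from the profile, while REFUSAL means the command could not evaluate the input at all.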
- Witness append is enabled for: freeze, validate, lint, stats, suggest-key
- Witness append is skipped for: draft new, draft init, list, show, diff, push, pull
- --no-witness disables witness writes without changing domain outcome or exit semantics
- Ledger path: $EPISTEMIC_WITNESS or ~/.epistemic/witness.jsonl
- Witness append failures warn on stderr and do not change primary command outcome/exit code
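The ledger path rule above can be mirrored in a few lines of Python, for example to locate the ledger from a script. The function name witness_path is hypothetical, and treating an empty EPISTEMIC_WITNESS value as unset is an assumption rather than documented behavior.

```python
import os
from pathlib import Path


def witness_path(env=None) -> Path:
    """Resolve the ledger path: environment override first, then the default."""
    env = os.environ if env is None else env
    override = env.get("EPISTEMIC_WITNESS")
    if override:  # assumed: an empty value falls through to the default
        return Path(override)
    return Path.home() / ".epistemic" / "witness.jsonl"


print(witness_path({"EPISTEMIC_WITNESS": "/tmp/witness.jsonl"}))
print(witness_path({}))
```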
A column in include_columns doesn't exist in the dataset. Run profile lint to diagnose:
profile lint loan_profile.yaml --against new_tape.csv
# ERROR: Column 'occupancy' in profile not found in dataset
The column(s) declared in key have duplicate values. Use profile suggest-key to find better candidates:
profile suggest-key loan_tape.csv
Frozen profiles are immutable. If you need to change columns, create a new version:
# Old: csv.loan_tape.core.v0 (frozen)
# New: csv.loan_tape.core.v1 (add new columns, re-freeze)
profile_id: "csv.loan_tape.core.v1"
profile_version: 1
If rvl reports spurious changes, your float_decimals may be too high. Try reducing precision:
equivalence:
  float_decimals: 2  # was 6, reduced to match business precision
Ensure the profile file is committed and the path is correct. Profiles are plain YAML, with no environment dependencies.
| Limitation | Detail |
|---|---|
| CSV only | v0 profiles scope CSV/TSV columns; XLSX sheet/range scoping is deferred |
| Single key type | Composite keys supported, but only column-based — no expression keys |
| No auto-update | Profile doesn't auto-detect schema changes — use lint to catch drift |
| No profile registry | Profiles are local files — centralized registry is deferred |
| Network publish deferred | push/pull data-fabric wrappers are deferred in v0.1 |
| Pre-release | Implementation in progress — spec is complete in the epistemic spine plan |
Flags don't compose. With 15 columns, a key, and normalization rules, the command line becomes unmanageable. A profile captures all scoping decisions in a versioned, validated, shareable file.
Immutability. Once a profile is frozen and referenced by a lockfile, you can prove that the exact same column scoping was used. Any change requires a new version, creating an audit trail.
Yes — as long as the datasets have the same schema. Use profile lint --against to verify compatibility before use.
They're ignored. Report tools only analyze columns in include_columns. This is the whole point — focus on what matters.
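In effect, column scoping is a projection. The Python sketch below shows the idea on in-memory rows; scope_rows is a hypothetical helper, not how rvl is implemented, and the servicer_notes column is invented for the example.

```python
def scope_rows(rows, include_columns):
    """Keep only the profile's columns; everything else is dropped."""
    return [{col: row[col] for col in include_columns} for row in rows]


rows = [{"loan_id": "A1", "note_rate": "0.04", "servicer_notes": "call back"}]
print(scope_rows(rows, ["loan_id", "note_rate"]))
```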
Fingerprint identifies what kind of file something is (template recognition). Profile declares which columns to analyze in report tools. They solve different problems and can be used together.
Without a key, report tools like rvl can't align rows between two datasets. The key column(s) define how rows map from old to new. profile suggest-key helps identify the best candidate.
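Ranking key candidates can be approximated in Python as shown below. This is a sketch under stated assumptions: candidates are scored by uniqueness ratio, then by null ratio, with column name as a deterministic tie-break, and an empty string counts as null. The function name rank_key_candidates is hypothetical, and the scoring used by profile suggest-key may differ.

```python
import csv
import io


def rank_key_candidates(csv_text: str) -> list:
    """Return (column, unique_ratio, null_ratio) tuples, best candidate first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = len(rows)
    ranked = []
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows]
        non_null = [v for v in values if v != ""]  # assumed: "" means null
        unique_ratio = len(set(non_null)) / total
        null_ratio = (total - len(non_null)) / total
        ranked.append((column, unique_ratio, null_ratio))
    # Most unique first, then fewest nulls, then name for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[2], item[0]))
    return ranked


tape = "loan_id,property_id,state\nA1,P1,NY\nB2,P1,NY\nC3,P2,\nD4,P3,CA\n"
for column, unique_ratio, null_ratio in rank_key_candidates(tape):
    print(f"{column}: unique={unique_ratio:.0%}, nulls={null_ratio:.0%}")
```

A column that is fully unique with no nulls, such as loan_id in this sample, is the only kind that lets rows be aligned one-to-one between the old and new datasets.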
Yes. profile draft init generates a starting profile from a CSV header. You can also write YAML directly or generate it from any tool.
For the full toolchain guide, see the Agent Operator Guide. Run profile --describe for this tool's machine-readable contract.
The profile specification is part of the epistemic spine plan. This README covers intended behavior; implementation is in progress.
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo test