Skip to content

persist: productionize A/B file persistence#3105

Merged
wen-coding merged 1 commit intomainfrom
wen/productionize_ab_file_persistence
Mar 25, 2026
Merged

persist: productionize A/B file persistence#3105
wen-coding merged 1 commit intomainfrom
wen/productionize_ab_file_persistence

Conversation

@wen-coding
Copy link
Contributor

@wen-coding wen-coding commented Mar 20, 2026

Summary

Productionize the A/B crash-safe persistence layer: add CRC32-C integrity
checking and replace the PersistedWrapper protobuf envelope with a compact
binary file header.

The original format wrapped the seq number and proto payload inside a
PersistedWrapper protobuf — no integrity checking, and double-marshalling
(proto inside proto). The new on-disk format is:

[4-byte CRC32-C BE] [8-byte seq LE] [raw proto bytes]

CRC and seq are validated before proto decoding, catching silent truncation
and bit-rot that proto.Unmarshal alone would miss.

Changes

Proto (autobahn.proto)

  • Remove PersistedWrapper message entirely. Seq is now a raw file-header
    field; CRC is new.

Persistence (persist.go)

  • Introduce dataWithSeq struct used consistently across the read and write
    paths: loadFile, loadPersisted, writeFile, and Persist all operate
    on this type.
  • writeFile: encodes seq, computes CRC, and writes all chunks internally —
    callers just pass a dataWithSeq. File is fsynced before return.
  • loadFile: validates CRC and seq from raw bytes, returns dataWithSeq.
  • loadPersisted: picks the higher-seq file directly from dataWithSeq
    values returned by loadFile.
  • Persist: marshals proto, builds dataWithSeq, delegates to writeFile.
    Seq is only advanced after successful write.
  • writeAndSync: simple fsync helper for WAL entries (blocks, commitqcs).
  • Remove exported test helpers (WriteRawFile, WriteTestFile,
    CorruptFileForTest). Nothing from the file format leaks to external
    packages.

Tests (persist_test.go)

  • Use wrapperspb.StringValue (standard protobuf) as the test message type
    instead of the deleted PersistedWrapper.
  • New file-format-level tests: TestLoadFileDataCorruption,
    TestLoadFileTruncatedHeader, TestLoadFileZeroSeq,
    TestLoadFileSeqCorruption, TestPersistFileFormat.
  • Seq rollback on write failure, winner preservation across restarts,
    OS-error propagation.

External test (avail/state_test.go)

  • Corrupt-file test discovers A/B filenames via a throwaway NewPersister
    • os.ReadDir, avoiding any dependency on internal file naming.

Test plan

  • go test ./internal/autobahn/consensus/persist/ — all pass
  • go test ./internal/autobahn/avail/ — all pass
  • golangci-lint run — 0 issues

@github-actions
Copy link

github-actions bot commented Mar 20, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMar 25, 2026, 6:28 PM

@github-actions
Copy link

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMar 20, 2026, 11:33 PM

@codecov
Copy link

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 88.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.61%. Comparing base (d6f6b9c) to head (6512a0e).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...int/internal/autobahn/consensus/persist/persist.go 88.00% 5 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #3105   +/-   ##
=======================================
  Coverage   58.61%   58.61%           
=======================================
  Files        2099     2099           
  Lines      173514   173525   +11     
=======================================
+ Hits       101702   101718   +16     
+ Misses      62761    62758    -3     
+ Partials     9051     9049    -2     
Flag Coverage Δ
sei-chain-pr 77.89% <88.00%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...int/internal/autobahn/consensus/persist/persist.go 83.33% <88.00%> (+3.16%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wen-coding wen-coding requested a review from pompon0 March 24, 2026 15:26
@wen-coding wen-coding changed the title persist: add CRC32-C integrity check and seq validation to A/B files persist: add CRC32-C integrity check, replace PersistedWrapper with binary file header Mar 24, 2026
@wen-coding wen-coding force-pushed the wen/productionize_ab_file_persistence branch from 2ef4d18 to beca4cd Compare March 24, 2026 19:08
// so the validator can restart. Returns ErrNoData when no persisted files exist (use errors.Is).
// Returns other error only when both files fail to load or state is inconsistent (same seq).
func loadPersisted(dir string, prefix string) (*pb.PersistedWrapper, error) {
func loadPersisted(dir string, prefix string) (uint64, []byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps it still deserves a (regular non-proto) struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added dataWithSeq for this.

if err := proto.Unmarshal(bz, &wrapper); err != nil {
return nil, fmt.Errorf("unmarshal %s: %w", filename, fmt.Errorf("%v: %w", err, ErrCorrupt))

wantCRC := binary.BigEndian.Uint32(bz[:crcSize])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: crc computation seems to belong rather in writeFile imo (which would simplify the function signature - currently is accepts crc32 which it assumes to be computed correctly)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, done

// writeFile writes an A/B state file: [4-byte CRC32-C BE][8-byte seq LE][proto data].
// Writes CRC, seq, and data directly to avoid copying data into an intermediate buffer.
// Used by abPersister.Persist for crash-safe state updates.
func writeFile(path string, crc uint32, seq, data []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now that writeAndSync was renamed to writeFile, it would be worth to document that it still does fsync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wen-coding wen-coding force-pushed the wen/productionize_ab_file_persistence branch from dec8b7b to 566bc1f Compare March 25, 2026 18:23
@wen-coding wen-coding changed the title persist: add CRC32-C integrity check, replace PersistedWrapper with binary file header persist: productionize A/B file persistence Mar 25, 2026
@wen-coding wen-coding enabled auto-merge March 25, 2026 18:24
@wen-coding wen-coding disabled auto-merge March 25, 2026 18:24
- Move CRC32-C checksum and sequence number from PersistedWrapper proto
  to a binary file header: [4-byte CRC BE][8-byte seq LE][proto bytes]
- Remove PersistedWrapper proto message entirely
- Introduce dataWithSeq struct used consistently by loadFile,
  loadPersisted, writeFile, and Persist
- Move CRC computation and seq encoding into writeFile so callers just
  pass a dataWithSeq without dealing with serialization details
- Add comprehensive tests: CRC validation, truncated header detection,
  zero-seq rejection, seq corruption, OS-error propagation, seq
  rollback on write failure, and winner-preservation across restarts

Made-with: Cursor
@wen-coding wen-coding force-pushed the wen/productionize_ab_file_persistence branch from 566bc1f to 6512a0e Compare March 25, 2026 18:27
@wen-coding wen-coding enabled auto-merge March 25, 2026 18:29
@wen-coding wen-coding added this pull request to the merge queue Mar 25, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 25, 2026
@wen-coding wen-coding added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit 78c53dd Mar 25, 2026
39 checks passed
@wen-coding wen-coding deleted the wen/productionize_ab_file_persistence branch March 25, 2026 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants