persist: productionize A/B file persistence#3105
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3105 +/- ##
=======================================
Coverage 58.61% 58.61%
=======================================
Files 2099 2099
Lines 173514 173525 +11
=======================================
+ Hits 101702 101718 +16
+ Misses 62761 62758 -3
+ Partials 9051 9049 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
2ef4d18 to
beca4cd
Compare
| // so the validator can restart. Returns ErrNoData when no persisted files exist (use errors.Is). | ||
| // Returns other error only when both files fail to load or state is inconsistent (same seq). | ||
| func loadPersisted(dir string, prefix string) (*pb.PersistedWrapper, error) { | ||
| func loadPersisted(dir string, prefix string) (uint64, []byte, error) { |
There was a problem hiding this comment.
nit: perhaps it still deserves a (regular non-proto) struct?
There was a problem hiding this comment.
Added dataWithSeq for this.
| if err := proto.Unmarshal(bz, &wrapper); err != nil { | ||
| return nil, fmt.Errorf("unmarshal %s: %w", filename, fmt.Errorf("%v: %w", err, ErrCorrupt)) | ||
|
|
||
| wantCRC := binary.BigEndian.Uint32(bz[:crcSize]) |
There was a problem hiding this comment.
nit: crc computation seems to belong rather in writeFile imo (which would simplify the function signature - currently is accepts crc32 which it assumes to be computed correctly)
| // writeFile writes an A/B state file: [4-byte CRC32-C BE][8-byte seq LE][proto data]. | ||
| // Writes CRC, seq, and data directly to avoid copying data into an intermediate buffer. | ||
| // Used by abPersister.Persist for crash-safe state updates. | ||
| func writeFile(path string, crc uint32, seq, data []byte) error { |
There was a problem hiding this comment.
nit: now that writeAndSync was renamed to writeFile, it would be worth to document that it still does fsync.
dec8b7b to
566bc1f
Compare
- Move CRC32-C checksum and sequence number from PersistedWrapper proto to a binary file header: [4-byte CRC BE][8-byte seq LE][proto bytes] - Remove PersistedWrapper proto message entirely - Introduce dataWithSeq struct used consistently by loadFile, loadPersisted, writeFile, and Persist - Move CRC computation and seq encoding into writeFile so callers just pass a dataWithSeq without dealing with serialization details - Add comprehensive tests: CRC validation, truncated header detection, zero-seq rejection, seq corruption, OS-error propagation, seq rollback on write failure, and winner-preservation across restarts Made-with: Cursor
566bc1f to
6512a0e
Compare
Summary
Productionize the A/B crash-safe persistence layer: add CRC32-C integrity
checking and replace the
PersistedWrapperprotobuf envelope with a compactbinary file header.
The original format wrapped the seq number and proto payload inside a
PersistedWrapperprotobuf — no integrity checking, and double-marshalling(proto inside proto). The new on-disk format is:
CRC and seq are validated before proto decoding, catching silent truncation
and bit-rot that
proto.Unmarshalalone would miss.Changes
Proto (
autobahn.proto)PersistedWrappermessage entirely. Seq is now a raw file-headerfield; CRC is new.
Persistence (
persist.go)dataWithSeqstruct used consistently across the read and writepaths:
loadFile,loadPersisted,writeFile, andPersistall operateon this type.
writeFile: encodes seq, computes CRC, and writes all chunks internally —callers just pass a
dataWithSeq. File is fsynced before return.loadFile: validates CRC and seq from raw bytes, returnsdataWithSeq.loadPersisted: picks the higher-seq file directly fromdataWithSeqvalues returned by
loadFile.Persist: marshals proto, buildsdataWithSeq, delegates towriteFile.Seq is only advanced after successful write.
writeAndSync: simple fsync helper for WAL entries (blocks, commitqcs).WriteRawFile,WriteTestFile,CorruptFileForTest). Nothing from the file format leaks to externalpackages.
Tests (
persist_test.go)wrapperspb.StringValue(standard protobuf) as the test message typeinstead of the deleted
PersistedWrapper.TestLoadFileDataCorruption,TestLoadFileTruncatedHeader,TestLoadFileZeroSeq,TestLoadFileSeqCorruption,TestPersistFileFormat.OS-error propagation.
External test (
avail/state_test.go)NewPersisteros.ReadDir, avoiding any dependency on internal file naming.Test plan
go test ./internal/autobahn/consensus/persist/— all passgo test ./internal/autobahn/avail/— all passgolangci-lint run— 0 issues