
Parquet crash testing unit testing hooks #3028

Open
jewei1997 wants to merge 9 commits into main from STO-378/parquet-crash-testing


jewei1997 (Contributor) commented Mar 5, 2026

Describe your changes and provide context

This PR adds test-only fault-injection hooks to the parquet receipt store so we can simulate crashes at specific points in the write pipeline and validate recovery behavior. The hooks cover the key stages of persistence: after WAL write, before parquet flush, after parquet flush, after closing writers during file rotation, and after WAL clear during rotation.

It also adds a SimulateCrash() helper that intentionally abandons the store without the normal flush/finalization path, which lets the tests mimic abrupt process termination and then reopen the same store directory to verify recovery.

On top of that, this PR adds parquet receipt crash-recovery coverage that:

- verifies recovery at each hook point, including file-rotation scenarios
- runs randomized multi-crash stress tests to ensure WAL-committed blocks remain readable after reopen
- verifies concurrent readers can still read committed receipts and logs while writes are artificially slowed

The goal is to increase confidence in parquet receipt durability and crash recovery behavior without changing normal production behavior outside of tests.

Testing performed to validate your change

go test ./sei-db/ledger_db/receipt -run 'TestCrashRecoveryAtEachHookPoint|TestCrashRecoveryStress|TestSlowFlushWithConcurrentReads' -count=1

github-actions bot commented Mar 5, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Mar 6, 2026, 1:07 PM |

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.29%. Comparing base (b866a23) to head (413f99e).

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3028      +/-   ##
==========================================
+ Coverage   58.27%   58.29%   +0.01%     
==========================================
  Files        2077     2077              
  Lines      171308   171338      +30     
==========================================
+ Hits        99828    99874      +46     
+ Misses      62583    62573      -10     
+ Partials     8897     8891       -6     
| Flag | Coverage Δ |
|---|---|
| sei-chain-pr | 74.24% <100.00%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| sei-db/ledger_db/parquet/store.go | 69.66% <100.00%> (+4.98%) ⬆️ |

... and 3 files with indirect coverage changes


jewei1997 marked this pull request as ready for review March 6, 2026 12:54
// file descriptors and locks so the test process can reopen the same directory.
func (s *Store) SimulateCrash() {
if s.pruneStop != nil {
close(s.pruneStop)
Contributor


Set pruneStop to nil after closing it? Otherwise Close() will close the already-closed channel again, and a double close panics in Go.
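
One way to address this, sketched on a hypothetical minimal store (only the `pruneStop` name comes from the excerpt above): close the channel and clear the field in a single helper that both `SimulateCrash()` and `Close()` call. The pruning goroutine receives the channel as a parameter, so clearing the field does not leave it blocked on a nil channel. `stopPruning`, `pruneDone`, and `newStore` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type store struct {
	pruneStop chan struct{}
	pruneDone sync.WaitGroup
}

func newStore() *store {
	s := &store{pruneStop: make(chan struct{})}
	s.pruneDone.Add(1)
	// The goroutine gets its own copy of the channel, so setting the field
	// to nil later cannot strand it on a nil-channel receive.
	go func(stop <-chan struct{}) {
		defer s.pruneDone.Done()
		<-stop // a real pruner would loop on a ticker until stop is closed
	}(s.pruneStop)
	return s
}

// stopPruning closes the channel at most once: after the first call the field
// is nil and later calls are no-ops. Not safe for concurrent callers.
func (s *store) stopPruning() {
	if s.pruneStop != nil {
		close(s.pruneStop)
		s.pruneStop = nil
		s.pruneDone.Wait()
	}
}

func (s *store) SimulateCrash() { s.stopPruning() }
func (s *store) Close()         { s.stopPruning() }

func main() {
	s := newStore()
	s.SimulateCrash()
	s.Close() // second stop is a no-op instead of a double-close panic
	fmt.Println("closed twice without panic")
}
```

If `SimulateCrash()` and `Close()` could ever race, guarding the close with a `sync.Once` would be the safer variant.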

// be recoverable via WAL replay.
func TestCrashRecoveryStress(t *testing.T) {
seed := int64(42)
t.Logf("random seed: %d (change to reproduce a specific run)", seed)
Contributor


It looks like the seed is always 42, so it isn't actually random?



3 participants