Test framework: avoid stale tmpdir reuse on PID collision#3969
Test framework: avoid stale tmpdir reuse on PID collision#3969zuiderkwast wants to merge 1 commit into
Conversation
If the OS reuses a PID, the tmpdir name can collide with a leftover directory from a previous test run, causing servers to load stale config (e.g. nodes.conf). Fix by skipping existing directories in tmpdir allocation. Also clean nodes.conf in clean_persistence as an additional, which was missing when removing files after a start_server block. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughTwo test infrastructure functions are updated: persistence cleanup now removes cluster configuration files, and temporary directory generation adds collision detection to avoid reusing existing paths. ChangesTest Support Infrastructure
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
hpatro
left a comment
There was a problem hiding this comment.
Seems no harm in looping through to find a non existing directory. But how did you determine if the PID was reused? I see the PID: 36285 getting printed out once.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3969 +/- ##
============================================
+ Coverage 76.67% 76.70% +0.02%
============================================
Files 162 162
Lines 80752 80752
============================================
+ Hits 61917 61941 +24
+ Misses 18835 18811 -24 🚀 New features to boost your workflow:
|
@hpatro Actually, no, I can't explain it. The PID is the test framework's client process (as in It could maybe happen if there are multiple I'm no longer so sure about merging this PR. I'll leave it open for some time to see if we ever see this kind of failure again... |
If the OS reuses a PID, the tmpdir name can collide with a leftover directory from a previous test run, causing servers to load stale config (e.g. nodes.conf). Fix by skipping existing directories in tmpdir allocation.
Also remove nodes.conf in clean_persistence, which was missing when removing files after a start_server block where other files were removed. Cleaning doesn't affect runs with
--dont-cleanthough, so both fixes are useful.This problem was seen in this CI job, where a fresh server loaded a stale nodes.conf which confused the cluster and caused the test to fail: https://github.com/valkey-io/valkey/actions/runs/27279021315/job/80567998236?pr=3961#step:6:7995