Recover from corrupt utmp records instead of aborting at the first#5115
Recover from corrupt utmp records instead of aborting at the first#5115kev365 wants to merge 7 commits into
Conversation
The utmp parser stopped at the first record it could not parse, abandoning every record after it. Because libc6 utmp/wtmp/btmp records are a fixed size, a single corrupt record can be skipped so the remaining records are still recovered: on a parse error, if a full record could still follow, emit a warning and seek to the next record boundary instead of stopping. Trailing data shorter than a record is still treated as the end of the file. Add a synthetic, self-authored test sample utmp_corrupted: a valid record, two consecutive unsupported-type records, a valid record, and a truncated trailing record -- exercising skip-and-continue recovery (including consecutive corruption) and the truncated-tail path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5115 +/- ##
==========================================
- Coverage 85.07% 85.07% -0.01%
==========================================
Files 455 455
Lines 40426 40464 +38
==========================================
+ Hits 34392 34424 +32
- Misses 6034 6040 +6 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
| _INIT_PROCESS_TYPE = 5 | ||
| _DEAD_PROCESS_TYPE = 8 | ||
|
|
||
| # A libc6 utmp entry is a fixed-size record. |
There was a problem hiding this comment.
what about the 64-bit version of the record ? https://github.com/libyal/dtformats/blob/main/documentation/Utmp%20login%20records%20format.asciidoc
There was a problem hiding this comment.
Thanks Joachim. I've added support for the 64-bit (400-byte) libc6 utmp layout (e.g. aarch64, where ut_session and the ut_tv timeval are widened).
The parser selects the layout by validating the first record with each candidate: the 32-bit and 64-bit layouts share the fields before the session field, so the correct one is identified by a supported type and a valid subsecond field — the microseconds field is defined to be < 1,000,000, and a 64-bit record read with the 32-bit layout has its wider seconds field spill into it and is rejected. The record size is then derived from the selected layout, so the skip-and-recover logic no longer hard-codes 384. Verified against a sample generated on real aarch64 (test_data/utmp_64bit: boot, login and logout records).
The parser assumed the 384-byte 32-bit record. On 64-bit builds without 32-bit time compatibility (e.g. aarch64) the libc6 utmp record is 400 bytes: the session and timeval fields are widened. Reading such a file with the 32-bit layout produced wrong timestamps and misaligned every subsequent record. Add a linux_libc6_utmp_entry_64bit structure and select the layout by validating the first record. The two layouts share the fields before the session field, so the subsecond field distinguishes them: it is defined to be less than 1,000,000, and a 64-bit record read as 32-bit has its wider seconds field spill into it. The record size is derived from the selected layout instead of hard-coded, so the skip-and-recover logic advances by the correct amount for either layout. Add utmp_64bit, a small self-authored sample generated on aarch64 (boot, login and logout records). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the addition, my comment was mostly on the regarding the comment in the code. I put something together to generate test data https://github.com/dfirlabs/utmp-specimens/ Recovering from a corrupt utmp is tricky given 384 or 400 0-byte could be a valid record. An approach to explore is to try a small number of records and abort if the data does not make sense Note that the utmp parser should also not interfere with other parsers in the sense of it claiming a file it is not the appropriate parser for. Also if I'm not mistaken the format is native-endian, which Plaso does not support at the moment. |
|
Maybe scan the the first 8 - 16 records and see:
|
This reverts commit 8199ca6. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On 64-bit platforms without 32-bit time compatibility (e.g. aarch64) the libc6 utmp record is 400 bytes rather than 384: the session and timeval fields are 64-bit. Reading such a file with the 32-bit layout produced wrong timestamps and misaligned every subsequent record. Add a linux_libc6_utmp_entry_64bit structure and select the record layout by reading the first records with each candidate and counting the valid non-empty records; a record read with the wrong record size misaligns and violates the record invariants (ut_type, microseconds range, and a terminal for non-empty records). The layout with the most valid non-empty records is used, and a file is only claimed when at least two are found, so the parser no longer over-claims files that are not utmp files. Add test_data/utmp_aarch64, generated on aarch64 with the dfirlabs/utmp-specimens generator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adopt the utmp type -> login_type rename from main's cleanup in the new 64-bit record support: the data type map, the record validation, and the tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The record-layout detection accepted a file when it found enough valid non-empty records among the first records, but an executable or other non-utmp file can contain records that pass the invariants by chance, causing the parser to claim it and emit records with out-of-range values. Require the first record to be a valid record for the candidate layout as well; a non-utmp file typically does not start with one. This keeps the parser from claiming binaries (regression seen in the ext4_with_binaries end-to-end test). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
utmp from big-endian system Full file https://github.com/libyal/dtformats/blob/main/test_data/utmp-s390 |
The libc6 utmp format is native-endian, so on a big-endian system such as s390x the record integers are big-endian and ut_type is a 16-bit value in the high bytes of the field. Reading such a file with a little-endian layout misreads every integer. Add a linux_libc6_utmp_entry_64bit_bigendian structure and include it in the record-layout detection, which selects it by validating records the same way as the little-endian layouts. Adopt the dtformats utmp specimens (Apache-2.0) as test data: replace the self-generated utmp_aarch64 with the canonical one and add utmp_x86_64 (little-endian 384-byte) and utmp_s390 (big-endian). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks Joachim, for your assist here as well as the speciman set and test data. Reworked this along the lines you suggested -- it validates the first records to pick the layout and decide whether to claim the file. So it now reads 32-bit, 64-bit and big-endian (s390x) utmp, and won't claim non-utmp files. Test data is your dtformats specimens, including utmp-s390 for the big-endian case. |
|
Much appreciated, will try to take review the changes shortly. |
Description
The libc6 utmp parser's per-record loop did
breakon the firstParseError, abandoning every record after a single corrupt one. Because libc6 utmp/wtmp/btmp records are a fixed 384-byte size, a corrupt record can be skipped and the rest recovered: on aParseError, if a full record could still follow, emit a warning and seek to the next 384-byte boundary instead of stopping; trailing data shorter than a record is still treated as end-of-file. The first record is still validated strictly (a non-utmp first record still raisesWrongParser), and the loop cannot spin (the offset strictly increases).Related issue: fixes #5111
Testing
Added
testParseCorruptUtmpFilewith a synthetic, self-authoredtest_data/utmp_corrupted: a valid record, two consecutive unsupported-type records, a valid record, then a truncated trailing record → 2 events + 2 warnings, proving the trailing record is recovered after the skipped ones and the truncated tail is treated as end-of-file. Expected values were derived fromutmpdump. ExistingtestParseUtmpFile/testParseWtmpFileare unchanged; Black + pylint clean.Open question for the reviewer
Is skip-and-continue acceptable, or do you prefer the current conservative
break? The change is isolated to the loop's exception handler and can be dropped if you'd rather keep the existing behaviour.Checklist
test_data/utmp_corruptedis synthetic and self-authored; no external source, so no ACKNOWLEDGEMENTS entry is needed.)