
[hdf5_2_1_1] backport #6395: validate VL datatype during decode + check file pointer in H5T_set_loc #6426

Open

dkgkdfg65 wants to merge 1 commit into HDFGroup:hdf5_2_1_1 from dkgkdfg65:backport/vlen-validate-2_1_1

@dkgkdfg65

hdf5_2_1_1 doesn't have the VL-datatype decode validation that landed on develop (#6395, 2026-05-28, after this branch's tip). On hdf5_2_1_1, H5O__dtype_decode_helper accepts a variable-length datatype whose type nibble is neither SEQUENCE nor STRING. H5T_set_loc's memory branch then falls through, leaving the vlen class unset, so a later dereference hits a NULL pointer (debug builds trip an assert(0) instead). A crafted .h5 file is enough to reach it.
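To make the fall-through concrete, here is a minimal sketch of the pre-fix shape. It is a model, not HDF5 code: `vlen_type_t`, `vlen_class_t`, `mem_seq_class`, `mem_str_class`, and `set_loc_memory_prefix` are hypothetical stand-ins, and the enum values are assumed to mirror H5T_VLEN_SEQUENCE (0) and H5T_VLEN_STRING (1). The default case is modeled as a release-build no-op, since the real assert(0) compiles out under -DNDEBUG.

```c
#include <assert.h>  /* used by the checks in the usage example */
#include <stddef.h>

/* Hypothetical stand-ins, not the real HDF5 types. Values are assumed to
 * mirror H5T_VLEN_SEQUENCE (0) and H5T_VLEN_STRING (1). */
typedef enum { VLEN_SEQUENCE = 0, VLEN_STRING = 1 } vlen_type_t;

/* Stand-in for the vlen "class" the memory branch is meant to install. */
typedef struct {
    const char *name;
} vlen_class_t;

static const vlen_class_t mem_seq_class = {"memory sequence"};
static const vlen_class_t mem_str_class = {"memory string"};

/* Pre-fix shape: the decoded nibble arrives unchecked and only the two
 * known values are handled. An unknown value such as 0x0a falls through
 * and *cls keeps whatever it held before (NULL in the crash path). */
static void set_loc_memory_prefix(unsigned vlen_type, const vlen_class_t **cls)
{
    switch (vlen_type) {
        case VLEN_SEQUENCE:
            *cls = &mem_seq_class;
            break;
        case VLEN_STRING:
            *cls = &mem_str_class;
            break;
        default:
            /* Real code has assert(0) here; modeled as a release-build
             * no-op, because -DNDEBUG removes the assert. */
            break;
    }
}
```

With `cls` initialized to NULL, `set_loc_memory_prefix(0x0a, &cls)` leaves it NULL, and any later `cls->name` read is the kind of member access within a null pointer that UBSan flags.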

Checked it on hdf5_2_1_1 rather than just diffing: I built the decode -> H5T__vlen_set_loc path as a release-like (-DNDEBUG) harness with -fsanitize=address,undefined on ubuntu:22.04 and fed it a vlen type with nibble 0x0a. Pre-fix: a UBSan member-access-within-null-pointer report, then an ASan SEGV on address 0x0. With #6395 cherry-picked, the decode rejects the bad type and the run is clean.
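For anyone reproducing this, the sketch below shows where a "nibble 0x0a" sits in the input, assuming the datatype-message header layout from the HDF5 file format specification: byte 0 holds the class (bits 0-3) and version (bits 4-7), bytes 1-3 hold the 24-bit class bit field, and for the variable-length class (9) the low nibble of that field is the VL type. `dtype_header_t`, `parse_dtype_header`, and `vlen_type_is_valid` are hypothetical helpers; a real harness also needs the size field and the base type that follow.

```c
#include <assert.h>  /* used by the checks in the usage example */
#include <stdint.h>

#define DT_CLASS_VLEN 9u  /* variable-length class value (assumed from the spec) */

typedef struct {
    unsigned version;
    unsigned dt_class;
    uint32_t flags;     /* 24-bit class bit field, little-endian */
    unsigned vlen_type; /* low nibble of flags for VL datatypes */
} dtype_header_t;

/* Split the first four bytes of a datatype message into its fields. */
static dtype_header_t parse_dtype_header(const uint8_t buf[4])
{
    dtype_header_t h;
    h.dt_class  = buf[0] & 0x0fu;
    h.version   = (buf[0] >> 4) & 0x0fu;
    h.flags     = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                  ((uint32_t)buf[3] << 16);
    h.vlen_type = (unsigned)(h.flags & 0x0fu);
    return h;
}

/* Only SEQUENCE (0) and STRING (1) are meaningful VL types. */
static int vlen_type_is_valid(unsigned t)
{
    return t == 0u || t == 1u;
}
```

A crafted header such as `{0x19, 0x0a, 0x00, 0x00}` parses as version 1, class 9, VL type 0x0a, which `vlen_type_is_valid` rejects; `{0x19, 0x01, 0x00, 0x00}` is an ordinary VL string.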

(Yes, .h5 files are a trust boundary for many embedders, so I'm flagging this as a real decode crash rather than a fuzzer artifact; it was originally an OSS-Fuzz find.)

Clean cherry-pick (-x), original author (tbeu) preserved. Two files (H5Odtype.c, H5T.c). Glad to rebase if you'd prefer.

upstream: 3fa6ed6

…_set_loc (HDFGroup#6395)

H5O__dtype_decode_helper() reads vlen.type from the file without
validation. With corrupted HDF5 files (e.g. from fuzzing), this field
can have an invalid value that is neither H5T_VLEN_SEQUENCE nor
H5T_VLEN_STRING, which later triggers assert(0) in H5T__vlen_set_loc()
(debug builds) or a NULL pointer dereference / SEGV in release builds.

Fix by:
1. Adding a validation check in H5O__dtype_decode_helper() immediately
   after reading the vlen.type field, returning an error if the value
   is invalid.
2. Adding a NULL file pointer check in H5T_set_loc() before calling
   H5T__vlen_set_loc() when loc == H5T_LOC_DISK, so the low-level
   assert(file) invariant is never violated.
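The two checks above can be sketched as follows. This is a simplified model, not the patch itself: `decode_vlen_flags`, `set_loc`, `vlen_info_t`, `loc_t`, and the opaque `file_t` are hypothetical, and `SUCCEED`/`FAIL` only mirror HDF5's naming. Real code reports errors through HDF5's error stack rather than a bare return value.

```c
#include <assert.h>  /* used by the checks in the usage example */
#include <stddef.h>

#define SUCCEED 0
#define FAIL    (-1)

typedef enum { VLEN_SEQUENCE = 0, VLEN_STRING = 1 } vlen_type_t;
typedef enum { LOC_MEMORY, LOC_DISK } loc_t;
typedef struct file_t file_t; /* opaque stand-in for a file handle */

typedef struct {
    vlen_type_t type;
} vlen_info_t;

/* Fix 1: reject an unknown VL type at the point it is decoded, so the
 * bad value never reaches the location-setting code. */
static int decode_vlen_flags(unsigned flags, vlen_info_t *out)
{
    unsigned t = flags & 0x0fu;
    if (t != (unsigned)VLEN_SEQUENCE && t != (unsigned)VLEN_STRING)
        return FAIL;
    out->type = (vlen_type_t)t;
    return SUCCEED;
}

/* Fix 2: a disk location needs a file; fail here instead of letting the
 * lower-level routine hit its assert(file). Memory locations legitimately
 * have no file, so they are not checked. */
static int set_loc(loc_t loc, file_t *file)
{
    if (loc == LOC_DISK && file == NULL)
        return FAIL;
    /* ... the lower-level vlen set-location call would go here ... */
    return SUCCEED;
}
```

Checking at decode time means every later consumer can rely on the type being one of the two known values, which is the root-cause argument the commit message makes; the file check is a cheap second guard on a separate invariant.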

This fixes the root cause at the decode level where the bad value
enters the system, as requested in review of HDFGroup#6378 and HDFGroup#6385.

Found by OSS-Fuzz via the matio fuzzer (ClusterFuzz testcase
5366895365914624).

(cherry picked from commit 3fa6ed6)