64-byte header + 11 data sections. All sections are 64-byte aligned. Offsets are computed from counts.
- Alignment: Sections start on 64-byte boundaries; internal structures align to natural size (2/4/8 bytes)
- Sequential: Fixed order for single-pass writing
- Endianness: Little-endian throughout
- Limits: All indices u16 (max 65,535). Transitions: 512 KB max. Use
Callto share patterns.
| Type | Description |
|---|---|
StepId (u16) |
8-byte step index in Transitions |
StringId (u16) |
String Table index |
TypeId (u16) |
Type Definition index |
NodeTypeId (u16) |
Tree-sitter node type ID |
NodeFieldId (u16) |
Tree-sitter field ID |
RegexId (u16) |
Regex Table index |
Sections appear in fixed order, each starting on a 64-byte boundary:
| # | Section | Record Size | Count Source |
|---|---|---|---|
| 0 | Header | 64 bytes | (fixed) |
| 1 | StringBlob | 1 | str_blob_size |
| 2 | RegexBlob | 1 | regex_blob_size |
| 3 | StringTable | 4 | str_table_count + 1 |
| 4 | RegexTable | 8 | regex_table_count + 1 |
| 5 | NodeTypes | 4 | node_types_count |
| 6 | NodeFields | 4 | node_fields_count |
| 7 | TypeDefs | 4 | type_defs_count |
| 8 | TypeMembers | 4 | type_members_count |
| 9 | TypeNames | 4 | type_names_count |
| 10 | Entrypoints | 8 | entrypoints_count |
| 11 | Transitions | 8 | transitions_count |
StringTable and RegexTable use count + 1 entries. The final entry stores the blob size, enabling O(1) length calculation: length[i] = table[i+1] - table[i].
Section offsets are not stored in the header. Loaders compute them by:
- Start after header (offset 64)
- For each section in order:
- Current offset = previous section end, rounded up to 64-byte boundary
- Section size = count × record size (or explicit size for blobs)
- Blob sizes come from header:
str_blob_sizeandregex_blob_size
#[repr(C, align(64))]
struct Header {
// Bytes 0-23: Identity and sizes (6 × u32)
magic: [u8; 4], // b"PTKQ"
version: u32, // 6
checksum: u32, // CRC32 of everything after header
total_size: u32,
str_blob_size: u32,
regex_blob_size: u32,
// Bytes 24-41: Element counts (9 × u16) — order matches section order
str_table_count: u16,
regex_table_count: u16,
node_types_count: u16,
node_fields_count: u16,
type_defs_count: u16,
type_members_count: u16,
type_names_count: u16,
entrypoints_count: u16,
transitions_count: u16,
// Bytes 42-63: Reserved
_reserved: [u8; 22],
}Module::load rejects malformed input instead of failing open. A loaded module
is guaranteed not to panic on later view/decode access — for any input that
passes these checks, including a deliberately forged module whose CRC was
recomputed over crafted bytes. The CRC is not a MAC, so the structural checks
(steps 5–11), not the checksum, are what uphold the no-panic guarantee.
Validation, in order:
- Magic / version / size —
PTKQ, version 6, andtotal_sizeequal to the byte length. - Reserved bytes — bytes 42–63 must be zero (the checksum does not cover the header, so these are checked explicitly).
- Section bounds — the section layout is recomputed in 64-bit arithmetic; the
Transitions section (and therefore every earlier section) must fit within
total_size. - Checksum — CRC32 of everything after the 64-byte header must equal
checksum. This catches accidental corruption of the body. - Table sentinels — String and Regex offset tables must be non-decreasing and end exactly at their blob length; string slices must be valid UTF-8.
- Regex DFAs — every real regex entry's serialized sparse DFA must deserialize, so the VM's per-evaluation deserialize is a sound invariant.
- TypeDefs — each kind byte must be known, and every Struct/Enum member range
(
data + count) must stay withintype_members_count. - String IDs — every required embedded
StringId(entrypoint, node/field symbol, type, member, and regex pattern names) must address a real string-table entry (1..str_table_count), so theNonZeroU16accessors never panic. - Transitions — the instruction stream is walked twice. Pass 1 decodes each
instruction's fixed-size slot, validating opcode, segment, nav, node kind,
effect opcodes,
Set/Enummember operands, and predicate operands, and rejecting any zero successor; it records each instruction start and must tile the section exactly. Pass 2 requires every jump target (successor, call next/target, trampoline next) to land on a recorded instruction start. This makes every lazydecode_step/ view / materializer access panic-free. - Entrypoints — each
targetmust land on a recorded instruction start (not merely in range — an entrypoint into the interior of a multi-step instruction would start decoding mid-instruction) andresult_typemust address a real TypeDef. - Effect stack — an interprocedural walk of the committed-effect order
(across
Call/Return/Trampoline, under the suppression filter) proves no path can drive the materializer's builder stack (Push/Set/EndArr/EndObj/EndEnum) or the VM's suppression counter into a panic. This closes the last forged-module panic class — the materializer's builder-stack panics and the VM'sSuppressEndunderflow — that decode-level checks cannot see.