Skip to content

Latest commit

 

History

History
74 lines (52 loc) · 2.04 KB

File metadata and controls

74 lines (52 loc) · 2.04 KB

Binary Format: Symbols

Symbol tables map external tree-sitter IDs to internal string names.

1. Regex

Precompiled DFA patterns for predicate matching. Uses the sentinel pattern like StringTable.

RegexBlob

  • Section Offset: Computed (follows StringBlob)
  • Size: header.regex_blob_size

Contains concatenated serialized DFAs (from regex-automata). Each DFA is deserialized once via DFA::from_bytes() when the module loads — that pass both validates the automaton and caches it as an owned DFA, so predicate evaluation reuses the parsed automaton instead of re-deserializing per match (issue #426).

RegexTable

  • Section Offset: Computed (follows StringTable)
  • Record Size: 8 bytes
  • Count: header.regex_table_count + 1

Each entry stores both a StringId (for pattern display) and offset into RegexBlob (for DFA access):

#[repr(C)]
struct RegexEntry {
    string_id: u16,     // StringId of pattern text (for dump/trace)
    reserved: u16,      // Reserved for future use; must be zero, rejected at load
    offset: u32,        // Byte offset into RegexBlob
}

The final entry is a sentinel with string_id = 0 and offset = blob_size.

To retrieve regex i:

  1. pattern_id = table[i].string_id → look up in StringTable for display
  2. start = table[i].offset
  3. end = table[i+1].offset
  4. dfa_bytes = blob[start..end]

2. Node Types

Maps tree-sitter node type IDs to their string names.

  • Section Offset: Computed (follows RegexTable)
  • Record Size: 4 bytes
  • Count: header.node_types_count
#[repr(C)]
struct NodeSymbol {
    id: u16,        // Tree-sitter node type ID
    name: u16,      // StringId
}

This table enables name lookup for debugging and error messages.

3. Node Fields

Maps tree-sitter field IDs to their string names.

  • Section Offset: Computed (follows NodeTypes)
  • Record Size: 4 bytes
  • Count: header.node_fields_count
#[repr(C)]
struct FieldSymbol {
    id: u16,        // Tree-sitter field ID
    name: u16,      // StringId
}