Problem
The structural-check pass (#443) catches what can be inside what. It is blind to order and adjacency: it cannot reject patterns that demand an impossible sequence, a strict adjacency the grammar never produces, or an impossible quantified/anchored shape.
Example: (function_declaration .! (identifier)) — .! demands the identifier be the literal first child, but a function always begins with "async"/"function". Impossible, yet only a sequence-aware analysis sees it. The soft . variant is satisfiable (it skips anonymous tokens), so the checker must never reject it — false rejections are the one failure mode this feature must not have.
Construction
The set of valid visible child sequences is not regular for self-embedding grammars (balanced-paren shapes; 19/143 surveyed grammars — dart, devicetree, swift, sql, tree-sitter's own query language, …). So don't materialize the language. Materialize only whether the hidden structure can carry the query's child automaton between two states — a finite state-pair domain even when the language isn't regular.
A least-fixed-point over two mutually-recursive predicates (Knaster–Tarski; start everything false, iterate to stability):
SAT(p, v) — query pattern p's structure is realizable by a node built from grammar variable v.
THREAD(p, h, q, q') — the visible frontier of hidden variable h can drive automaton A_p from state q to q'.
No search bounds, no give-up paths — termination comes from finite judgment domains, not budgets. If an implementation reaches for a depth limit, it's being built wrong.
Inputs — StructureTable
Distill from SyntaxGrammar before it is discarded. Grammar::from_raw (crates/plotnik-core/src/grammar/types.rs:127-154) builds the flattened, repeat-free Production/ProductionStep ({symbol, alias, field_name}, prepared.rs) to derive node-shapes, then drops it. Retain a distilled table: per variable, productions as step sequences with (effective kind, named-ness, visibility class ∈ {visible, hidden-subtree, hidden-leaf}, field); plus kind→inhabiting-variables (incl. aliased occurrences), supertype→subtype closure, and extra kinds.
Query side — A_p
Per node pattern, a small NFA over its child list with pattern edges (a child subpattern) and gap edges (self-loops labeled with a node-class set). Derive the gap classes from anchor context by reusing compute_nav_modes (crates/plotnik-compiler/src/compile/navigation.rs) so checker and codegen can't drift:
| Context |
Gap admits |
| no anchor |
any node |
. both operands named |
anonymous + extras |
. an anonymous operand |
extras only |
.! |
nothing |
Quantifiers → loops (no collapsing); alternation branches → per-branch entry classes; boundary anchors → entry/exit gap class.
Two implementation cruxes (not pinned by the design note)
NodeTypeId ↔ variable/Symbol bridge. The query side resolves kinds to NodeTypeId (the link.rs world); SAT is keyed by variable deliberately — that's the fix for tree-sitter's alias blind spot (e.g. typescript member_expression inside typeof types is built by the aliased _type_query_member_expression, whose internals differ). The bridge must keep both spaces consistent: kind for edge labels, variable for the recursion.
StructureTable shape + visibility classification — recovering token-ness (hidden-leaf) per ProductionStep.symbol post-flatten, and exposing the table cross-crate from plotnik-core.
Why exact w.r.t. the grammar
Vertical: subtree sets of a CFG depend only on the producing variable, not context → per-level decomposition keyed by variable is lossless. Horizontal: threading is the product of production structure with A_p; hidden recursion of any shape (R → R R, balanced parens, mutual recursion) just adds THREAD facts over the same finite state-pair domain.
How this beats tree-sitter (verified against tmp/tree-sitter/lib/src/query.c)
Tree-sitter solves the same problem in ts_query__analyze_patterns / ts_query__perform_analysis but gives up: stack capped at MAX_ANALYSIS_STATE_DEPTH = 8 (query.c:25), iterations at MAX_ANALYSIS_ITERATION_COUNT = 256 (query.c:26); on exceeding either it sets did_abort and the pattern is assumed valid (query.c:1915-1928). It also skips wildcard-rooted patterns with children (parent_step_indices excludes wildcards) and keys nesting by kind, not producing variable. This construction has none of those blind spots. ERROR handling matches tree-sitter's stance: ERROR query steps are always matchable, patterns rooted at ERROR skip structural validation (cf. query.c:1473-1476).
Approximation ledger (all err toward accepting; rejection stays sound)
precedence / associativity / conflicts, dynamic precedence / GLR, external scanners, reserved-word / lexical shadowing, extras placement, string/regex predicates (opaque), ERROR / MISSING (out of scope) — each line accepts more, never rejects more.
Acceptance
(function_declaration .! (identifier)) rejected; (function_declaration . (identifier)) accepted.
- Self-embedding grammars (devicetree balanced parens) terminate with no depth limit.
- Recursive named definitions (
Nested = (call (Nested))) handled by the query-side LFP.
- Zero false rejections on the valid-query corpus.
- Diagnostic points at the deepest failing pattern node (e.g. "the first child of
function_declaration is async or function, never identifier; the soft anchor . skips anonymous tokens").
Notes
Diagnostics fragments come from #430. A later literal-vs-literal predicate-evaluation pass (precedence-aware filtering) is an optional precision step — only if real queries miss it.
Sizing (design note): ~400 LOC table + ~700 engine + ~300 diagnostics + tests.
Problem
The structural-check pass (#443) catches what can be inside what. It is blind to order and adjacency: it cannot reject patterns that demand an impossible sequence, a strict adjacency the grammar never produces, or an impossible quantified/anchored shape.
Example:
(function_declaration .! (identifier))—.!demands the identifier be the literal first child, but a function always begins with"async"/"function". Impossible, yet only a sequence-aware analysis sees it. The soft.variant is satisfiable (it skips anonymous tokens), so the checker must never reject it — false rejections are the one failure mode this feature must not have.Construction
The set of valid visible child sequences is not regular for self-embedding grammars (balanced-paren shapes; 19/143 surveyed grammars — dart, devicetree, swift, sql, tree-sitter's own query language, …). So don't materialize the language. Materialize only whether the hidden structure can carry the query's child automaton between two states — a finite state-pair domain even when the language isn't regular.
A least-fixed-point over two mutually-recursive predicates (Knaster–Tarski; start everything
false, iterate to stability):SAT(p, v)— query patternp's structure is realizable by a node built from grammar variablev.THREAD(p, h, q, q')— the visible frontier of hidden variablehcan drive automatonA_pfrom stateqtoq'.No search bounds, no give-up paths — termination comes from finite judgment domains, not budgets. If an implementation reaches for a depth limit, it's being built wrong.
Inputs —
StructureTableDistill from
SyntaxGrammarbefore it is discarded.Grammar::from_raw(crates/plotnik-core/src/grammar/types.rs:127-154) builds the flattened, repeat-freeProduction/ProductionStep({symbol, alias, field_name},prepared.rs) to derive node-shapes, then drops it. Retain a distilled table: per variable, productions as step sequences with(effective kind, named-ness, visibility class ∈ {visible, hidden-subtree, hidden-leaf}, field); plus kind→inhabiting-variables (incl. aliased occurrences), supertype→subtype closure, and extra kinds.Query side —
A_pPer node pattern, a small NFA over its child list with pattern edges (a child subpattern) and gap edges (self-loops labeled with a node-class set). Derive the gap classes from anchor context by reusing
compute_nav_modes(crates/plotnik-compiler/src/compile/navigation.rs) so checker and codegen can't drift:.both operands named.an anonymous operand.!Quantifiers → loops (no collapsing); alternation branches → per-branch entry classes; boundary anchors → entry/exit gap class.
Two implementation cruxes (not pinned by the design note)
NodeTypeId↔ variable/Symbolbridge. The query side resolves kinds toNodeTypeId(thelink.rsworld);SATis keyed by variable deliberately — that's the fix for tree-sitter's alias blind spot (e.g. typescriptmember_expressioninsidetypeoftypes is built by the aliased_type_query_member_expression, whose internals differ). The bridge must keep both spaces consistent: kind for edge labels, variable for the recursion.StructureTableshape + visibility classification — recovering token-ness (hidden-leaf) perProductionStep.symbolpost-flatten, and exposing the table cross-crate from plotnik-core.Why exact w.r.t. the grammar
Vertical: subtree sets of a CFG depend only on the producing variable, not context → per-level decomposition keyed by variable is lossless. Horizontal: threading is the product of production structure with
A_p; hidden recursion of any shape (R → R R, balanced parens, mutual recursion) just addsTHREADfacts over the same finite state-pair domain.How this beats tree-sitter (verified against
tmp/tree-sitter/lib/src/query.c)Tree-sitter solves the same problem in
ts_query__analyze_patterns/ts_query__perform_analysisbut gives up: stack capped atMAX_ANALYSIS_STATE_DEPTH = 8(query.c:25), iterations atMAX_ANALYSIS_ITERATION_COUNT = 256(query.c:26); on exceeding either it setsdid_abortand the pattern is assumed valid (query.c:1915-1928). It also skips wildcard-rooted patterns with children (parent_step_indicesexcludes wildcards) and keys nesting by kind, not producing variable. This construction has none of those blind spots. ERROR handling matches tree-sitter's stance: ERROR query steps are always matchable, patterns rooted at ERROR skip structural validation (cf. query.c:1473-1476).Approximation ledger (all err toward accepting; rejection stays sound)
precedence / associativity / conflicts, dynamic precedence / GLR, external scanners, reserved-word / lexical shadowing, extras placement, string/regex predicates (opaque), ERROR / MISSING (out of scope) — each line accepts more, never rejects more.
Acceptance
(function_declaration .! (identifier))rejected;(function_declaration . (identifier))accepted.Nested = (call (Nested))) handled by the query-side LFP.function_declarationisasyncorfunction, neveridentifier; the soft anchor.skips anonymous tokens").Notes
Diagnostics fragments come from #430. A later literal-vs-literal predicate-evaluation pass (precedence-aware filtering) is an optional precision step — only if real queries miss it.
Sizing (design note): ~400 LOC table + ~700 engine + ~300 diagnostics + tests.