Problem
The compiler has no peephole, common-subexpression, or minimization pass. So several kinds of redundant instructions end up in the linked bytecode. This costs bytecode size and per-step dispatch time. There is no correctness impact.
The redundant kinds are:
- Effectless single-successor epsilons (pure jumps) left between an effectful epsilon and a real instruction. 8 committed emit snapshots show this (e.g.
sequences_in_quantifier: 10 -ε- → 11 reached only from 16 -ε- [EndObj Push] → …, 10).
- Effect-carrying epsilons around a
Call not coalesced. Example: [Enum][Obj] Call [EndObj][EndEnum][Set] stays as five epsilons where two would do (the two pre-effects and the three post-effects each stay well under the 7-effect cap).
- Byte-identical duplicate position-search nav states in every search-nav star/plus. The repeat search's
navigate and retry are both Match(Next, wildcard) → try (e.g. compile_quantified.snap steps 16 and 18 are identical).
Cause
No pass in compile/ does structural dedup or epsilon coalescing. The pipeline runs four passes only: eliminate_epsilons, remove_unreachable, collapse_up, lower. None of them coalesce or dedup:
compile/epsilon_elim.rs does see-through / forward-migrate / laser-vision / expand-branching only. laser_vision (:225-229) rewrites successors of non-epsilon Matches only. forward_migrate (:142-144) refuses to push into an epsilon successor. expand_branching_epsilons touches effectless branching epsilons only.
compile/dce.rs removes unreachable instructions only.
compile/collapse_up.rs merges Up chains only.
CallIR carries no pre/post effects, so scope brackets must ride separate epsilons (compile/expressions.rs:217-254, compile/scope.rs:272-291).
compile/quantifier.rs:437-459 + compile/scope.rs:504-516 emit both position-search states unconditionally.
Fix
Add an instruction-level cleanup, e.g.:
- An epsilon-coalescing step that fuses
eps1[effects1] → eps2[effects2] (single edge) into one epsilon, as long as the combined effect count stays ≤ 7 and there is no Node/Text reordering hazard.
- A step that redirects predecessors of an effectless single-successor epsilon straight to its successor, then lets DCE drop the epsilon.
- A hash-cons/peephole dedup that merges
Match instructions equal in (nav, node_type, field, pre/post effects, neg_fields, predicate, successors-mod-self-loop), rewriting predecessors to the survivor before DCE. This subsumes the duplicate nav states.
Acceptance
- The cited snapshots lose their pure-jump epsilons, around-
Call epsilon chains, and duplicate nav states.
- All conformance and emit/compile snapshots stay semantically green (the
run_verified fingerprint check and the load-time verifier hold).
Related
Same lowering, adjacent minor non-optimality: compile/lower.rs:57-63 spills all pre_effects to an epsilon chain on overflow, instead of keeping 7 on the match (the way the post_effects drain does at :73-76). That costs one redundant epsilon per overflowing merge-alternation branch.
Problem
The compiler has no peephole, common-subexpression, or minimization pass. So several kinds of redundant instructions end up in the linked bytecode. This costs bytecode size and per-step dispatch time. There is no correctness impact.
The redundant kinds are:
sequences_in_quantifier:10 -ε- → 11reached only from16 -ε- [EndObj Push] → …, 10).Callnot coalesced. Example:[Enum][Obj] Call [EndObj][EndEnum][Set]stays as five epsilons where two would do (the two pre-effects and the three post-effects each stay well under the 7-effect cap).navigateandretryare bothMatch(Next, wildcard) → try(e.g.compile_quantified.snapsteps 16 and 18 are identical).Cause
No pass in
compile/does structural dedup or epsilon coalescing. The pipeline runs four passes only:eliminate_epsilons,remove_unreachable,collapse_up,lower. None of them coalesce or dedup:compile/epsilon_elim.rsdoes see-through / forward-migrate / laser-vision / expand-branching only.laser_vision(:225-229) rewrites successors of non-epsilon Matches only.forward_migrate(:142-144) refuses to push into an epsilon successor.expand_branching_epsilonstouches effectless branching epsilons only.compile/dce.rsremoves unreachable instructions only.compile/collapse_up.rsmergesUpchains only.CallIRcarries no pre/post effects, so scope brackets must ride separate epsilons (compile/expressions.rs:217-254,compile/scope.rs:272-291).compile/quantifier.rs:437-459+compile/scope.rs:504-516emit both position-search states unconditionally.Fix
Add an instruction-level cleanup, e.g.:
eps1[effects1] → eps2[effects2](single edge) into one epsilon, as long as the combined effect count stays ≤ 7 and there is noNode/Textreordering hazard.Matchinstructions equal in(nav, node_type, field, pre/post effects, neg_fields, predicate, successors-mod-self-loop), rewriting predecessors to the survivor before DCE. This subsumes the duplicate nav states.Acceptance
Callepsilon chains, and duplicate nav states.run_verifiedfingerprint check and the load-time verifier hold).Related
Same lowering, adjacent minor non-optimality:
compile/lower.rs:57-63spills allpre_effectsto an epsilon chain on overflow, instead of keeping 7 on the match (the way thepost_effectsdrain does at:73-76). That costs one redundant epsilon per overflowing merge-alternation branch.