Skip to content

No instruction-coalescing/dedup pass: redundant epsilons and duplicate nav states survive into bytecode #475

@zharinov

Description

@zharinov

Problem

The compiler has no peephole, common-subexpression, or minimization pass. So several kinds of redundant instructions end up in the linked bytecode. This costs bytecode size and per-step dispatch time. There is no correctness impact.

The redundant kinds are:

  • Effectless single-successor epsilons (pure jumps) left between an effectful epsilon and a real instruction. 8 committed emit snapshots show this (e.g. sequences_in_quantifier: 10 -ε- → 11 reached only from 16 -ε- [EndObj Push] → …, 10).
  • Effect-carrying epsilons around a Call not coalesced. Example: [Enum][Obj] Call [EndObj][EndEnum][Set] stays as five epsilons where two would do (the two pre-effects and the three post-effects each stay well under the 7-effect cap).
  • Byte-identical duplicate position-search nav states in every search-nav star/plus. The repeat search's navigate and retry are both Match(Next, wildcard) → try (e.g. compile_quantified.snap steps 16 and 18 are identical).

Cause

No pass in compile/ does structural dedup or epsilon coalescing. The pipeline runs four passes only: eliminate_epsilons, remove_unreachable, collapse_up, lower. None of them coalesce or dedup:

  • compile/epsilon_elim.rs does see-through / forward-migrate / laser-vision / expand-branching only. laser_vision (:225-229) rewrites successors of non-epsilon Matches only. forward_migrate (:142-144) refuses to push into an epsilon successor. expand_branching_epsilons touches effectless branching epsilons only.
  • compile/dce.rs removes unreachable instructions only.
  • compile/collapse_up.rs merges Up chains only.
  • CallIR carries no pre/post effects, so scope brackets must ride separate epsilons (compile/expressions.rs:217-254, compile/scope.rs:272-291).
  • compile/quantifier.rs:437-459 + compile/scope.rs:504-516 emit both position-search states unconditionally.

Fix

Add an instruction-level cleanup, e.g.:

  • An epsilon-coalescing step that fuses eps1[effects1] → eps2[effects2] (single edge) into one epsilon, as long as the combined effect count stays ≤ 7 and there is no Node/Text reordering hazard.
  • A step that redirects predecessors of an effectless single-successor epsilon straight to its successor, then lets DCE drop the epsilon.
  • A hash-cons/peephole dedup that merges Match instructions equal in (nav, node_type, field, pre/post effects, neg_fields, predicate, successors-mod-self-loop), rewriting predecessors to the survivor before DCE. This subsumes the duplicate nav states.

Acceptance

  • The cited snapshots lose their pure-jump epsilons, around-Call epsilon chains, and duplicate nav states.
  • All conformance and emit/compile snapshots stay semantically green (the run_verified fingerprint check and the load-time verifier hold).

Related

Same lowering, adjacent minor non-optimality: compile/lower.rs:57-63 spills all pre_effects to an epsilon chain on overflow, instead of keeping 7 on the match (the way the post_effects drain does at :73-76). That costs one redundant epsilon per overflowing merge-alternation branch.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions