
Transformation bugfixes #2385

Draft
ThrudPrimrose wants to merge 36 commits into main from main-bugfixes

Collaborator

@ThrudPrimrose ThrudPrimrose commented May 31, 2026

Bugfixes

While working on the canonicalization pipeline and ensuring that we can automatically parallelize CloudSC, I have encountered many bugs and fixed many of them.


Core SDFG Files

symbols_defined_at does not include enclosing LoopRegion vars

  • Where: dace/sdfg/state.py::ControlFlowBlock.symbols_defined_at.
  • How to reproduce: Query symbols_defined_at on any node inside a LoopRegion body.
  • Pre-fix: The enclosing loop variable (e.g. i) is missing from the returned set.
  • Post-fix: The loop variable is reported as defined.
  • Downstream effect: Memlet propagation's widening fallback consults symbols_defined_at; with i absent it treats per-iteration subsets like arr[*, jk-1, *] as un-parameterised and widens them to whole-array unions, which then blocks LoopToMap.
  • Fix: The walk up the control-flow hierarchy now also folds in the loop variables of enclosing LoopRegions.
  • Tests: tests/sdfg/symbols_defined_at_test.py.
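A minimal sketch of the walk-up idea, assuming a toy block hierarchy rather than DaCe's real classes. `Block`, `loop_variable` and this `symbols_defined_at` function are hypothetical stand-ins for illustration; the actual fix lives in `ControlFlowBlock.symbols_defined_at`.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Block:
    """Hypothetical stand-in for a control-flow block (not DaCe's class)."""
    name: str
    parent: Optional["Block"] = None
    loop_variable: Optional[str] = None  # set only for loop-like regions


def symbols_defined_at(block: Block, global_symbols: Set[str]) -> Set[str]:
    """Return the global symbols plus the loop variables of every enclosing loop."""
    defined = set(global_symbols)
    current = block.parent
    while current is not None:
        if current.loop_variable is not None:
            defined.add(current.loop_variable)
        current = current.parent
    return defined


# A state nested inside two loops: for jk: for i: <state>
sdfg = Block("sdfg")
outer = Block("loop_jk", parent=sdfg, loop_variable="jk")
inner = Block("loop_i", parent=outer, loop_variable="i")
state = Block("body_state", parent=inner)

result = symbols_defined_at(state, {"klev"})
print(sorted(result))  # ['i', 'jk', 'klev']
```

Without the `while` loop the result would contain only `klev`, which is the pre-fix behaviour described above.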

Output-array arg dropped when outgoing memlet from MapExit is source-relative

  • Where: dace/sdfg/state.py::DataflowGraphView.unordered_arglist, the AccessNode/CodeNode -> ExitNode branch.
  • How to reproduce: Build a map whose outgoing edge from MapExit to an outer AccessNode (D) carries a memlet whose data field still names an inner transient (tmp) rather than D. (This can appear because of other_subset copies, for example.)
  • Pre-fix: D is missing from the arglist. The code took oedge.data.data == 'tmp'; tmp is already in descs (the AccessNode tmp is in-scope), so nothing reached additional_descs.
  • Post-fix: D (and its stride/shape symbols) is in the arglist.
  • Fix: Resolve the destination from the graph -- memlet_path(oedge)[-1].dst (equivalently memlet_tree(oedge).root().edge.dst) -- and use that node's .data, with oedge.data.data as fallback when the path does not terminate at an AccessNode.
  • Tests: New focused unit test tests/sdfg/exit_arglist_test.py::test_arglist_resolves_outer_destination_from_source_relative_outgoing_memlet. End-to-end GPU compile of the same shape: tests/codegen/argument_signature_test.py::test_argument_signature_test.
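The resolution rule can be sketched with toy types. This is an illustration under assumptions: `Edge`, `memlet_data` and `written_array` are hypothetical names, and the list `path` stands in for what `memlet_path(oedge)` would return.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class AccessNode:
    data: str


@dataclass
class MapExit:
    label: str


@dataclass
class Edge:
    """Hypothetical edge: destination node plus the memlet's data name."""
    dst: Union[AccessNode, MapExit]
    memlet_data: str


def written_array(path: List[Edge]) -> str:
    """Name the array written through a scope exit.

    Prefer the terminal AccessNode of the path; fall back to the memlet's
    own data name when the path does not end at an AccessNode.
    """
    terminal = path[-1].dst
    if isinstance(terminal, AccessNode):
        return terminal.data
    return path[0].memlet_data


# Source-relative memlet: data still names the inner transient 'tmp'.
resolved = written_array([
    Edge(dst=MapExit("outer_exit"), memlet_data="tmp"),
    Edge(dst=AccessNode("D"), memlet_data="tmp"),
])
fallback = written_array([Edge(dst=MapExit("outer_exit"), memlet_data="tmp")])
print(resolved, fallback)  # D tmp
```

Taking `memlet_data` directly, as the pre-fix code did, would return `tmp` in both cases and `D` would never reach the argument list.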

Veclen lookup crashed on non-AccessNode endpoints

  • Where: dace/sdfg/validation.py::validate_state, the edge dimensionality check + the two View-exception branches.
  • How to reproduce: Construct any edge with other_subset set where one endpoint is a scope node (NestedSDFG / MapEntry / MapExit / ConsumeEntry / ConsumeExit), then call sdfg.validate().
  • Pre-fix: AttributeError: 'NestedSDFG' object has no attribute 'data' raised from sdfg.arrays[src_node.data].veclen. The pre-fix code assumed both endpoints were AccessNodes; scope nodes route data through connectors and have no .data field.
  • Post-fix: Validation reaches a verdict.
  • Fix: Read each side's veclen only when its endpoint is an AccessNode; default to 1 for scope nodes. The same guard is applied to the two View-exception branches downstream.
  • Tests: tests/sdfg/validation/subset_size_test.py::test_veclen_lookup_guarded_on_non_accessnode_endpoint.
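The guard can be illustrated with toy node classes. The classes and the `endpoint_veclen` helper below are hypothetical and only mirror the shape of the fix; they are not DaCe's validation code.

```python
from dataclasses import dataclass


@dataclass
class AccessNode:
    data: str


@dataclass
class NestedSDFG:
    label: str  # note: no .data attribute, data is routed via connectors


@dataclass
class ArrayDesc:
    veclen: int = 1


def endpoint_veclen(node, arrays) -> int:
    """Look up veclen only for AccessNodes; default to 1 for any other node."""
    if isinstance(node, AccessNode):
        return arrays[node.data].veclen
    return 1


arrays = {"A": ArrayDesc(veclen=4)}
src_veclen = endpoint_veclen(NestedSDFG("inner"), arrays)  # would crash without the guard
dst_veclen = endpoint_veclen(AccessNode("A"), arrays)
print(src_veclen, dst_veclen)  # 1 4
```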

Unused transient's shape symbol leaked into signature (#2382)

  • Where: dace/sdfg/sdfg.py::SDFG._used_symbols_internal.
  • How to reproduce: Build two identical SDFGs differing only by one declaring an extra unused transient whose shape uses a symbol (e.g. x_shape). Compare used_symbols(all_symbols=False) and arglist().
  • Pre-fix: The SDFG with the unused transient gets x_shape added to both its used-symbols set and its arglist -- merely declaring an unused array changes the SDFG's signature.
  • Post-fix: Both SDFGs report identical used_symbols and arglist.
  • Fix: Gate the extent-symbol expansion on read_and_write_sets() rather than on every declared array.
  • Tests: tests/sdfg/free_symbols_test.py::test_unused_array_does_not_leak_shape_symbol.
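As a rough sketch of the gating, assume each array maps to the set of symbols in its shape and that `accessed` plays the role of the union of the read and write sets. Both the data layout and the `used_symbols` function here are hypothetical simplifications.

```python
from typing import Dict, Set


def used_symbols(array_shape_symbols: Dict[str, Set[str]], accessed: Set[str]) -> Set[str]:
    """Collect shape symbols only for arrays that are actually read or written."""
    symbols: Set[str] = set()
    for name, shape_symbols in array_shape_symbols.items():
        if name in accessed:
            symbols |= shape_symbols
    return symbols


base = {"A": {"N"}}
with_unused = {"A": {"N"}, "x": {"x_shape"}}  # extra declared-but-unused transient
accessed = {"A"}

print(used_symbols(base, accessed))         # {'N'}
print(used_symbols(with_unused, accessed))  # {'N'}
```

Iterating over every declared array without the `accessed` check would add `x_shape` to the second result, which is the signature change the issue reports.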

Symbolic

sympy_numeric_fix demoted integer-valued floats to int

The previous merge to main fixed the error on the serialization round-trip, but it still appeared on CloudSC when coming from the Python frontend.

  • Where: dace/symbolic.py::sympy_numeric_fix (plus a consolidation of _print_Float / _format_float).
  • How to reproduce: Call sympy_numeric_fix on a finite Python / numpy / sympy float like 1.0; then construct a sympy.Min of that value and a symbol.
  • Pre-fix: 1.0 is returned as Python int(1); Min(zanew, 1) is a type-mismatched int-Min that SymPy re-canonicalises through the integer branch and silently truncates downstream.
  • Post-fix: 1.0 is returned as sympy.Float(1.0); Min(zanew, 1.0) preserves float typing end-to-end.
  • Fix: Preserve any finite Python / numpy / sympy float as sympy.Float. Also fixes inf.0 and 17-digit Float print bugs surfaced by the same path.
  • Tests: tests/symbolic_print_test.py + tests/symbolic_roundtrip_test.py.

_print_floor emitted a literal floor(...) instead of integer division in cpp_mode

I guess users should never use // and should use int_floor instead, but this fix should keep an unaware user from getting incorrect C++ without any warning or error.

  • Where: dace/symbolic.py::DaceSympyPrinter._print_floor.
  • How to reproduce: Form (LEN - 1) // 8 on an integer symbol LEN (sympy normalises this to floor(LEN/8 - 1/8)), then print with DaceSympyPrinter(cpp_mode=True).
  • Pre-fix: The printer emits the literal floor(LEN/8 - 1/8). C++ integer-divides 1 / 8 to 0, so the bound effectively collapses to LEN / 8 and the loop overshoots by one.
  • Post-fix: The printer emits ((LEN - 1) / (8)) -- a single integer division.
  • Fix: Detect the common-denominator fraction-difference via arg.together().as_numer_denom() and emit a single integer-division pair.
  • Tests: tests/symbolic_print_test.py::test_cpp_floor_of_fraction_difference_recombines_to_integer_division -- asserts the cpp output contains no floor( and no 1/8.
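The arithmetic behind the overshoot can be checked by hand. The sketch below emulates C++ integer division in plain Python; `cpp_int_div` is a hypothetical helper, and LEN = 64 is chosen only as an example value.

```python
def cpp_int_div(a: int, b: int) -> int:
    """Emulate C++ integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


LEN = 64

# What the author wrote: (LEN - 1) // 8
intended = cpp_int_div(LEN - 1, 8)

# What reaches C++ pre-fix: LEN/8 - 1/8, each term integer-divided separately
collapsed = cpp_int_div(LEN, 8) - cpp_int_div(1, 8)

print(intended, collapsed)  # 7 8
```

The two results differ by one whenever LEN is a multiple of 8, which matches the one-iteration overshoot described above.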

Transformations

TrivialTaskletElimination misoriented the surviving memlet when source was a MapEntry

  • Where: dace/transformation/dataflow/trivial_tasklet_elimination.py, the expr_index=1 (MapEntry-source) branch.
  • What the pass does: Eliminates a passthrough copy tasklet by folding read --in_edge--> tasklet --out_edge--> write into a single replacement edge from read to write. The replacement memlet must describe the side that owns the connector the edge actually leaves through.
  • How to reproduce: Build a passthrough copy tasklet whose source is a MapEntry reading an offset slice (e.g. a[i + 1]) and whose destination is a transient scalar (a_idx[0]). Apply TrivialTaskletElimination; inspect the surviving edge's .data and .subset.
  • Pre-fix: Surviving edge has data='a_idx', subset='0', other_subset='i + 1' -- the pass reused the write-side memlet and stuffed the read offset into other_subset. The SDFG validates and runs correctly because the dataflow is still consistent across the two subsets.
  • Post-fix: Surviving edge has data='a', subset='i + 1', other_subset='0'.
  • Fix: When expr_index=1 (MapEntry as per the transformation) keep in_edge.data (the read-side memlet) as the surviving memlet, with the write subset moved into other_subset.
  • Tests: tests/transformations/trivial_tasklet_elimination_test.py::test_trivial_tasklet_map_source_preserves_offset_subset -- asserts the surviving edge's .data == 'a' and .subset == 'i + 1'. Pre-fix the test fails with AssertionError: surviving edge must describe the read data 'a', got 'a_idx'.
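The orientation rule can be sketched with a toy memlet record. `Memlet` and `fold_passthrough` below are hypothetical; the non-MapEntry branch simply mirrors the pre-existing behaviour described above (reuse the write-side memlet).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memlet:
    """Hypothetical memlet record: data name plus the two subsets as strings."""
    data: str
    subset: str
    other_subset: Optional[str] = None


def fold_passthrough(in_memlet: Memlet, out_memlet: Memlet, source_is_map_entry: bool) -> Memlet:
    """Build the surviving memlet after removing a passthrough copy tasklet."""
    if source_is_map_entry:
        # Edge leaves the map's OUT_<read> connector: describe the read side.
        return Memlet(in_memlet.data, in_memlet.subset, out_memlet.subset)
    # Otherwise keep the write-side memlet, as the pass did for all cases before.
    return Memlet(out_memlet.data, out_memlet.subset, in_memlet.subset)


read = Memlet("a", "i + 1")
write = Memlet("a_idx", "0")

post_fix = fold_passthrough(read, write, source_is_map_entry=True)
pre_fix_shape = fold_passthrough(read, write, source_is_map_entry=False)
print(post_fix)
print(pre_fix_shape)
```

A later pass that reads only `.subset` sees `i + 1` on `post_fix` but `0` on `pre_fix_shape`, which is how the offset was lost.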

MapFusionVertical InOut split + matching ConditionFusion parent-fixup

  • Where: dace/transformation/dataflow/map_fusion_vertical.py (_split_inout_for_intermediate) + dace/transformation/interstate/condition_fusion.py (fuse_consecutive_conditions).
  • Symptom: Fusing maps that share a NestedSDFG InOut connector for the intermediate name merged the two sides into one connector and clobbered the upstream feedthrough; the matching parent-block bookkeeping in ConditionFusion.fuse_consecutive_conditions had the analogous clobber.
  • Fix: When an intermediate's data name matches an InOut connector of a NestedSDFG in the producer's body, split the connector inside the NestedSDFG (rename the inner read-side accesses to a fresh array bound to a new input connector) so the standard rename machinery rewires the output-only connector without a mismatched-InOut validation error. Apply the same split-aware bookkeeping in ConditionFusion's parent fixup.
  • Tests: tests/transformations/map_fusion_vertical_test.py.

ConditionFusion: use isinstance(ControlFlowBlock) instead of hasattr(node, 'sdfg')

  • Where: dace/transformation/interstate/condition_fusion.py, the post-rewrite parent-fixup walk in fuse_nested_conditions.
  • Symptom (not directly probed; analysed from the code): The walk does if hasattr(node, 'sdfg'): node.sdfg = parent.sdfg over all_nodes_recursive(). hasattr(node, 'sdfg') is True for NestedSDFG nodes too -- but NestedSDFG.sdfg is the inner SDFG. The assignment overwrites the inner-SDFG slot with the outer container, producing a graph cycle that infinite-recurses all_nodes_recursive on the next walk.
  • Fix: isinstance(node, ControlFlowBlock) matches SDFGState / ControlFlowRegion / ConditionalBlock (whose .sdfg legitimately names the containing SDFG) and skips NestedSDFG. Defensive type tightening matching the same pattern already corrected in fuse_consecutive_conditions.
  • Tests: tests/transformations/interstate/condition_fusion_test.py (existing suite; the specific NestedSDFG-inside-condition shape is not directly exercised here -- this is a defensive same-family tightening rather than a regression-catcher fix).
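The difference between the two checks can be shown with toy classes. The class names follow the description, but these definitions are hypothetical stand-ins, and the strings "inner" and "outer" stand in for SDFG objects.

```python
class ControlFlowBlock:
    def __init__(self):
        self.sdfg = None  # the SDFG that contains this block


class SDFGState(ControlFlowBlock):
    pass


class NestedSDFG:
    def __init__(self, inner_sdfg):
        self.sdfg = inner_sdfg  # the inner SDFG, not the containing one


def fixup_with_hasattr(nodes, containing_sdfg):
    """Pre-fix shape: matches every node that happens to have a .sdfg attribute."""
    for node in nodes:
        if hasattr(node, "sdfg"):
            node.sdfg = containing_sdfg


def fixup_with_isinstance(nodes, containing_sdfg):
    """Post-fix shape: only touches control-flow blocks."""
    for node in nodes:
        if isinstance(node, ControlFlowBlock):
            node.sdfg = containing_sdfg


buggy_nested = NestedSDFG("inner")
fixup_with_hasattr([SDFGState(), buggy_nested], "outer")

state, fixed_nested = SDFGState(), NestedSDFG("inner")
fixup_with_isinstance([state, fixed_nested], "outer")

print(buggy_nested.sdfg, fixed_nested.sdfg, state.sdfg)  # outer inner outer
```

With real SDFG objects, the clobbered slot makes the outer SDFG contain itself, which is the cycle that a recursive walk then never leaves.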

RedundantArrayCopyingIn folded chains it should have refused

  • How to reproduce (partial copy): Build a four-node chain A -> B -> C -> D where one of the copies covers only part of the array (e.g. [0:2] of a size-4 array). Run the pass.
  • Pre-fix: The pass applies and the output is corrupted: the redirected writers' subsets are preserved, but the partial copy that originally restricted them is gone, so positions the chain never wrote get overwritten.
  • How to reproduce (extra consumer): Build the chain A -> B -> C -> D plus a second consumer C -> E. Run the pass.
  • Pre-fix: C is removed; E is left isolated; validation fails with InvalidSDFGNodeError: Isolated node E.
  • Post-fix (both cases): The pass refuses (applied == 0) and the SDFG stays correct.
  • Fix: Two guards in can_be_applied: out_degree(med_array) == 1 (no second consumer) AND _is_full_copy on both in -> med and med -> out (each side's subset equals Range.from_array(<that side's desc>)). Either failure refuses the fold.
  • Tests: tests/transformations/redundant_copy_test.py::test_in_failure_partial_copy and ::test_in_failure_extra_consumer -- both fail pre-fix with the symptoms above, pass post-fix.
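A minimal sketch of the two guards, assuming subsets are modelled as half-open (begin, end) pairs per dimension. The helpers `full_range`, `is_full_copy` and `can_fold` are hypothetical and only mirror the logic named in the fix.

```python
from typing import List, Sequence, Tuple

Subset = List[Tuple[int, int]]  # half-open (begin, end) per dimension


def full_range(shape: Sequence[int]) -> Subset:
    """Subset covering the whole array of the given shape."""
    return [(0, extent) for extent in shape]


def is_full_copy(subset: Subset, shape: Sequence[int]) -> bool:
    return subset == full_range(shape)


def can_fold(med_out_degree: int, sides: List[Tuple[Subset, Sequence[int]]]) -> bool:
    """Refuse unless med has exactly one consumer and every copy side is a full copy."""
    if med_out_degree != 1:
        return False
    return all(is_full_copy(subset, shape) for subset, shape in sides)


full = [(0, 4)]
partial = [(0, 2)]  # the [0:2] slice of a size-4 array

full_chain = can_fold(1, [(full, [4]), (full, [4])])
partial_chain = can_fold(1, [(partial, [4]), (full, [4])])
extra_consumer = can_fold(2, [(full, [4]), (full, [4])])
print(full_chain, partial_chain, extra_consumer)  # True False False
```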

AugAssignToWCR did not detect the copy-wrapped read-modify-write shape

  • Symptom: The frontends lower an in-place update like arr[S] = arr[S] + x as a 4-node chain arr[S] -> copy_in -> tasklet -> copy_out -> arr[S] -- the accumulator slice is materialised into a scalar transient before the combining tasklet and copied back after it. The existing AugAssignToWCR matcher only recognised the direct shape and missed the copy-wrapped one, so loop-carried reductions stayed sequential and could not be parallelized via LoopToMap.
  • Fix: A new expr_index=2 matches the chain and rewrites it to a WCR write on arr[S]. Supports +, *, min, max, left-sub.
  • Tests: tests/transformations/wcr_conversion_test.py.
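Why these combines are safe to turn into a WCR write can be checked with a small order-independence test. This is an illustrative sketch with integer inputs (to avoid floating-point rounding effects); the helper names are hypothetical, and the right-operand subtraction is included only as a counterexample, not as a statement about how the pass handles it.

```python
from itertools import permutations


def read_modify_write(initial, increments, combine):
    """Sequential accumulator loop: acc = combine(acc, x) for each x."""
    acc = initial
    for x in increments:
        acc = combine(acc, x)
    return acc


def order_independent(initial, increments, combine) -> bool:
    """True if every ordering of the updates yields the same final value."""
    results = {read_modify_write(initial, order, combine) for order in permutations(increments)}
    return len(results) == 1


increments = [3, -1, 4]
checks = {
    "+": order_independent(10, increments, lambda a, x: a + x),
    "*": order_independent(10, increments, lambda a, x: a * x),
    "min": order_independent(10, increments, lambda a, x: min(a, x)),
    "max": order_independent(10, increments, lambda a, x: max(a, x)),
    "left-sub": order_independent(10, increments, lambda a, x: a - x),
    "right-sub": order_independent(10, increments, lambda a, x: x - a),
}
print(checks)
```

Only `right-sub` (x - acc) depends on the order of updates, so a loop built from the other five combines can run its iterations in any order.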

…ables

``SDFGState.symbols_defined_at`` only folded in global symbols, inter-state-edge
assignments and dataflow-scope (Map/Consume) symbols. It never walked up the
control-flow hierarchy, so the loop variable of an enclosing ``LoopRegion``
(e.g. ``jk``) was reported as undefined for a node inside the loop body. That
false "undefined" tripped ``propagate_memlets_nested_sdfg``'s widening fallback
and replaced a nested-SDFG connector dim indexed by the enclosing loop variable
with the whole array (e.g. ``arr[*, jk-1, *]`` -> ``arr[*, 0:klev, *]``), hiding
the per-iteration uniqueness and blocking ``LoopToMap``.

Walk ``self.parent_graph`` up the CFG, collect each enclosing ``LoopRegion``,
and fold in its loop variable via ``LoopRegion.new_symbols(symbols)`` (outermost
first, before the scope-symbol walk). The widening fallback's gate is now
accurate; ``LoopToMap``'s analysis sees the parametric subset.
…t its data

DataflowGraphView.unordered_arglist resolved the external array written through a
MapExit/ExitNode from ``oedge.data.data``. For a source-relative outgoing Memlet
(whose ``data`` names the inner transient, e.g. ``tmp_out``, not the destination
array ``D``) this dropped the real array -- and its stride/shape symbols -- from
the kernel argument list, so GPU codegen emitted a __global__ signature that used
``D``/``second_stride_D`` without declaring them (``identifier "D" is undefined``).

Resolve the destination from the terminal AccessNode of the memlet path (symmetric
to the input-side handling, which already follows the path), falling back to
``oedge.data.data`` when the path does not end at an AccessNode. Destination-relative
output Memlets are unaffected (the terminal AccessNode is that same array).

Fixes argument_signature_test and the nested_kernel_transient tests (verified
serially; the -n4 set-differences were .dacecache build races, all pass serially).
The outgoing Memlet at a scope exit can be source-relative -- naming the
inner transient rather than the external array being written -- so using its
.data dropped the real destination array (and its stride symbols) from a GPU
kernel's argument list, yielding 'identifier undefined' at compile. Resolve
the written array from the memlet tree's root (the outermost-scope node, the
actual fan-out destination) instead, matching the long-standing NOTE here.
validate_state's dimensionality-mismatch check dereferenced
sdfg.arrays[src_node.data].veclen / sdfg.arrays[dst_node.data].veclen
on every edge with other_subset set, assuming both endpoints were
AccessNodes. Any edge whose src/dst is a NestedSDFG / MapEntry /
MapExit / ConsumeEntry / ConsumeExit raised
"AttributeError: 'NestedSDFG' object has no attribute 'data'" because
scope nodes do not expose .data -- they route data via connectors,
and any per-side packing lives on the inner descriptor.

The crash blocked a 2nd/3rd canonicalize() call on the guarded
imperfect-nest repro (a top-level guard above an outer map with a
nested map and an extra-statement tail): the multi-pass pipeline
eventually produces a deeper NSDFG whose inner NSDFG -> MapExit edge
carries both src_subset and dst_subset, and validation aborts before
the pipeline can reach a fixed point.

Fix: read each side's veclen only when its endpoint is an AccessNode;
default to 1 for scope nodes. The two View-exception branches had the
same latent assumption -- guarded them with the same isinstance check.

Closes the canonicalize idempotence crash: with this guard the repro
reaches a stable fixed point (1x->(1,2), 2x=3x=4x=(2,2)) with no
warnings, and the original "_it_X leaks into outer subsets" path is
unaffected (those forms were the intentional WCR running-union
encoding from PR #1176 -- the iterator IS in scope on the inner-out
edge, so the SDFG is well-formed; the only thing missing was a
validator that survives non-AccessNode endpoints).

CORE_BUGFIXES.md: mark #3 (preserve_minima fabrication) as NOT-A-BUG
(intentional WCR semantics); mark #4 (canonicalize idempotence) as
RESOLVED by this fix; add #9 describing the validation guard with the
sweep verification.

Sweep: tests/transformations + tests/canonicalize + propagation +
tests/passes -- 2074 passed, 25 failed (all pre-existing: 5
offset_loop_and_maps TODO-raise, 1 perf_loop_nesting refusal,
1 branch_elimination test-bug, plus environmental cache/import
errors). One fewer failure than baseline (the canonicalize idem path
now survives validation). Zero net regressions.
…de endpoint

validate_state's dimensionality-mismatch check used to dereference
sdfg.arrays[src_node.data].veclen unconditionally, crashing with
"AttributeError: 'NestedSDFG' object has no attribute 'data'" on any
edge with both src/dst subsets where one endpoint was a scope node
(NestedSDFG / MapEntry / MapExit / ConsumeEntry / ConsumeExit).

The new test builds a NestedSDFG-output -> AccessNode edge with a reshape
memlet (which sets other_subset) and asserts validation reaches a
verdict. Verified: reverting the isinstance(src_node, nd.AccessNode)
guard reproduces the AttributeError; restored fix makes the test pass.
… doubles

A Python/numpy float such as the ``1.0`` clamp in ``min(x, 1.0)`` was collapsed
to int ``1`` in ``sympy_numeric_fix`` (only ``sympy.Float`` was spared). The int
then mixed with a double inside a Min/Max and, after a serialization round-trip
re-canonicalised the argument order, truncated the result -- the CloudSC save/load
divergence. Preserve any finite float as a ``sympy.Float`` so its type survives.

Float printing also forced every value through ``float()`` before formatting, so a
near-max double (Fortran ``HUGE``, just over the C double max) overflowed to ``inf``
and rendered as ``inf.0``; and ``%.15g`` truncated values that need 16-17 digits to
round-trip. Fall back to sympy's own shortest decimal when ``float()`` is non-finite,
and to ``repr`` when 15 significant figures do not round-trip. Consolidate both
printers onto ``_format_float``.

CloudSC in-memory / save(compress)->load / save(plain)->load all bit-exact.
Three related fixes uncovered while triaging the bulk-imported TSVC blocks:

1. ``SplitMapForVectorRemainder`` (P2): switch from sympy's ``//``
   operator to ``dace.symbolic.int_floor`` when computing ``main_end``.
   sympy normalises ``(LEN_1D - 1) // 8`` to ``floor(LEN_1D/8 - 1/8)``
   (rewriting the integer-division as a Rational-fraction subtraction)
   which the C++ codegen prints as ``(LEN_1D / 8) - (1 / 8)`` — in C++
   integer division ``1 / 8`` is 0, so the main bound collapses to
   ``LEN_1D / 8`` instead of ``(LEN_1D - 1) / 8`` and the main tile
   loop overruns the kernel's actual range.  TSVC s2244-shape kernels
   (pre-loop scalar write to ``a[LEN_1D - 1]`` + body ``for i in
   range(LEN_1D - 1)``) silently overwrote the pre-loop write because
   of this.

2. ``DaceSympyPrinter._print_floor`` — defensive backstop in case any
   other code path still produces sympy ``floor(...)`` expressions:
   detect the ``floor(a/b - c/b)`` shape via ``arg.together().as_numer_denom()``
   and emit ``((numerator) / (denominator))`` (correct C++ integer
   division).  Falls through to ``floor(...)`` math-library call only
   for genuinely real-valued floors.  Locks in the rule "never use
   ``//`` in the vector backend" — even if a caller forgets and uses
   ``//``, the printer recovers.

3. ``BranchElimination.can_be_applied`` ([line 804, 914]) — two
   debug-print lines referenced ``write.data`` but ``write`` is the
   dataname string from ``state.read_and_write_sets()`` (no ``.data``
   attribute).  The print crashed with ``AttributeError: 'str' object
   has no attribute 'data'`` and propagated out of the ``can_be_applied``
   check, aborting the whole pipeline.  TSVC s1279-shape kernels
   (nested-if with disjoint write set) hit this.  Fixed to print
   ``write`` directly.

Sweep change after these three fixes (just block1, the canary):
- v2 (only the .data print fix): 20 failed (down from 27).
- v3 (+ int_floor in P2): 19 failed.
- v4 (+ floor printer + clean cache): 19 failed (the 3 remaining s2244
  failures are ``divides_evenly + LEN=64`` for kernels whose effective
  loop range is ``LEN_1D - 1 = 63`` — divides_evenly contract assumes
  range divisible by W and the test parametrise picks LEN=64 too
  aggressively for these kernels; a test-side LEN selector is the
  right follow-up).
DaceSympyPrinter._print_floor must recombine sympy's common-denominator
fraction sum (e.g. ``floor(LEN/8 - 1/8)`` from ``(LEN - 1) // 8``) into a
single ``((LEN - 1) / 8)`` integer division for C++ codegen. Without it
the literal ``floor(LEN/8 - 1/8)`` reaches C++ where ``1/8`` collapses to
``0`` and the floor argument loses the ``-1`` -- TSVC s2244-shape kernels
overshoot the loop bound and clobber the pre-loop scalar write.

Verified: removing the ``_print_floor`` override on DaceSympyPrinter
makes the test fail with ``floor(LEN/8 - 1/8)`` in the output; restoring
it emits ``((LEN - 1) / (8))``.
… a MapEntry

When the eliminated copy tasklet's source is a MapEntry, the surviving edge
leaves the map's ``OUT_<read>`` connector, so its memlet must describe the read
data and its (possibly offset) subset. The pass reused the write memlet for all
cases, leaving ``data`` pointing at the written transient with the read offset
(e.g. ``a[i + 1]``) stranded in ``other_subset`` -- an orientation inconsistent
with the connector. It still validates and runs, but a later re-lowering that
reads ``.subset`` (e.g. MapToForLoop) then drops the offset (``[0]``). This
surfaced as a canonicalize idempotency failure (second pass folded ``a[i + k]``
to ``a[0]``).

Keep the read-side memlet (data + subset) on the surviving edge for the
MapEntry-source case, carrying the write subset in ``other_subset``.

Adds tests/transformations/trivial_tasklet_elimination_test.py reproducer
(asserts the surviving edge describes the read data and keeps the offset) and
un-xfails canonicalize_symbol_lifting's cloudsc_style_range_plus_one (now
idempotent and value-preserving).
… isinstance(ControlFlowBlock)

Two more occurrences of the same NestedSDFG-inner-SDFG-clobber pattern
that 9093e22 fixed in ``ConditionFusion.fuse_consecutive_conditions``.
Both have identical structure -- a post-rewrite recursive walk that
sets ``.sdfg = parent.sdfg`` on every node with a ``.sdfg`` attribute
-- and both unintentionally include NestedSDFG nodes whose ``.sdfg``
attribute is the *inner* SDFG (an ``SDFGReferenceProperty`` with a
setter); the assignment overwrites the inner SDFG with the outer,
creating a graph cycle that infinite-recurses ``all_nodes_recursive``
(TSVC s275 RecursionError).

Surfaced by audit per user instruction "we should not need hasattr".

* ``condition_fusion.py:357`` -- ``fuse_nested_conditions`` (the other
  branch of ``ConditionFusion``; the consecutive branch was fixed in
  9093e22 but this nested branch was missed).
* ``early_exit_to_find_index.py:743`` -- ``_propagate_sdfg``, a
  recursive parent-fixup walk after branch deep-copy. Same clobber
  shape: ``hasattr(n, 'sdfg')`` matches NestedSDFG, writes outer
  ``sdfg`` into its inner-SDFG slot.

Both swap ``hasattr(node, "sdfg")`` for ``isinstance(node,
ControlFlowBlock)`` -- the positive type check that matches
``SDFGState`` / ``ControlFlowRegion`` / ``ConditionalBlock`` (whose
``.sdfg`` is the *containing* SDFG) but not NestedSDFG.

Audit also verified:

* ``copy.deepcopy(loop)`` properly deep-copies iedge assignment dicts
  (different ids, isolated mutation) -- no shallow-clone issue in
  LoopFission or other deepcopy-based passes.
* My IVS extension (d49e69b) uses ``dace.symbolic`` wrappers
  (``pystr_to_symbolic``, ``simplify``, ``symstr``) rather than raw
  sympy.
* My session-attributable changes touch only ``dace/transformation/``
  paths -- no modifications to ``dace/sdfg/``, ``dace/codegen/``, or
  ``dace/frontend/`` (core IR/codegen/parser).

Verification:

* ``tests/transformations/interstate/condition_fusion_test.py``: 10/10 pass
* ``tests/canonicalize/canonicalize_early_exit_to_find_index_test.py``: all pass
* ``tests/transformations/loop_fission_test.py``: 27/27 pass
* ``tests/passes/induction_variable_substitution_test.py``: 12/12 pass
* Aggregate: 70 passed, 0 failures
…t fixup

MapFusionVertical: split InOut connector instead of producing invalid SDFG
========================================================================
When the fusion intermediate's data name matches an InOut connector of a
NestedSDFG inside the producer map's body, the standard rename
``inter_name -> __map_fusion_<inter_name>`` would rewire the
NestedSDFG's OUTPUT-side memlet to the new transient but leave the
INPUT-side memlet still referencing the original outer container --
producing a validation error::

    Inout connector X is connected to different input ({'X'}) and
    output ({'__map_fusion_X'}) arrays

This was caught by canonicalize on TSVC s221::

    for i in range(1, N):
        a[i] = a[i] + c[i] * d[i]   # NestedSDFG InOut on 'a'
        b[i] = b[i-1] + a[i] + d[i] # consumer

After fission, fis0 = parallel map writing 'a', fis1 = Scan-based
consumer reading 'a'. The fuse stage's MapFusionVertical encounters
the InOut-on-'a' shape and tries to rename.

Fix: ``_split_inout_for_intermediate`` runs before
``_handle_intermediate_set`` in ``apply``. For each NestedSDFG inside
the producer's body whose InOut connectors match an intermediate's
data name:

1. Allocate a fresh inner array ``__map_fusion_split_<name>`` (same
   shape / dtype as the original ``<name>`` inside the NestedSDFG).
2. Rename every inner read-side AccessNode of ``<name>`` (``in_degree
   == 0``) to the fresh name; rename the memlet ``data`` on its
   outgoing edges.
3. Drop the InOut input connector ``<name>`` from the outer
   NestedSDFG node and add the fresh
   ``__map_fusion_split_<name>`` input connector with the same dtype.
4. Redirect the outer input edge feeding the old ``<name>`` input
   connector to the new connector.

After the split the NestedSDFG's ``<name>`` connector is OUTPUT-ONLY
and the standard rename machinery rewires it to
``__map_fusion_<name>`` cleanly. The semantically equivalent shape is
preserved (the in-place RMW becomes an out-of-place compute into the
intermediate, which is then copied back to outer ``<name>`` via the
shared-mode write-back).

``can_be_applied`` gates the split with ``_inout_split_is_safe``: v1
only handles the clean shape where every inner AccessNode of ``<name>``
is either a pure read source (``in_degree == 0``) or a pure write sink
(``in_degree > 0``). Mixed-mode accesses (``a -> ... -> a`` chains in
one state) would require use-def analysis and are refused for now.

Test: ``test_map_fusion_inout_connector_intermediate_rename_consistency``
in ``map_fusion_vertical_test.py`` pins the contract. Reproducer
minimised from s221: a 2-map SDFG with an InOut-on-'a' NestedSDFG
producer. Test asserts the fusion APPLIES, the SDFG validates, the
NestedSDFG's InOut overlap is empty post-fix, and the numerical
outputs bit-exact-match the pre-fuse oracle.

MapFusion suite: 56 passed + 1 skipped + 2 xfailed. Zero regressions.

ConditionFusion: gate parent-fixup loop with isinstance(ControlFlowBlock)
=========================================================================
The post-fusion parent-fixup loop was::

    for node, parent in sdfg.all_nodes_recursive():
        if hasattr(node, "sdfg"):
            node.sdfg = parent.sdfg

For a ``NestedSDFG`` node, ``.sdfg`` is the *inner* SDFG (an
``SDFGReferenceProperty`` with a setter), so the assignment
overwrote the inner-SDFG slot with the outer containing SDFG --
producing a graph cycle that subsequently sent
``all_nodes_recursive`` into infinite recursion (TSVC s275:
``for i: if guard: for j: a[j,i] = a[j-1,i] + b[j,i] * c[j,i]``).

Fix: replace ``hasattr(node, "sdfg")`` with the positive
``isinstance(node, ControlFlowBlock)`` check -- the loop's actual
intent is to repair ``.sdfg`` on the OUTER container blocks
(``SDFGState`` / ``ControlFlowRegion`` / ``ConditionalBlock``) whose
``.sdfg`` attribute names the containing SDFG. NestedSDFG nodes are
handled by the preceding ``set_nested_sdfg_parent_references(sdfg)``
call.

ConditionFusion suite: 10/10 passed.

Corpus impact (TSVC, 151 kernels)
=================================
All 4 previously-failing kernels now canonicalize cleanly:

    s221, s2233, s233, s275: canonicalize OK

Aggregate (canonicalize column):

    before this commit:  loops=59  maps=118  reduces=7  scans=9
    after this commit:   loops=59  maps=132  reduces=7  scans=15

+14 maps and +6 scans from the 4 newly-canonicalizing kernels.
…or RedundantArrayCopyingIn

Adds a partial-copy and extra-consumer refusal in RedundantArrayCopyingIn:
only fold when the in-copy is a full identity (subset matches the source
array) and the destination has a single outgoing edge. Without these
guards, the pass incorrectly removes the source AccessNode when the
destination is read multiple times or covers only part of the source --
producing wrong codegen that races between the kept consumer and a
reader of the dropped node.

Extracted from yakup/dev `470294025`; the original commit also added
the unrelated single-shot ``parallelize`` pipeline, which is omitted.
The array frontends lower an in-place accumulator update `arr[S] += x` into
a copy-wrapped chain -- `arr[S]` is materialized into a scalar transient,
combined in a tasklet, and copied back -- which the redundant-array passes
cannot fold (arr is both read and written in the same state). So
AugAssignToWCR never recognised the reduction and the accumulator loops
stayed sequential, blocking LoopToMap.

Add a third match expression (`arr[S] -> copy_in -> tasklet -> copy_out ->
arr[S]`) that rewrites the chain to a WCR write: the tasklet emits only the
increment and writes it straight into `arr[S]` with the reduction WCR; the
load and scalar copy-out transient are dropped. Accepts the
order-independent combines (+, *, min, max, and left-operand subtraction).

On cloudsc, once TrivialTaskletElimination exposes the spine, this lifts
122 accumulators to WCR writes and lets LoopToMap parallelize the species
reductions (loops-left 32 -> 17), bit-exact vs the untransformed kernel in
sequential IEEE.
…2382)

SDFG._used_symbols_internal expanded the stride/shape/offset symbols of EVERY
declared array into the free-symbol set. An array that is merely present but
unused (never read/written/allocated) then leaked its shape symbol (e.g.
``x_shape``) into free_symbols/arglist/signature/init_signature -- the
over-pessimism reported in issue #2382.

Restrict that expansion to arrays actually referenced inside a control-flow code
block (a ConditionalBlock guard or LoopRegion condition/range), via the new
_arrays_used_in_code_blocks (parse the code block to symbols, intersect with
array names). Memlet- and access-node-referenced arrays already contribute their
extent symbols through the contents analysis, and interstate-edge references are
expanded by the parent ControlFlowRegion -- only code-block references needed
this extra step, so a genuinely-used array keeps its symbolic extent.

Adds tests/sdfg/free_symbols_test.py regressions: the issue reproducer (unused
array does not perturb the signature) plus two over-aggression guards (a
code-block-only array keeps its stride symbol; a map-memlet-only array keeps its
extent).
…sets

Replace the bespoke _arrays_used_in_code_blocks helper with the existing
read_and_write_sets analysis, which already reports exactly the arrays that
are used (read or written, including those referenced only by a code-block
guard) -- so a merely-declared array still does not leak its shape symbol
(issue #2382), while every used array keeps its extent symbols.
…ructure

Refactor of the post-fix can_be_applied: cache descriptor lookups, drop the
dead len(edges_between) == 1 branch (already implied by the degree gates),
fold the rank+shape check into one _shapes_match helper, and clarify
the storage check (in/out must match; med may legitimately live on a
different device -- that's the CPU->GPU->CPU staging chain this pass is
designed to short-circuit). Behaviour-preserving: full redundant_copy_test
suite stays at 19 passed.
The previous form ``f'{f:.15g}'`` truncated to 15 significant digits,
which is insufficient for many fp64 values (e.g. ``1/21 == 0.047619047619047616``).
Even with the ``float(s) != f`` fallback, the formatted output and the
shortest-repr output could disagree on a save -> load -> save cycle when
sympy's parse handed back a different sympy.Float at higher precision,
breaking the SDFG serialization round-trip equality check
(``tests/library/fft_test.py::test_ifft[backward]``).

``repr`` for finite Python floats is guaranteed to produce the shortest
decimal that parses back to the same float (at most 17 significant
digits for fp64), and is idempotent under save -> load -> save. Drop
the 15g + fallback in favour of repr unconditionally; keep the cosmetic
``+= '.0'`` for integer-valued floats and the trailing-zero strip for
short non-integer decimals so ``5.0`` / ``3.14`` / ``0.1`` are unchanged.
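
The difference between the two formats can be checked with plain Python floats. This is only an illustration: ``format_float`` below is a hypothetical one-line stand-in, not the real ``_format_float`` (which also applies the cosmetic rules described above).

```python
def format_float(f: float) -> str:
    """Hypothetical sketch: shortest text that parses back to the same float."""
    return repr(f)


f = 1 / 21
lossy = f"{f:.15g}"      # 15 significant digits
exact = format_float(f)  # shortest round-trip form

lossy_roundtrips = float(lossy) == f  # 15 digits do not recover f
exact_roundtrips = float(exact) == f  # repr always recovers f
idempotent = format_float(float(exact)) == exact  # str -> float -> str is stable
```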
…ening + termination cap

Three correctness fixes to SymbolPropagation, surfaced by adversarial tests:

1. Same-edge race (#5): _update_syms substituted a propagated value into an
   outgoing edge's assignment RHS without checking the edge's own assignment
   keys. Interstate-edge assignments are simultaneous, so substituting e.g.
   `anext -> a + b` into `{b: a, a: anext}` produced `{b: a, a: a + b}` -- `a`
   both read and written on one edge, which validation rejects. Now each edge
   filters out substitutions whose value reads a symbol assigned on that edge.

2. Cross-CFG assert crash (#1): _get_in_syms `assert new_in_syms == {}` crashed
   on start/branch regions that already carried edge-accumulated symbols.
   Replaced with a conservative combine (disagreements -> None).

3. Fixpoint non-termination: the inner _update_syms `while changed` loop
   oscillated forever on CYCLIC symbol value dependencies (swaps). Added an
   iteration cap (#symbols + 2) guaranteeing termination, leaving cyclic
   symbols un-substituted (conservative + correct).
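
The per-edge filter from item 1 could look roughly like the following. This is a sketch over plain strings, assuming symbols can be found with a simple identifier regex; ``free_names`` and ``substitutions_for_edge`` are hypothetical names, not functions of the pass.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr: str) -> set:
    """Identifiers appearing in a plain expression string."""
    return set(_IDENT.findall(expr))


def substitutions_for_edge(propagated: dict, assignments: dict) -> dict:
    """Keep only substitutions whose value reads no symbol assigned on this edge."""
    assigned = set(assignments)
    return {sym: val for sym, val in propagated.items()
            if not (free_names(val) & assigned)}


propagated = {"anext": "a + b", "n1": "N + 1"}
edge = {"b": "a", "a": "anext"}  # simultaneous assignments on one edge
allowed = substitutions_for_edge(propagated, edge)
```

Here ``anext -> a + b`` is filtered out because the edge assigns both ``a`` and ``b``, while ``n1 -> N + 1`` remains usable.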

Tests (tests/passes/symbol_propagation_hard_test.py): 21 original (incl. the
co-evolving-pair regression, input fixed to a valid simultaneous edge) + 14
adversarial (10 pass). 4 cyclic-swap tests are strict-xfail pinning a remaining
deeper bug: the pass over-substitutes a reassigned symbol's value into use-sites
on value cycles. Existing tests/passes/symbol_propagation_test.py stays green.
…er-dependent + conditional symbols)

Stress SymbolPropagation with the patterns that break big real-world SDFGs:
chained inter-dependent index symbols feeding array accesses; branch-divergent
index symbols feeding indirection (used inside branches and after the join);
nested conditionals; interstate-edge conditions that read a propagated symbol;
loop-carried index symbols (range + dace.map); double indirection / gather
(symbol read from an array then used as an index); sibling-scope reuse;
mutually inter-dependent loop-carried pairs. Python frontend + SDFG-API.

20 pass. 1 strict-xfail pins a GENUINE pass bug: on a co-evolving pair where
edge `mid->upd` carries `{b: a, a: anext}` and `body->mid` carries
`anext = a + b`, SymbolPropagation substitutes anext forward into the upd edge,
yielding `{b: a, a: a + b}` -- `a` is now both read and written on the same
interstate edge, which validation rejects as a race. The pass must not
substitute a symbol into an edge when that makes a variable both read and
assigned there.
…— fixes cyclic-swap over-substitution

Root cause of the cyclic over-substitution: _get_in_syms stored RAW assignment
RHS strings (tx:'y', x:'tx') without resolving them against the incoming table,
building symbol->symbol chains that form cycles (x->tx->y->ty->x) the final
replace_dict cannot resolve (swap produced no swap; m=t with t=m+2 double-counted).

Fix: resolve each edge's RHSes against the PRE-edge table (new _resolve helper),
i.e. simultaneous-assignment semantics — a swap reads the OLD values, collapsing
chains to constants/expressions up front. Crucially, a resolved value that
references ANY symbol assigned on the SAME edge stays LIVE (None): those keys are
rebound simultaneously, so the value (read with old values) must not be
propagated into a downstream use that sees the new values (else B[m] with m=m+2
becomes B[m+4]). _resolve leaves array-access values (tbl[i]) untouched so the
existing nested-array filter still drops them (parsing would mangle tbl[i] into
tbl(i) and emit invalid code).
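
A rough illustration of the simultaneous-assignment resolve, using strings instead of sympy expressions. ``resolve_edge`` is a hypothetical simplification of the ``_resolve`` idea: it ignores array-access values and assumes table entries are already fully resolved.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr):
    return set(_IDENT.findall(expr))


def resolve_edge(table, assignments):
    """Resolve every RHS against the PRE-edge table, then merge.

    A value of None means "live": the symbol must not be propagated.
    """
    known = {s: v for s, v in table.items() if v is not None}

    def substitute(expr):
        return _IDENT.sub(
            lambda m: f"({known[m.group(0)]})" if m.group(0) in known else m.group(0),
            expr)

    assigned = set(assignments)
    new_table = dict(table)
    for lhs, rhs in assignments.items():
        value = substitute(rhs)  # reads OLD values only
        # A value that still reads a symbol rebound on this edge stays live.
        new_table[lhs] = None if free_names(value) & assigned else value
    return new_table


swap = resolve_edge({"x": "1", "y": "2"}, {"x": "y", "y": "x"})
bump = resolve_edge({}, {"m": "m + 2"})
```

In this sketch the swap collapses to constants (``x`` becomes ``(2)``, ``y`` becomes ``(1)``), and ``m = m + 2`` with an unknown ``m`` is left live.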

All 4 previously-xfailed cyclic-swap tests now PASS; xfail decorators removed.
Full symbol-prop suite 42/42 (35 hard + 7 existing), no regression.
…re propagating

Investigation conclusion: SymbolPropagation is NOT buggy on CloudSC. It is a
structural no-op there (returns None, zero tasklet/memlet/interstate/symbol-
mapping diffs) and produces bit-identical output under sequential schedules;
the ~1e-5 deviation the earlier xfail saw came from floating-point nondeterminism in
CloudSC's parallel-map OpenMP reduction, not from symbol propagation.

The deeper reason symprop does nothing on CloudSC: its propagatable symbols
(``kfdia_plus_1_N = kfdia + 1`` -- all 124 of them) reference the horizontal-
bound scalar ARGUMENTS ``kidia`` / ``kfdia``, which are ``dt.Scalar``. The
scalar-skip filter correctly refuses to propagate values referencing a scalar
(a runtime pointer) -- and the default ``ScalarToSymbolPromotion``
(``transients_only=True``) does not promote argument scalars, so they stay
``dt.Scalar``. Promoting them first with ``transients_only=False`` turns
``kfdia`` into a symbol, after which symprop folds ``kfdia_plus_1 -> (kfdia +
1)``.

New unit test ``test_cloudsc_kidia_kfdia_promote_then_propagate`` (a CloudSC
subset: ``range(kidia, kfdia + 1)`` over several level nests) pins this:
without promotion symprop is a no-op; after ``ScalarToSymbolPromotion(
transients_only=False)`` it folds the symbols; value-preserving throughout. The
scalar-skip filter is unchanged (``test_scalars`` still guards it). Reframes
CORE_BUGFIXES L-H from "symprop bug" to "not a bug" + the promotion-order note.
…ter substitution

After ``_update_syms`` substitutes a propagated value at every use site, the
*defining* iedge assignment was left in place: a shorthand like
``k_plus_1 = klev + 1`` would still appear on the iedge even though every
downstream consumer had been rewritten to use ``klev + 1`` directly. The
cloudsc parallelize chain test caught this -- 346 ``klev+1`` / ``kfdia+1`` /
``kidia+1`` assignments survived ``symbol_propagation``.

Sweep these dead assignments at the end of ``apply_pass`` to a fixed point. An
assignment ``X = expr`` is dead if ``X`` does not appear as a free symbol in:

* any block (which transitively covers NestedSDFG ``symbol_mapping`` uses),
* any other iedge's assignment RHS or branch condition,
* any array descriptor's shape / strides / total_size / offset.

The fixed-point loop unravels chained shorthands (``a = klev + 1; b = a;
c = b`` all gone in one pass when nothing references the tail).
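
The fixed-point sweep can be sketched on a flat dictionary of assignments. This is an assumption-laden model: ``other_uses`` stands in for every non-iedge use (blocks, conditions, descriptors), and ``sweep_dead`` is a hypothetical name.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr):
    return set(_IDENT.findall(expr))


def sweep_dead(assignments, other_uses):
    """Remove assignments whose LHS is read nowhere, until nothing changes."""
    live = dict(assignments)
    changed = True
    while changed:
        changed = False
        for lhs in list(live):
            used = set(other_uses)
            for other, rhs in live.items():
                if other != lhs:
                    used |= free_names(rhs)
            if lhs not in used:
                del live[lhs]  # dead: nothing reads it any more
                changed = True
    return live


chain = {"a": "klev + 1", "b": "a", "c": "b"}
all_dead = sweep_dead(chain, other_uses=set())
head_kept = sweep_dead(chain, other_uses={"a"})
```

With no outside uses the whole chain unravels from the tail; if ``a`` is still referenced elsewhere, only ``b`` and ``c`` are removed.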

Tests:

* ``test_dead_iedge_assignment_eliminated_after_substitution`` -- minimal
  ``k_plus_1`` repro of the cloudsc shape; substitution + elimination both
  verified.
* ``test_dead_iedge_chain_unravels_to_fixed_point`` -- ``a = klev+1; b = a;
  c = b`` chain; all three links eliminated.
* ``test_dead_iedge_preserved_when_lhs_still_used`` -- safety counter-test:
  if the LHS is referenced (here by an array shape), the assignment stays.

All 12 tests pass (9 pre-existing + 3 new).
…iptors before dead-iedge sweep

Follow-up to 968763e. The first pass of the dead-iedge cleanup eliminated
``k_plus_1 = klev + 1`` only when no block referenced ``k_plus_1`` -- but
``SDFGState.free_symbols`` (state.py:709) pulls the array-shape symbols of an
access-node's data descriptor *into* the block's free-symbol set. So as long as
any state in the SDFG accessed an array sized ``[k_plus_1, ...]``, the cleanup
still saw ``k_plus_1`` as live and the iedge survived.

Cloudsc has 321 such bindings (after the first cleanup reduced it from 346):
every ``kfdia_plus_1_X = kfdia + 1`` is preserved because the per-state writes
go to arrays whose shape lists ``kfdia_plus_1_X``.

Restructured ``_eliminate_round`` as four steps:

1. Gather propagatable bindings -- a symbol is safe to fold if every iedge
   binding it agrees on the same RHS (no per-edge disagreement) and the RHS
   does not self-reference (a self-reference like ``i = i + 1`` marks a
   loop-carried iterator, not foldable).
2. ``sd.replace_dict(safe_subs, replace_keys=False, replace_in_graph=False)``
   substitutes the symbol into the SDFG's array descriptors (shapes /
   strides / offsets). Now ``block.free_symbols`` no longer references the
   substituted symbol.
3. Sweep dead iedges -- the standard ``lhs not in used_in_ir`` check now
   correctly identifies the bindings as dead.
4. Drop the orphaned ``sd.symbols`` declarations so nested-SDFG validation
   doesn't demand the symbol from the outer scope ("missing symbol on
   nested SDFG").
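
Step 1 above (which bindings are safe to fold) might be modelled as follows. This sketch compares right-hand sides as strings, which is an assumption; the real pass presumably compares symbolic expressions. ``foldable_bindings`` is a hypothetical name.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr):
    return set(_IDENT.findall(expr))


def foldable_bindings(edges):
    """Return {symbol: rhs} for symbols every edge binds to the same,
    non-self-referencing RHS."""
    seen = {}
    rejected = set()
    for assignments in edges:
        for lhs, rhs in assignments.items():
            if lhs in free_names(rhs):
                rejected.add(lhs)  # loop-carried iterator such as i = i + 1
            elif lhs in seen and seen[lhs] != rhs:
                rejected.add(lhs)  # per-edge disagreement
            else:
                seen.setdefault(lhs, rhs)
    return {s: v for s, v in seen.items() if s not in rejected}


edges = [
    {"k_plus_1": "klev + 1"},
    {"k_plus_1": "klev + 1", "i": "i + 1"},
    {"t": "a"},
    {"t": "b"},
]
safe_subs = foldable_bindings(edges)
```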

Test updates:

* ``test_dead_iedge_with_array_shape_substituted_into_descriptor`` -- the
  cloudsc-pattern minimal repro: an array sized by the shorthand symbol; the
  pass now substitutes ``klev + 1`` into the shape AND eliminates the binding.
* ``test_deeply_nested_sdfg`` assertions updated -- the previous test
  documented the OLD limitation (the pass never reached into NSDFG symbol
  mappings); the new behaviour correctly propagates ``v -> a`` through the
  NSDFG chain and sweeps the dead ``v = a`` + ``c = v+1`` bindings + their
  orphaned declarations.

All 12 SymbolPropagation tests pass. The broader 1048-test passes sweep is clean
(the lone ``s113_d_single`` vectorization failure is pre-existing and
reproduces with this fix reverted).
…ymbol the current edge reassigns

A loop-carried symbol such as ``k = j + 1`` carried in from the
predecessor is STALE for the downstream block when the current edge
reassigns ``j`` (e.g. ``j = k + 1``): the carried value was computed
from the pre-edge ``j``, but the block past this edge sees the
post-edge ``j``. Propagating the carried value would read the wrong
value (e.g. ``c[k]`` becomes ``c[j + 1]`` against the reassigned ``j``,
an off-by-two). Invalidate such carried entries to live before merging
the resolved edge assignments into the table.
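
The invalidation rule can be illustrated on a string-valued symbol table. ``invalidate_stale`` is a hypothetical helper; ``None`` marks a live (non-propagatable) symbol, mirroring the convention described above.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr):
    return set(_IDENT.findall(expr))


def invalidate_stale(table, assigned_on_edge):
    """Mark carried values live if they read a symbol this edge reassigns."""
    assigned = set(assigned_on_edge)
    return {
        sym: None if val is not None and free_names(val) & assigned else val
        for sym, val in table.items()
    }


carried = {"k": "j + 1", "n1": "N - 1"}
after_edge = invalidate_stale(carried, {"j"})  # the edge carries j = k + 1
```

``k`` becomes live because its carried value reads the pre-edge ``j``; ``n1`` is untouched because ``N`` is not reassigned.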

Un-xfails ``test_interdependent_pair_loop_api`` in the hard-symprop
suite (the fix in 86f28c1 -- simultaneous-RHS resolve -- already
closed that race; the xfail marker was stale).

Adds an idempotence regression-catcher for ``_format_float`` -- the
serializer must give the same string under ``f -> str -> f -> str`` so
SDFG save -> load -> save survives the framework's round-trip check
(``tests/library/fft_test.py::test_ifft[backward]``).

Extracted from yakup/dev ``aaed2278a`` (symbol-propagation slice only;
the canonicalize-pipeline / LICM / SplitTasklets / MoveIfIntoLoop
slices are not part of this PR).
In AugAssignToWCR.apply, the View-edge probe used e.src.desc(sdfg)
where sdfg came from the outer apply() parameter. When the
transformation is applied inside a NestedSDFG, that outer sdfg is not the
one e.src lives in -- desc() then looks the array up in the wrong
descriptor repository and either raises KeyError or returns the wrong
descriptor. Resolve via state.sdfg (the SDFG that owns the state we
are mutating) so the View probe always queries the right repository.

Extracted from yakup/dev 101d861 (wcr_conversion slice only; the
canonicalize/pipeline.py slice is not part of this PR).
A loop-invariant-guarded loop body becomes a NestedSDFG; memlet
propagation then (correctly) widens that NestedSDFG's external write
connector to the whole-array union over the loop (b[i] -> b[0:N]).
LoopToMap's write-pattern check only inspected that external memlet, so
it could no longer prove each iteration writes a distinct location and
refused ('Write pattern check failed for b - dst_subset=0:N'). The
per-iteration write b[i] is not expressible on the external connector --
it lives structurally inside the NestedSDFG -- so no amount of (correct)
propagation can recover it; the check must look inside.

Add _nested_writes_iter_indexed: when the external check fails and the
writer is a NestedSDFG, walk its inner writes to the connector's array,
rewrite their subsets through the node's symbol_mapping into the outer
iteration symbol, and apply the same a*i+b independence check (recursing
through nested NestedSDFGs, composing symbol maps). Conservative: needs
>=1 inner write and all must pass; any WCR / missing subset -> false. It
only ever grants provable parallelization; no existing check is weakened.
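
A heavily simplified sketch of the idea, assuming one-dimensional integer-affine indices given as strings and ignoring the step, WCR and missing-subset cases. ``affine_coeffs`` and ``nested_writes_iter_indexed`` below are illustrative stand-ins, not the code added to LoopToMap.

```python
import ast
import re


def affine_coeffs(expr, itersym):
    """Return (a, b) with expr == a*itersym + b, or None if not of that form."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return (0, node.value)
        if isinstance(node, ast.Name):
            return (1, 0) if node.id == itersym else None
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            r = walk(node.operand)
            return None if r is None else (-r[0], -r[1])
        if isinstance(node, ast.BinOp):
            l, r = walk(node.left), walk(node.right)
            if l is None or r is None:
                return None
            if isinstance(node.op, ast.Add):
                return (l[0] + r[0], l[1] + r[1])
            if isinstance(node.op, ast.Sub):
                return (l[0] - r[0], l[1] - r[1])
            if isinstance(node.op, ast.Mult) and l[0] == 0:
                return (l[1] * r[0], l[1] * r[1])
            if isinstance(node.op, ast.Mult) and r[0] == 0:
                return (l[0] * r[1], l[1] * r[1])
        return None  # anything else is treated as non-affine (conservative)
    return walk(ast.parse(expr, mode="eval").body)


def to_outer(index, symbol_mapping):
    """Rewrite inner symbol names into outer expressions."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: f"({symbol_mapping.get(m.group(0), m.group(0))})",
                  index)


def nested_writes_iter_indexed(inner_indices, mappings, itersym):
    """mappings: symbol_mapping dicts ordered from outermost to innermost."""
    if not inner_indices:
        return False  # needs at least one inner write
    for index in inner_indices:
        for mapping in reversed(mappings):  # compose, innermost first
            index = to_outer(index, mapping)
        coeffs = affine_coeffs(index, itersym)
        if coeffs is None or coeffs[0] == 0:
            return False  # not provably distinct per iteration
    return True


one_level = nested_writes_iter_indexed(["ii"], [{"ii": "i"}], "i")
two_levels = nested_writes_iter_indexed(
    ["iii"], [{"ii": "i"}, {"iii": "2*ii + 1"}], "i")
constant = nested_writes_iter_indexed(["0"], [{"ii": "i"}], "i")
```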

Effect: the LoopToMap -> MapToForLoop -> LoopToMap round-trip recovers
the map (previously the loop stayed sequential forever after one
de-parallelization). The round-trip regression test flips from strict
xfail to passing. Sweep (canonicalize + condition/map-fusion + full
loop_to_map + disjoint/overlapping-writes): 107P / 1 xfailed / 0F.
Docstring-only: a concrete before/after of _nested_writes_iter_indexed
on the guarded-loop round-trip (b[0:N] external vs the hidden inner
b[i]).
A LoopRegion whose range expression (start/end/step) references a
symbol that the loop body itself defines via an interstate-edge
assignment cannot be soundly converted to a Map. ``LoopToMap``'s
``apply`` moves the loop body into a new ``loop_body`` NestedSDFG; the
body-internal assignment goes with it, but the new Map's range stays
at the OUTER scope. Result: the Map range references a symbol defined
only inside the new NSDFG, leaving the outer scope with no binding for
it -> ``Missing symbols on nested SDFG: ['<sym>']`` at validation time
downstream.

This is the shape canonicalize produces on cloudsc (interstate-edge
assignments like ``kfdia_plus_1_N = kfdia + 1`` end up inside a loop
whose condition reads ``kfdia_plus_1_N``).

Add a ``can_be_applied`` check that requires
``{symbols in loop range}`` to be disjoint from
``{interstate-edge assignment keys inside the loop body}``. Strictly
additive refusal -- no previously-accepted loop becomes rejected
unless it has this body-assigns-loop-range-symbol shape, and in that
case conversion was producing an invalid SDFG. Refused loops stay as
LoopRegions; sequential codegen still handles them cleanly.
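
In set terms the refusal condition is a plain disjointness test. The sketch below works on strings and dictionaries rather than on a real ``LoopRegion``; ``body_assigns_range_symbol`` is a hypothetical name.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def free_names(expr):
    return set(_IDENT.findall(expr))


def body_assigns_range_symbol(loop_range, body_edge_assignments):
    """True if any symbol in start/end/step is assigned inside the loop body."""
    range_syms = set().union(*(free_names(e) for e in loop_range))
    assigned = set().union(*(set(a) for a in body_edge_assignments))
    return not range_syms.isdisjoint(assigned)


# Loop `j < KP1` whose body carries the interstate edge `KP1 = K + 1`.
refuse = body_assigns_range_symbol(("0", "KP1", "1"), [{"KP1": "K + 1"}])
# Same loop, but the body assigns an unrelated symbol.
accept = body_assigns_range_symbol(("0", "KP1", "1"), [{"t": "K + 1"}])
```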

Reproducer test ``test_refuse_when_body_assigns_loop_range_symbol``
hand-builds the minimal shape (loop ``j < KP1`` with an interstate
edge ``KP1 = K + 1`` inside the body) and asserts the refusal. Fails
without the fix (LoopToMap applies and produces invalid SDFG); passes
with the fix.
…eration disjoint

When two writes index a point dimension by the same injective function of the
iteration variable (same a*i+b, a != 0), any collision forces the two iterations
equal, so the writes only coincide within a single iteration (ordered by program
order in the map body) and never race across iterations. _writes_may_overlap now
recognizes this, so loops with scatter writes that share the iteration variable
in one dimension (e.g. A[0, idx, i] and A[idx, 0, i]) are parallelizable. Adds
disjoint-writes unit tests (accept shared-iteration-dim; reject shared-constant-dim).
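
The argument can be sketched with a toy encoding of write indices, where each dimension is either ``("affine", a, b)`` for ``a*i + b`` or ``("opaque", name)``. Both the encoding and ``disjoint_across_iterations`` are assumptions made for illustration, not the representation used by ``_writes_may_overlap``.

```python
def disjoint_across_iterations(write1, write2):
    """True if some dimension uses the same injective a*i + b in both writes.

    If a*i1 + b == a*i2 + b with a != 0, then i1 == i2, so the two writes
    can only coincide within a single iteration.
    """
    for d1, d2 in zip(write1, write2):
        if d1[0] == "affine" and d1 == d2 and d1[1] != 0:
            return True
    return False


const0 = ("affine", 0, 0)  # the constant index 0
iter_i = ("affine", 1, 0)  # the iteration variable i
idx = ("opaque", "idx")    # data-dependent index

# A[0, idx, i] and A[idx, 0, i]: last dimension is i in both.
shared_iteration_dim = disjoint_across_iterations(
    (const0, idx, iter_i), (idx, const0, iter_i))
# A[0, idx] and A[0, idx]: only a constant dimension is shared.
shared_constant_dim = disjoint_across_iterations(
    (const0, idx), (const0, idx))
```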
Two fixes so the textbook fixed-read loop ``a[i] = a[1] + b[i]`` parallelizes
after peeling iteration 1 (0-indexed: peel iteration 0, the carried RMW of a[0]):

- LoopToMap.test_read_memlet: a read that does not move with the iteration was
  conservatively a conflict. A loop-INVARIANT read (no iteration symbol) is only
  a conflict if it actually overlaps a write -- e.g. ``a[0]`` is disjoint from the
  loop's ``a[1:N]`` writes (the post-peel remainder), but overlaps ``a[0:N]`` (the
  un-peeled loop, a real carried dep, still refused). Defer loop-invariant reads
  to the existing propagated-overlap check instead of bailing.

- LoopInvariantCodeMotion._hoist_map_scope: variant_data only scanned AccessNodes
  between the map entry and exit, missing the arrays the map WRITES (its outputs
  flow through the map exit to AccessNodes outside that range). So the invariant
  ``a[0]`` read was wrongly hoisted into a malformed whole-array-to-scalar copy
  (``a_index = a[0:N]`` -> a pointer/value codegen error). Seed variant_data with
  the map's output arrays too, matching the documented "a written container makes
  its reads variant" criterion.
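
The first bullet reduces to an interval-membership test once the write range is known. A sketch with concrete integers and Python-style half-open ranges (an assumption; the real check operates on propagated symbolic subsets):

```python
def invariant_read_conflicts(read_index, write_start, write_stop):
    """True if a loop-invariant read a[read_index] overlaps the loop's
    writes a[write_start:write_stop] (half-open, as in Python slices)."""
    return write_start <= read_index < write_stop


N = 8
peeled = invariant_read_conflicts(0, 1, N)    # a[0] vs a[1:N]
unpeeled = invariant_read_conflicts(0, 0, N)  # a[0] vs a[0:N]
```

The peeled remainder has no conflict and can be parallelized; the un-peeled loop still has a real carried dependence and stays refused.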

Adds a canonicalize knob test for the peeled fixed-read pattern (default stays
sequential; peel_limit>0 parallelizes and runs, value-preserving).
@ThrudPrimrose
Collaborator Author

ThrudPrimrose commented May 31, 2026

SymbolPropagation -- what was fixed

Six correctness fixes in dace/transformation/passes/symbol_propagation.py that together make the pass safe to run inside FixedPointPipeline and correct on CloudSC / TSVC patterns.

  1. Same-edge race guard + cross-CFG assert + termination cap. A carried value referencing a symbol the current edge also reassigns is no longer propagated.

  2. Simultaneous-RHS resolve (cyclic-swap). Multiple assignments on one edge (e.g. {a: b, b: a}) are now resolved against the pre-edge table simultaneously.

  3. Return propagated set (or None) so FixedPointPipeline converges. Pre-fix apply_pass returned set() on a no-op, which the driver reads as "modified" and re-runs forever.

  4. Dead-iedge sweep after substitution. Once k_plus_1 = klev + 1 is substituted into every use, the defining assignment is dead but was left behind; now cleaned.

  5. Substitute into array descriptors before the dead-iedge sweep. A propagated symbol must reach Array.shape / strides BEFORE its iedge assignment is dropped, else the descriptor references an undefined symbol and the SDFG fails validation.

  6. Invalidate carried value whose RHS reads a symbol the current edge reassigns. k = j + 1 carried in from the predecessor is stale for the block past an edge that reassigns j (e.g. j = k + 1); pre-fix produced off-by-two indices (TSVC s128 / CloudSC).
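
Item 3 can be illustrated with a toy driver. The sketch assumes the convention described above, namely that any return value other than None counts as "modified"; run_to_fixed_point and the two no-op passes are hypothetical and are not the FixedPointPipeline code.

```python
def run_to_fixed_point(passes, max_rounds=100):
    """Re-run all passes until none reports a modification.

    Returns the number of rounds needed, or None if the cap was hit.
    """
    for rounds in range(1, max_rounds + 1):
        results = [apply_pass() for apply_pass in passes]
        if all(result is None for result in results):
            return rounds
    return None


def noop_pre_fix():
    return set()  # empty set is still "not None", so it reads as modified


def noop_post_fix():
    return None   # explicit "nothing changed"


converges = run_to_fixed_point([noop_post_fix])
never_converges = run_to_fixed_point([noop_pre_fix])
```

The cap in this sketch exists only so the pre-fix case terminates; it is not a claim about how the real driver bounds its iterations.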

LoopToMap -- what was fixed

Four fixes in dace/transformation/interstate/loop_to_map.py, each addressing a distinct false-refusal or false-acceptance in can_be_applied.

  1. Iteration-independence through a NestedSDFG body. A loop body that is a NestedSDFG propagates a whole-array external write memlet (b[0:N]) if the write is dynamic. This can happen if the array is written to inside an if block, which hides the per-iteration inner write. New _nested_writes_iter_indexed walks past the connector, rewrites inner write subsets through the node's symbol_mapping into the outer iteration symbol, and applies the same a*i + b check (recursive across nested NestedSDFGs, conservative). Effect: LoopToMap -> MapToForLoop -> LoopToMap recovers the map.

What the fix does

When the standard _check_range(dst_subset, a, itersym, b, step) fails and the source of the edge is a NestedSDFG, the fix calls a new helper _nested_writes_iter_indexed(nsdfg, conn, itersym, a, b, step) which looks PAST the connector instead of trusting the external memlet:

  • Walk into nsdfg.sdfg and find every write edge to an AccessNode whose .data equals the outer connector name ("b"). Those are the inner per-iteration writes the external memlet was a union over.

  • For each such inner write edge, take its dst_subset (e.g. b[i] inside the NestedSDFG, where i is the INNER symbol name).

  • Translate that subset into the OUTER iteration symbol via nsdfg.symbol_mapping. The mapping looks like {i_inner: i_outer, N: N}; substituting it into b[i_inner] yields b[i_outer].

  • Apply the same _check_range against the translated subset. If it matches a*i + b, that inner write is per-iteration unique.

  • If a deeper NestedSDFG sits inside, recurse, composing the symbol_mappings on the way down.

  2. Refuse when the loop range reads a body-assigned symbol. A loop whose start / end / step references a symbol the body reassigns would, if parallelised, see iterations execute with each other's bounds.

  3. Shared affine iteration-var dimension is cross-iteration disjoint. Two writes that index a dimension by the same injective a*i + b (a != 0) collide only when their iterations coincide -- they cannot race across iterations. Parallelises scatter writes like A[0, idx, i] = ...; A[idx, 0, i] = ... despite the opaque idx.

  4. Peeled fixed-read pattern. A loop-INVARIANT read (no iteration symbol) is no longer an automatic conflict in test_read_memlet; it's a conflict only when it actually overlaps a write. a[0] vs a[0:N] stays refused; a[0] vs a[1:N] is now correctly parallelisable (it was refused before).

…symbolic

_serialize_symbolic_uncached dispatched the float branch through
sympy.printing.str.sstr, which uses sympy's default 15-significant-digit
mpmath formatting. The neighbouring sympy.Basic branch uses
DaceSympySerializer -> _print_Float -> _format_float -> repr(f), which
is the shortest round-trip form (at most 17 sig digits for fp64).

Any SymbolicProperty whose value reaches serialize_symbolic as a Python
float (vs as a sympy.Float) therefore produced a 15-digit string on save 1
and a 17-digit string on save 2 once the load round-trip rebuilt the value
as sympy.Float -- breaking the SDFG save -> load -> save equality check
on tests/library/fft_test.py::test_ifft[backward] (factor = 1/21).

Route the float branch through the shared _format_float so both branches
emit the same shortest-round-trip form unconditionally. Adds a
parametrised idempotence regression test.