Transformation bugfixes #2385
…ables ``SDFGState.symbols_defined_at`` only folded in global symbols, inter-state-edge assignments and dataflow-scope (Map/Consume) symbols. It never walked up the control-flow hierarchy, so the loop variable of an enclosing ``LoopRegion`` (e.g. ``jk``) was reported as undefined for a node inside the loop body. That false "undefined" tripped ``propagate_memlets_nested_sdfg``'s widening fallback and replaced a nested-SDFG connector dim indexed by the enclosing loop variable with the whole array (e.g. ``arr[*, jk-1, *]`` -> ``arr[*, 0:klev, *]``), hiding the per-iteration uniqueness and blocking ``LoopToMap``. Walk ``self.parent_graph`` up the CFG, collect each enclosing ``LoopRegion``, and fold in its loop variable via ``LoopRegion.new_symbols(symbols)`` (outermost first, before the scope-symbol walk). The widening fallback's gate is now accurate; ``LoopToMap``'s analysis sees the parametric subset.
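A minimal sketch of the hierarchy walk described above. ``Region`` and ``LoopRegionSketch`` are hypothetical stand-ins rather than the DaCe classes, and the ``"int64"`` type returned by ``new_symbols`` is a placeholder. The point is the order: collect enclosing loops bottom-up, then fold their variables in outermost first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Region:
    """Hypothetical stand-in for a state or control-flow region."""
    parent_graph: Optional["Region"] = None


@dataclass
class LoopRegionSketch(Region):
    """Hypothetical stand-in for LoopRegion; only the loop variable matters here."""
    loop_variable: str = ""

    def new_symbols(self, symbols: Dict[str, str]) -> Dict[str, str]:
        # Assumption: the loop variable is reported with a placeholder type.
        return {self.loop_variable: "int64"}


def symbols_defined_at_sketch(state: Region, global_symbols: Dict[str, str]) -> Dict[str, str]:
    symbols = dict(global_symbols)
    # Walk parent_graph up the control-flow hierarchy, collecting enclosing loops.
    enclosing: List[LoopRegionSketch] = []
    graph = state.parent_graph
    while graph is not None:
        if isinstance(graph, LoopRegionSketch):
            enclosing.append(graph)
        graph = graph.parent_graph
    # Fold in loop variables outermost first, so inner loops see outer ones.
    for loop in reversed(enclosing):
        symbols.update(loop.new_symbols(symbols))
    # (The Map/Consume scope-symbol walk would follow here.)
    return symbols


outer = LoopRegionSketch(loop_variable="jk")
body = Region(parent_graph=outer)
state = Region(parent_graph=body)
print(symbols_defined_at_sketch(state, {"klev": "int32"}))  # {'klev': 'int32', 'jk': 'int64'}
```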
…t its data DataflowGraphView.unordered_arglist resolved the external array written through a MapExit/ExitNode from ``oedge.data.data``. For a source-relative outgoing Memlet (whose ``data`` names the inner transient, e.g. ``tmp_out``, not the destination array ``D``) this dropped the real array -- and its stride/shape symbols -- from the kernel argument list, so GPU codegen emitted a __global__ signature that used ``D``/``second_stride_D`` without declaring them (``identifier "D" is undefined``). Resolve the destination from the terminal AccessNode of the memlet path (symmetric to the input-side handling, which already follows the path), falling back to ``oedge.data.data`` when the path does not end at an AccessNode. Destination-relative output Memlets are unaffected (the terminal AccessNode is that same array). Fixes argument_signature_test and the nested_kernel_transient tests (verified serially; the -n4 set-differences were .dacecache build races, all pass serially).
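The resolution rule above (follow the memlet path to its terminal AccessNode, fall back to the memlet's ``data`` otherwise) can be sketched with hypothetical stand-in node and edge classes; these are not the DaCe types, and the later commit below resolves from the memlet tree's root instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessNode:
    data: str

@dataclass
class MapExit:
    label: str

@dataclass
class Memlet:
    data: str

@dataclass
class Edge:
    src: object
    dst: object
    data: Memlet


def written_array_sketch(memlet_path: List[Edge], oedge: Edge) -> str:
    """Name of the array written through a scope exit (hypothetical helper)."""
    terminal = memlet_path[-1].dst
    if isinstance(terminal, AccessNode):
        # The terminal AccessNode is the real destination, even when the memlet
        # is source-relative and names an inner transient such as tmp_out.
        return terminal.data
    return oedge.data.data  # fallback: the path does not end at an AccessNode


edge = Edge(MapExit("kernel"), AccessNode("D"), Memlet("tmp_out"))
print(written_array_sketch([edge], edge))  # D
```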
The outgoing Memlet at a scope exit can be source-relative -- naming the inner transient rather than the external array being written -- so using its .data dropped the real destination array (and its stride symbols) from a GPU kernel's argument list, yielding 'identifier undefined' at compile. Resolve the written array from the memlet tree's root (the outermost-scope node, the actual fan-out destination) instead, matching the long-standing NOTE here.
validate_state's dimensionality-mismatch check dereferenced sdfg.arrays[src_node.data].veclen / sdfg.arrays[dst_node.data].veclen on every edge with other_subset set, assuming both endpoints were AccessNodes. Any edge whose src/dst is a NestedSDFG / MapEntry / MapExit / ConsumeEntry / ConsumeExit raised "AttributeError: 'NestedSDFG' object has no attribute 'data'", because scope nodes do not expose .data: they route data via connectors, and any per-side packing lives on the inner descriptor.

The crash blocked a 2nd/3rd canonicalize() call on the guarded imperfect-nest repro (a top-level guard above an outer map with a nested map and an extra-statement tail): the multi-pass pipeline eventually produces a deeper NSDFG whose inner NSDFG -> MapExit edge carries both src_subset and dst_subset, and validation aborts before the pipeline can reach a fixed point.

Fix: read each side's veclen only when its endpoint is an AccessNode; default to 1 for scope nodes. The two View-exception branches had the same latent assumption and are now guarded with the same isinstance check.

This closes the canonicalize idempotence crash: with this guard the repro reaches a stable fixed point (1x->(1,2), 2x=3x=4x=(2,2)) with no warnings. The original "_it_X leaks into outer subsets" path is unaffected: those forms were the intentional WCR running-union encoding from PR #1176 (the iterator IS in scope on the inner-out edge, so the SDFG is well-formed); the only thing missing was a validator that survives non-AccessNode endpoints.

CORE_BUGFIXES.md: mark #3 (preserve_minima fabrication) as NOT-A-BUG (intentional WCR semantics); mark #4 (canonicalize idempotence) as RESOLVED by this fix; add #9 describing the validation guard with the sweep verification.
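The guard reduces to reading ``veclen`` only through an AccessNode. A minimal sketch with hypothetical stand-in classes (not the DaCe node and descriptor types):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ArrayDesc:
    veclen: int = 1

@dataclass
class AccessNode:
    data: str

class NestedSDFGNode:
    """Scope-like node: routes data through connectors and has no .data attribute."""


def endpoint_veclen(node: object, arrays: Dict[str, ArrayDesc]) -> int:
    if isinstance(node, AccessNode):
        return arrays[node.data].veclen
    # Scope nodes (NestedSDFG, Map/Consume entry/exit): default to 1 instead of
    # dereferencing a .data attribute they do not have.
    return 1


arrays = {"A": ArrayDesc(veclen=4)}
print(endpoint_veclen(AccessNode("A"), arrays), endpoint_veclen(NestedSDFGNode(), arrays))  # 4 1
```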
Sweep: tests/transformations + tests/canonicalize + propagation + tests/passes -- 2074 passed, 25 failed (all pre-existing: 5 offset_loop_and_maps TODO-raise, 1 perf_loop_nesting refusal, 1 branch_elimination test-bug, plus environmental cache/import errors). One fewer failure than baseline (the canonicalize idem path now survives validation). Zero net regressions.
…de endpoint validate_state's dimensionality-mismatch check used to dereference sdfg.arrays[src_node.data].veclen unconditionally, crashing with "AttributeError: 'NestedSDFG' object has no attribute 'data'" on any edge with both src/dst subsets where one endpoint was a scope node (NestedSDFG / MapEntry / MapExit / ConsumeEntry / ConsumeExit). The new test builds a NestedSDFG-output -> AccessNode edge with a reshape memlet (which sets other_subset) and asserts validation reaches a verdict. Verified: reverting the isinstance(src_node, nd.AccessNode) guard reproduces the AttributeError; restored fix makes the test pass.
… doubles A Python/numpy float such as the ``1.0`` clamp in ``min(x, 1.0)`` was collapsed to int ``1`` in ``sympy_numeric_fix`` (only ``sympy.Float`` was spared). The int then mixed with a double inside a Min/Max and, after a serialization round-trip re-canonicalised the argument order, truncated the result -- the CloudSC save/load divergence. Preserve any finite float as a ``sympy.Float`` so its type survives. Float printing also forced every value through ``float()`` before formatting, so a near-max double (Fortran ``HUGE``, just over the C double max) overflowed to ``inf`` and rendered as ``inf.0``; and ``%.15g`` truncated values that need 16-17 digits to round-trip. Fall back to sympy's own shortest decimal when ``float()`` is non-finite, and to ``repr`` when 15 significant figures do not round-trip. Consolidate both printers onto ``_format_float``. CloudSC in-memory / save(compress)->load / save(plain)->load all bit-exact.
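A sketch of the float-preserving rule only, as a hypothetical stand-in for ``sympy_numeric_fix`` (its real handling of non-finite or non-numeric inputs is not described here, so this sketch just passes them through):

```python
import math
import numbers

import sympy


def numeric_fix_sketch(value):
    """Hypothetical stand-in for the float-preserving part of sympy_numeric_fix."""
    if isinstance(value, bool):
        return value
    if isinstance(value, numbers.Integral):
        return sympy.Integer(int(value))
    if isinstance(value, numbers.Real) and math.isfinite(float(value)):
        # Keep 1.0 a Float: collapsing it to Integer(1) loses the double type
        # inside Min/Max once a round-trip reorders the arguments.
        return sympy.Float(float(value))
    return value  # non-finite or non-numeric values pass through unchanged here


x = sympy.Symbol("x")
clamp = sympy.Min(x, numeric_fix_sketch(1.0))
print(clamp.atoms(sympy.Float))  # the clamp constant is still a sympy.Float
```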
Three related fixes uncovered while triaging the bulk-imported TSVC blocks:

1. ``SplitMapForVectorRemainder`` (P2): switch from sympy's ``//`` operator to ``dace.symbolic.int_floor`` when computing ``main_end``. sympy normalises ``(LEN_1D - 1) // 8`` to ``floor(LEN_1D/8 - 1/8)`` (rewriting the integer division as a subtraction of rational fractions), which the C++ codegen prints as ``(LEN_1D / 8) - (1 / 8)``. In C++ integer division ``1 / 8`` is 0, so the main bound collapses to ``LEN_1D / 8`` instead of ``(LEN_1D - 1) / 8`` and the main tile loop overruns the kernel's actual range. TSVC s2244-shape kernels (pre-loop scalar write to ``a[LEN_1D - 1]`` + body ``for i in range(LEN_1D - 1)``) silently overwrote the pre-loop write because of this.
2. ``DaceSympyPrinter._print_floor`` -- defensive backstop in case any other code path still produces sympy ``floor(...)`` expressions: detect the ``floor(a/b - c/b)`` shape via ``arg.together().as_numer_denom()`` and emit ``((numerator) / (denominator))`` (correct C++ integer division). It falls through to a ``floor(...)`` math-library call only for genuinely real-valued floors. This locks in the rule "never use ``//`` in the vector backend": even if a caller forgets and uses ``//``, the printer recovers.
3. ``BranchElimination.can_be_applied`` ([line 804, 914]) -- two debug-print lines referenced ``write.data``, but ``write`` is the data-name string from ``state.read_and_write_sets()`` (no ``.data`` attribute). The print crashed with ``AttributeError: 'str' object has no attribute 'data'`` and propagated out of the ``can_be_applied`` check, aborting the whole pipeline. TSVC s1279-shape kernels (nested-if with disjoint write set) hit this. Fixed to print ``write`` directly.

Sweep change after these three fixes (just block1, the canary):
- v2 (only the .data print fix): 20 failed (down from 27).
- v3 (+ int_floor in P2): 19 failed.
- v4 (+ floor printer + clean cache): 19 failed. The 3 remaining s2244 failures are ``divides_evenly + LEN=64`` for kernels whose effective loop range is ``LEN_1D - 1 = 63``: the divides_evenly contract assumes the range is divisible by W, and the test parametrisation picks LEN=64 too aggressively for these kernels; a test-side LEN selector is the right follow-up.
DaceSympyPrinter._print_floor must recombine sympy's common-denominator fraction sum (e.g. ``floor(LEN/8 - 1/8)`` from ``(LEN - 1) // 8``) into a single ``((LEN - 1) / 8)`` integer division for C++ codegen. Without it the literal ``floor(LEN/8 - 1/8)`` reaches C++ where ``1/8`` collapses to ``0`` and the floor argument loses the ``-1`` -- TSVC s2244-shape kernels overshoot the loop bound and clobber the pre-loop scalar write. Verified: removing the ``_print_floor`` override on DaceSympyPrinter makes the test fail with ``floor(LEN/8 - 1/8)`` in the output; restoring it emits ``((LEN - 1) / (8))``.
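A minimal sketch of the recombining printer, built on sympy's ``StrPrinter`` rather than the actual ``DaceSympyPrinter``. ``FloorAwarePrinter`` is hypothetical, and gating on an integer-valued numerator is an assumption of the sketch. It also assumes a non-negative numerator: C++ integer division truncates toward zero, so it only matches ``floor`` when the numerator is non-negative.

```python
import sympy
from sympy.printing.str import StrPrinter


class FloorAwarePrinter(StrPrinter):
    """Hypothetical stand-in for DaceSympyPrinter's _print_floor override."""

    def _print_floor(self, expr):
        arg = expr.args[0]
        # Recombine a common-denominator sum such as LEN/8 - 1/8 into (LEN - 1)/8.
        num, den = sympy.together(arg).as_numer_denom()
        if den.is_Integer and den != 1 and num.is_integer:
            # Integer numerator over an integer constant: emit C-style integer
            # division (matches floor only for a non-negative numerator).
            return f"(({self._print(num)}) / ({self._print(den)}))"
        # Genuinely real-valued floor: fall through to the math-library call.
        return f"floor({self._print(arg)})"


LEN = sympy.Symbol("LEN", integer=True)
expr = (LEN - 1) // 8
print(expr)                               # floor(LEN/8 - 1/8)
print(FloorAwarePrinter().doprint(expr))  # ((LEN - 1) / (8))
```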
… a MapEntry When the eliminated copy tasklet's source is a MapEntry, the surviving edge leaves the map's ``OUT_<read>`` connector, so its memlet must describe the read data and its (possibly offset) subset. The pass reused the write memlet for all cases, leaving ``data`` pointing at the written transient with the read offset (e.g. ``a[i + 1]``) stranded in ``other_subset`` -- an orientation inconsistent with the connector. It still validates and runs, but a later re-lowering that reads ``.subset`` (e.g. MapToForLoop) then drops the offset (``[0]``). This surfaced as a canonicalize idempotency failure (second pass folded ``a[i + k]`` to ``a[0]``). Keep the read-side memlet (data + subset) on the surviving edge for the MapEntry-source case, carrying the write subset in ``other_subset``. Adds tests/transformations/trivial_tasklet_elimination_test.py reproducer (asserts the surviving edge describes the read data and keeps the offset) and un-xfails canonicalize_symbol_lifting's cloudsc_style_range_plus_one (now idempotent and value-preserving).
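The orientation rule can be sketched with a hypothetical ``MemletSketch`` stand-in whose subsets are plain strings: for a MapEntry source the surviving memlet names the read data with its offset subset and carries the write subset as ``other_subset``; other sources keep the write memlet as before.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MemletSketch:
    data: str
    subset: str
    other_subset: Optional[str] = None


def surviving_memlet(read: MemletSketch, write: MemletSketch, src_is_map_entry: bool) -> MemletSketch:
    if src_is_map_entry:
        # The edge leaves the map's OUT_<read> connector: describe the read data
        # with its (possibly offset) subset, and carry the write side as other_subset.
        return MemletSketch(read.data, read.subset, other_subset=write.subset)
    return replace(write)  # other sources: keep the write memlet unchanged


read = MemletSketch("a", "i + 1")
write = MemletSketch("tmp", "0")
print(surviving_memlet(read, write, src_is_map_entry=True))
# MemletSketch(data='a', subset='i + 1', other_subset='0')
```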
… isinstance(ControlFlowBlock) Two more occurrences of the same NestedSDFG-inner-SDFG-clobber pattern that 9093e22 fixed in ``ConditionFusion.fuse_consecutive_conditions``. Both have identical structure -- a post-rewrite recursive walk that sets ``.sdfg = parent.sdfg`` on every node with a ``.sdfg`` attribute -- and both unintentionally include NestedSDFG nodes, whose ``.sdfg`` attribute is the *inner* SDFG (an ``SDFGReferenceProperty`` with a setter). The assignment overwrites the inner SDFG with the outer one, creating a graph cycle that sends ``all_nodes_recursive`` into infinite recursion (TSVC s275 RecursionError). Surfaced by an audit, per the user instruction "we should not need hasattr".

* ``condition_fusion.py:357`` -- ``fuse_nested_conditions`` (the other branch of ``ConditionFusion``; the consecutive branch was fixed in 9093e22 but this nested branch was missed).
* ``early_exit_to_find_index.py:743`` -- ``_propagate_sdfg``, a recursive parent-fixup walk after a branch deep-copy. Same clobber shape: ``hasattr(n, 'sdfg')`` matches NestedSDFG and writes the outer ``sdfg`` into its inner-SDFG slot.

Both swap ``hasattr(node, "sdfg")`` for ``isinstance(node, ControlFlowBlock)`` -- the positive type check that matches ``SDFGState`` / ``ControlFlowRegion`` / ``ConditionalBlock`` (whose ``.sdfg`` is the *containing* SDFG) but not NestedSDFG.

Audit also verified:
* ``copy.deepcopy(loop)`` properly deep-copies iedge assignment dicts (different ids, isolated mutation) -- no shallow-clone issue in LoopFission or other deepcopy-based passes.
* My IVS extension (d49e69b) uses ``dace.symbolic`` wrappers (``pystr_to_symbolic``, ``simplify``, ``symstr``) rather than raw sympy.
* My session-attributable changes touch only ``dace/transformation/`` paths -- no modifications to ``dace/sdfg/``, ``dace/codegen/``, or ``dace/frontend/`` (core IR/codegen/parser).

Verification:
* ``tests/transformations/interstate/condition_fusion_test.py``: 10/10 pass
* ``tests/canonicalize/canonicalize_early_exit_to_find_index_test.py``: all pass
* ``tests/transformations/loop_fission_test.py``: 27/27 pass
* ``tests/passes/induction_variable_substitution_test.py``: 12/12 pass
* Aggregate: 70 passed, 0 failures
…t fixup
MapFusionVertical: split InOut connector instead of producing invalid SDFG
========================================================================
When the fusion intermediate's data name matches an InOut connector of a
NestedSDFG inside the producer map's body, the standard rename
``inter_name -> __map_fusion_<inter_name>`` would rewire the
NestedSDFG's OUTPUT-side memlet to the new transient but leave the
INPUT-side memlet still referencing the original outer container --
producing a validation error::
    Inout connector X is connected to different input ({'X'}) and
    output ({'__map_fusion_X'}) arrays
This was caught by canonicalize on TSVC s221::
    for i in range(1, N):
        a[i] = a[i] + c[i] * d[i]    # NestedSDFG InOut on 'a'
        b[i] = b[i-1] + a[i] + d[i]  # consumer
After fission, fis0 = parallel map writing 'a', fis1 = Scan-based
consumer reading 'a'. The fuse stage's MapFusionVertical encounters
the InOut-on-'a' shape and tries to rename.
Fix: ``_split_inout_for_intermediate`` runs before
``_handle_intermediate_set`` in ``apply``. For each NestedSDFG inside
the producer's body whose InOut connectors match an intermediate's
data name:
1. Allocate a fresh inner array ``__map_fusion_split_<name>`` (same
   shape / dtype as the original ``<name>`` inside the NestedSDFG).
2. Rename every inner read-side AccessNode of ``<name>``
   (``in_degree == 0``) to the fresh name; rename the memlet ``data``
   on its outgoing edges.
3. Drop the InOut input connector ``<name>`` from the outer
   NestedSDFG node and add the fresh
   ``__map_fusion_split_<name>`` input connector with the same dtype.
4. Redirect the outer input edge feeding the old ``<name>`` input
   connector to the new connector.
After the split the NestedSDFG's ``<name>`` connector is OUTPUT-ONLY
and the standard rename machinery rewires it to
``__map_fusion_<name>`` cleanly. The semantically equivalent shape is
preserved (the in-place RMW becomes an out-of-place compute into the
intermediate, which is then copied back to outer ``<name>`` via the
shared-mode write-back).
``can_be_applied`` gates the split with ``_inout_split_is_safe``: v1
only handles the clean shape where every inner AccessNode of ``<name>``
is either a pure read source (``in_degree == 0``) or a pure write sink
(``in_degree > 0``). Mixed-mode accesses (``a -> ... -> a`` chains in
one state) would require use-def analysis and are refused for now.
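The split and its safety gate can be sketched on a hypothetical dict-based model of a NestedSDFG (connectors as name-to-dtype dicts, inner accesses with an in-degree and the data names on their outgoing memlets). None of these classes are DaCe's. Treating a "pure write sink" as a written node with nothing read out of it is one reading of the v1 rule, labelled as an assumption in the code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InnerAccess:
    """An inner AccessNode: data name, in-degree, outgoing memlet data names."""
    data: str
    in_degree: int
    out_memlets: List[str] = field(default_factory=list)


@dataclass
class NestedSDFGSketch:
    in_connectors: Dict[str, str]      # connector -> dtype
    out_connectors: Dict[str, str]
    arrays: Dict[str, tuple]           # inner array -> (shape, dtype)
    accesses: List[InnerAccess]
    outer_inputs: Dict[str, str]       # input connector -> outer data feeding it


def inout_split_is_safe(nsdfg: NestedSDFGSketch, name: str) -> bool:
    # Assumption: a pure write sink is written and has nothing read out of it.
    for acc in nsdfg.accesses:
        if acc.data != name:
            continue
        pure_read = acc.in_degree == 0
        pure_write = acc.in_degree > 0 and not acc.out_memlets
        if not (pure_read or pure_write):
            return False  # mixed-mode access: would need use-def analysis
    return True


def split_inout(nsdfg: NestedSDFGSketch, name: str) -> str:
    fresh = f"__map_fusion_split_{name}"
    nsdfg.arrays[fresh] = nsdfg.arrays[name]                     # 1. same shape / dtype
    for acc in nsdfg.accesses:                                   # 2. rename read sources
        if acc.data == name and acc.in_degree == 0:
            acc.data = fresh
            acc.out_memlets = [fresh if d == name else d for d in acc.out_memlets]
    nsdfg.in_connectors[fresh] = nsdfg.in_connectors.pop(name)   # 3. swap the input connector
    nsdfg.outer_inputs[fresh] = nsdfg.outer_inputs.pop(name)     # 4. redirect the outer edge
    return fresh


nsdfg = NestedSDFGSketch(
    in_connectors={"a": "float64", "c": "float64"},
    out_connectors={"a": "float64"},
    arrays={"a": ((100,), "float64"), "c": ((100,), "float64")},
    accesses=[InnerAccess("a", 0, ["a"]), InnerAccess("a", 1)],
    outer_inputs={"a": "a", "c": "c"},
)
if inout_split_is_safe(nsdfg, "a"):
    split_inout(nsdfg, "a")
print(set(nsdfg.in_connectors) & set(nsdfg.out_connectors))  # set(): no InOut overlap left
```

After the split, ``a`` survives only as an output connector, which is the shape the standard rename handles.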
Test: ``test_map_fusion_inout_connector_intermediate_rename_consistency``
in ``map_fusion_vertical_test.py`` pins the contract. Reproducer
minimised from s221: a 2-map SDFG with an InOut-on-'a' NestedSDFG
producer. Test asserts the fusion APPLIES, the SDFG validates, the
NestedSDFG's InOut overlap is empty post-fix, and the numerical
outputs bit-exact-match the pre-fuse oracle.
MapFusion suite: 56 passed + 1 skipped + 2 xfailed. Zero regressions.
ConditionFusion: gate parent-fixup loop with isinstance(ControlFlowBlock)
=========================================================================
The post-fusion parent-fixup loop was::
    for node, parent in sdfg.all_nodes_recursive():
        if hasattr(node, "sdfg"):
            node.sdfg = parent.sdfg
For a ``NestedSDFG`` node, ``.sdfg`` is the *inner* SDFG (an
``SDFGReferenceProperty`` with a setter), so the assignment
overwrote the inner-SDFG slot with the outer containing SDFG --
producing a graph cycle that subsequently sent
``all_nodes_recursive`` into infinite recursion (TSVC s275:
``for i: if guard: for j: a[j,i] = a[j-1,i] + b[j,i] * c[j,i]``).
Fix: replace ``hasattr(node, "sdfg")`` with the positive
``isinstance(node, ControlFlowBlock)`` check -- the loop's actual
intent is to repair ``.sdfg`` on the OUTER container blocks
(``SDFGState`` / ``ControlFlowRegion`` / ``ConditionalBlock``) whose
``.sdfg`` attribute names the containing SDFG. NestedSDFG nodes are
handled by the preceding ``set_nested_sdfg_parent_references(sdfg)``
call.
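The difference between the two guards can be seen with hypothetical stand-in classes (not DaCe's). The ``(node, parent)`` pairs are simplified to ``(node, parent_sdfg)``: ``hasattr`` also matches the NestedSDFG node and clobbers its inner SDFG, while ``isinstance`` touches only control-flow blocks.

```python
class ControlFlowBlock:
    """Stand-in: .sdfg names the CONTAINING SDFG."""
    def __init__(self, sdfg=None):
        self.sdfg = sdfg


class SDFGState(ControlFlowBlock):
    pass


class NestedSDFG:
    """Stand-in: .sdfg names the INNER SDFG, not the containing one."""
    def __init__(self, inner):
        self.sdfg = inner


def fixup_with_hasattr(pairs):
    for node, parent_sdfg in pairs:
        if hasattr(node, "sdfg"):                # also matches NestedSDFG
            node.sdfg = parent_sdfg


def fixup_with_isinstance(pairs):
    for node, parent_sdfg in pairs:
        if isinstance(node, ControlFlowBlock):   # control-flow blocks only
            node.sdfg = parent_sdfg


state, nested = SDFGState(), NestedSDFG(inner="inner_sdfg")
fixup_with_hasattr([(state, "outer_sdfg"), (nested, "outer_sdfg")])
print(nested.sdfg)               # outer_sdfg: the inner SDFG was clobbered

state, nested = SDFGState(), NestedSDFG(inner="inner_sdfg")
fixup_with_isinstance([(state, "outer_sdfg"), (nested, "outer_sdfg")])
print(state.sdfg, nested.sdfg)   # outer_sdfg inner_sdfg
```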
ConditionFusion suite: 10/10 passed.
Corpus impact (TSVC, 151 kernels)
=================================
All 4 previously-failing kernels now canonicalize cleanly:
s221, s2233, s233, s275: canonicalize OK
Aggregate (canonicalize column):
before this commit: loops=59 maps=118 reduces=7 scans=9
after this commit: loops=59 maps=132 reduces=7 scans=15
+14 maps and +6 scans from the 4 newly-canonicalizing kernels.
…or RedundantArrayCopyingIn Adds a partial-copy and extra-consumer refusal in RedundantArrayCopyingIn: only fold when the in-copy is a full identity (subset matches the source array) and the destination has a single outgoing edge. Without these guards, the pass incorrectly removes the source AccessNode when the destination is read multiple times or covers only part of the source -- producing wrong codegen that races between the kept consumer and a reader of the dropped node. Extracted from yakup/dev `470294025`; the original commit also added the unrelated single-shot ``parallelize`` pipeline, which is omitted.
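The two refusal conditions can be sketched over plain tuples. Subsets are modelled here as inclusive ``(begin, end, step)`` triples with concrete integer bounds, which is an assumption of the sketch; the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int, int]  # (begin, end, step), end inclusive -- an assumption


def is_full_identity(subset: Sequence[Range], shape: Sequence[int]) -> bool:
    """True if the copy covers every element of the source exactly once."""
    return len(subset) == len(shape) and all(
        begin == 0 and end == size - 1 and step == 1
        for (begin, end, step), size in zip(subset, shape))


def can_fold_copy_in(subset: Sequence[Range], src_shape: Sequence[int], dst_out_degree: int) -> bool:
    # Fold only a full-identity in-copy whose destination has a single consumer.
    return is_full_identity(subset, src_shape) and dst_out_degree == 1


print(can_fold_copy_in([(0, 9, 1)], [10], 1))  # True
print(can_fold_copy_in([(0, 4, 1)], [10], 1))  # False: partial copy
print(can_fold_copy_in([(0, 9, 1)], [10], 2))  # False: extra consumer
```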
The array frontends lower an in-place accumulator update `arr[S] += x` into a copy-wrapped chain -- `arr[S]` is materialized into a scalar transient, combined in a tasklet, and copied back -- which the redundant-array passes cannot fold (arr is both read and written in the same state). So AugAssignToWCR never recognised the reduction and the accumulator loops stayed sequential, blocking LoopToMap. Add a third match expression (`arr[S] -> copy_in -> tasklet -> copy_out -> arr[S]`) that rewrites the chain to a WCR write: the tasklet emits only the increment and writes it straight into `arr[S]` with the reduction WCR; the load and scalar copy-out transient are dropped. Accepts the order-independent combines (+, *, min, max, and left-operand subtraction). On cloudsc, once TrivialTaskletElimination exposes the spine, this lifts 122 accumulators to WCR writes and lets LoopToMap parallelize the species reductions (loops-left 32 -> 17), bit-exact vs the untransformed kernel in sequential IEEE.
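The tasklet-side half of the rewrite (deciding whether ``out = acc OP rhs`` is an accumulator update and which increment remains) can be sketched with Python's ``ast`` module (``ast.unparse`` needs Python 3.9+). ``split_accumulator`` and the WCR lambda strings are hypothetical; "left-operand subtraction" is read here as ``acc - x`` only.

```python
import ast
from typing import Optional, Tuple

_BINOP_WCR = {
    ast.Add: "lambda a, b: a + b",
    ast.Mult: "lambda a, b: a * b",
    ast.Sub: "lambda a, b: a - b",   # left-operand subtraction only: acc - x
}
_CALL_WCR = {"min": "lambda a, b: min(a, b)", "max": "lambda a, b: max(a, b)"}


def _reads(node: ast.AST, name: str) -> bool:
    return any(isinstance(n, ast.Name) and n.id == name for n in ast.walk(node))


def _is_acc(node: ast.AST, acc: str) -> bool:
    return isinstance(node, ast.Name) and node.id == acc


def split_accumulator(code: str, acc: str) -> Optional[Tuple[str, str]]:
    """For `out = acc OP rhs`, return (wcr, increment source); otherwise None."""
    body = ast.parse(code).body
    if len(body) != 1 or not isinstance(body[0], ast.Assign):
        return None
    value = body[0].value
    if isinstance(value, ast.BinOp) and type(value.op) in _BINOP_WCR:
        wcr = _BINOP_WCR[type(value.op)]
        if _is_acc(value.left, acc) and not _reads(value.right, acc):
            return wcr, ast.unparse(value.right)
        commutative = isinstance(value.op, (ast.Add, ast.Mult))
        if commutative and _is_acc(value.right, acc) and not _reads(value.left, acc):
            return wcr, ast.unparse(value.left)
    if (isinstance(value, ast.Call) and isinstance(value.func, ast.Name)
            and value.func.id in _CALL_WCR and len(value.args) == 2 and not value.keywords):
        first, second = value.args
        for this, other in ((first, second), (second, first)):
            if _is_acc(this, acc) and not _reads(other, acc):
                return _CALL_WCR[value.func.id], ast.unparse(other)
    return None


print(split_accumulator("out = acc + x * y", "acc"))  # ('lambda a, b: a + b', 'x * y')
print(split_accumulator("out = x - acc", "acc"))      # None: not order-independent
```

The tasklet then emits only the increment and writes it to ``arr[S]`` with the returned WCR.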
…2382) SDFG._used_symbols_internal expanded the stride/shape/offset symbols of EVERY declared array into the free-symbol set. An array that is merely present but unused (never read/written/allocated) then leaked its shape symbol (e.g. ``x_shape``) into free_symbols/arglist/signature/init_signature -- the over-pessimism reported in issue #2382. Restrict that expansion to arrays actually referenced inside a control-flow code block (a ConditionalBlock guard or LoopRegion condition/range), via the new _arrays_used_in_code_blocks (parse the code block to symbols, intersect with array names). Memlet- and access-node-referenced arrays already contribute their extent symbols through the contents analysis, and interstate-edge references are expanded by the parent ControlFlowRegion -- only code-block references needed this extra step, so a genuinely-used array keeps its symbolic extent. Adds tests/sdfg/free_symbols_test.py regressions: the issue reproducer (unused array does not perturb the signature) plus two over-aggression guards (a code-block-only array keeps its stride symbol; a map-memlet-only array keeps its extent).
…sets Replace the bespoke _arrays_used_in_code_blocks helper with the existing read_and_write_sets analysis, which already reports exactly the arrays that are used (read or written, including those referenced only by a code-block guard) -- so a merely-declared array still does not leak its shape symbol (issue #2382), while every used array keeps its extent symbols.
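A minimal sketch of the restricted expansion, assuming ``used`` stands in for the names reported by ``read_and_write_sets``. ``extent_symbols`` is a hypothetical helper, not the DaCe implementation.

```python
from typing import Dict, Set, Tuple

import sympy


def extent_symbols(shapes: Dict[str, Tuple], used: Set[str]) -> Set[sympy.Symbol]:
    """Free symbols contributed by array extents, restricted to used arrays."""
    syms: Set[sympy.Symbol] = set()
    for name, shape in shapes.items():
        if name not in used:
            continue  # merely declared: its shape symbols must not leak
        for dim in shape:
            syms |= sympy.sympify(dim).free_symbols
    return syms


n, x_shape = sympy.symbols("n x_shape")
shapes = {"A": (n,), "x": (x_shape,)}
print(extent_symbols(shapes, used={"A"}))  # {n}
```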
…ructure Refactor of the post-fix can_be_applied: cache descriptor lookups, drop the dead len(edges_between) == 1 branch (already implied by the degree gates), fold the rank+shape check into one _shapes_match helper, and clarify the storage check (in/out must match; med may legitimately live on a different device -- that's the CPU->GPU->CPU staging chain this pass is designed to short-circuit). Behaviour-preserving: full redundant_copy_test suite stays at 19 passed.
…m comments and docstrings
The previous form ``f'{f:.15g}'`` truncated to 15 significant digits,
which is insufficient for many fp64 values (e.g. ``1/21 == 0.047619047619047616``).
Even with the ``float(s) != f`` fallback, the formatted output and the
shortest-repr output could disagree on a save -> load -> save cycle when
sympy's parse handed back a different sympy.Float at higher precision,
breaking the SDFG serialization round-trip equality check
(``tests/library/fft_test.py::test_ifft[backward]``).
``repr`` for finite Python floats is guaranteed to produce the shortest
decimal that parses back to the same float (at most 17 significant
digits for fp64), and is idempotent under save -> load -> save. Drop
the 15g + fallback in favour of repr unconditionally; keep the cosmetic
``+= '.0'`` for integer-valued floats and the trailing-zero strip for
short non-integer decimals so ``5.0`` / ``3.14`` / ``0.1`` are unchanged.
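The round-trip argument can be checked directly with Python's own ``repr``. ``format_float_sketch`` is a hypothetical stand-in limited to finite floats, and it omits the cosmetic suffix handling because ``repr`` already renders integer-valued floats as e.g. ``5.0``.

```python
import math


def format_float_sketch(f: float) -> str:
    """Hypothetical stand-in for _format_float, finite Python floats only."""
    if not math.isfinite(f):
        raise ValueError("non-finite values need a separate path")
    # repr gives the shortest decimal that parses back to exactly f.
    return repr(f)


x = 1 / 21
print(f"{x:.15g}")                          # 0.0476190476190476
print(float(f"{x:.15g}") == x)              # False: 15 significant digits do not round-trip
s = format_float_sketch(x)
print(s)                                    # 0.047619047619047616
print(format_float_sketch(float(s)) == s)   # True: save -> load -> save is stable
```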
…ening + termination cap Three correctness fixes to SymbolPropagation, surfaced by adversarial tests:

1. Same-edge race (#5): _update_syms substituted a propagated value into an outgoing edge's assignment RHS without checking the edge's own assignment keys. Interstate-edge assignments are simultaneous, so substituting e.g. `anext -> a + b` into `{b: a, a: anext}` produced `{b: a, a: a + b}` -- `a` both read and written on one edge, which validation rejects. Now each edge filters out substitutions whose value reads a symbol assigned on that edge.
2. Cross-CFG assert crash (#1): _get_in_syms `assert new_in_syms == {}` crashed on start/branch regions that already carried edge-accumulated symbols. Replaced with a conservative combine (disagreements -> None).
3. Fixpoint non-termination: the inner _update_syms `while changed` loop oscillated forever on CYCLIC symbol value dependencies (swaps). Added an iteration cap (#symbols + 2) guaranteeing termination, leaving cyclic symbols un-substituted (conservative + correct).

Tests (tests/passes/symbol_propagation_hard_test.py): 21 original (incl. the co-evolving-pair regression, input fixed to a valid simultaneous edge) + 14 adversarial (10 pass). 4 cyclic-swap tests are strict-xfail, pinning a remaining deeper bug: the pass over-substitutes a reassigned symbol's value into use-sites on value cycles. Existing tests/passes/symbol_propagation_test.py stays green.
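Fix 1 can be sketched with sympy on plain ``{lhs: rhs}`` dicts. ``substitute_into_edge`` is a hypothetical helper, not the pass's ``_update_syms``: a propagated value is substituted into an edge only if it reads none of the symbols that edge assigns.

```python
from typing import Dict

import sympy


def substitute_into_edge(edge: Dict[str, str], propagated: Dict[str, str]) -> Dict[str, str]:
    """Substitute propagated values into an edge's RHSes, skipping racy ones."""
    assigned = {sympy.Symbol(k) for k in edge}
    # Interstate assignments are simultaneous: a value that reads a symbol this
    # edge also assigns would make that symbol both read and written here.
    safe = {sympy.Symbol(k): sympy.sympify(v) for k, v in propagated.items()
            if not (sympy.sympify(v).free_symbols & assigned)}
    return {lhs: str(sympy.sympify(rhs).subs(safe)) for lhs, rhs in edge.items()}


edge = {"b": "a", "a": "anext"}
print(substitute_into_edge(edge, {"anext": "a + b"}))  # {'b': 'a', 'a': 'anext'} (race avoided)
print(substitute_into_edge(edge, {"anext": "c + 1"}))  # {'b': 'a', 'a': 'c + 1'}
```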
…er-dependent + conditional symbols)
Stress SymbolPropagation with the patterns that break big real-world SDFGs:
chained inter-dependent index symbols feeding array accesses; branch-divergent
index symbols feeding indirection (used inside branches and after the join);
nested conditionals; interstate-edge conditions that read a propagated symbol;
loop-carried index symbols (range + dace.map); double indirection / gather
(symbol read from an array then used as an index); sibling-scope reuse;
mutually inter-dependent loop-carried pairs. Python frontend + SDFG-API.
20 pass. 1 strict-xfail pins a GENUINE pass bug: on a co-evolving pair where
edge `mid->upd` carries `{b: a, a: anext}` and `body->mid` carries
`anext = a + b`, SymbolPropagation substitutes anext forward into the upd edge,
yielding `{b: a, a: a + b}` -- `a` is now both read and written on the same
interstate edge, which validation rejects as a race. The pass must not
substitute a symbol into an edge when that makes a variable both read and
assigned there.
…— fixes cyclic-swap over-substitution Root cause of the cyclic over-substitution: _get_in_syms stored RAW assignment RHS strings (tx:'y', x:'tx') without resolving them against the incoming table, building symbol->symbol chains that form cycles (x->tx->y->ty->x) the final replace_dict cannot resolve (swap produced no swap; m=t with t=m+2 double-counted). Fix: resolve each edge's RHSes against the PRE-edge table (new _resolve helper), i.e. simultaneous-assignment semantics — a swap reads the OLD values, collapsing chains to constants/expressions up front. Crucially, a resolved value that references ANY symbol assigned on the SAME edge stays LIVE (None): those keys are rebound simultaneously, so the value (read with old values) must not be propagated into a downstream use that sees the new values (else B[m] with m=m+2 becomes B[m+4]). _resolve leaves array-access values (tbl[i]) untouched so the existing nested-array filter still drops them (parsing would mangle tbl[i] into tbl(i) and emit invalid code). All 4 previously-xfailed cyclic-swap tests now PASS; xfail decorators removed. Full symbol-prop suite 42/42 (35 hard + 7 existing), no regression.
…PointPipeline converges
…re propagating Investigation conclusion: SymbolPropagation is NOT buggy on CloudSC. It is a structural no-op there (returns None, zero tasklet/memlet/interstate/symbol-mapping diffs) and produces bit-identical output under sequential schedules; the ~1e-5 the earlier xfail saw was CloudSC's parallel-map OpenMP reduction floating-point nondeterminism, not symbol propagation. The deeper reason symprop does nothing on CloudSC: its propagatable symbols (``kfdia_plus_1_N = kfdia + 1`` -- all 124 of them) reference the horizontal-bound scalar ARGUMENTS ``kidia`` / ``kfdia``, which are ``dt.Scalar``. The scalar-skip filter correctly refuses to propagate values referencing a scalar (a runtime pointer) -- and the default ``ScalarToSymbolPromotion`` (``transients_only=True``) does not promote argument scalars, so they stay ``dt.Scalar``. Promoting them first with ``transients_only=False`` turns ``kfdia`` into a symbol, after which symprop folds ``kfdia_plus_1 -> (kfdia + 1)``. New unit test ``test_cloudsc_kidia_kfdia_promote_then_propagate`` (a CloudSC subset: ``range(kidia, kfdia + 1)`` over several level nests) pins this: without promotion symprop is a no-op; after ``ScalarToSymbolPromotion(transients_only=False)`` it folds the symbols; value-preserving throughout. The scalar-skip filter is unchanged (``test_scalars`` still guards it). Reframes CORE_BUGFIXES L-H from "symprop bug" to "not a bug" + the promotion-order note.
…ter substitution After ``_update_syms`` substitutes a propagated value at every use site, the *defining* iedge assignment was left in place: a shorthand like ``k_plus_1 = klev + 1`` would still appear on the iedge even though every downstream consumer had been rewritten to use ``klev + 1`` directly. The cloudsc parallelize chain test caught this -- 346 ``klev+1`` / ``kfdia+1`` / ``kidia+1`` assignments survived ``symbol_propagation``.

Sweep these dead assignments at the end of ``apply_pass`` to a fixed point. An assignment ``X = expr`` is dead if ``X`` does not appear as a free symbol in:
* any block (which transitively covers NestedSDFG ``symbol_mapping`` uses),
* any other iedge's assignment RHS or branch condition,
* any array descriptor's shape / strides / total_size / offset.

The fixed-point loop unravels chained shorthands (``a = klev + 1; b = a; c = b`` are all gone in one pass when nothing references the tail).

Tests:
* ``test_dead_iedge_assignment_eliminated_after_substitution`` -- minimal ``k_plus_1`` repro of the cloudsc shape; substitution + elimination both verified.
* ``test_dead_iedge_chain_unravels_to_fixed_point`` -- ``a = klev+1; b = a; c = b`` chain; all three links eliminated.
* ``test_dead_iedge_preserved_when_lhs_still_used`` -- safety counter-test: if the LHS is referenced (here by an array shape), the assignment stays.

All 12 tests pass (9 pre-existing + 3 new).
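A minimal sketch of the fixed-point sweep over plain ``{lhs: rhs}`` dicts, one per iedge. ``ir_uses`` stands in for the symbols used by blocks, branch conditions and descriptors; the helper is hypothetical. Unlike the rule above, it counts an assignment's own RHS as a use, so a self-referencing assignment such as ``i = i + 1`` is conservatively kept.

```python
from typing import Dict, List, Set

import sympy


def sweep_dead_assignments(edges: List[Dict[str, str]], ir_uses: Set[str]) -> int:
    """Delete dead `X = expr` iedge assignments to a fixed point; return how many."""
    removed = 0
    while True:
        used = set(ir_uses)
        for assignments in edges:
            for rhs in assignments.values():
                used |= {str(s) for s in sympy.sympify(rhs).free_symbols}
        dead = [(a, lhs) for a in edges for lhs in a if lhs not in used]
        if not dead:
            return removed
        for assignments, lhs in dead:
            del assignments[lhs]
        removed += len(dead)


edges = [{"a": "klev + 1"}, {"b": "a"}, {"c": "b"}]
print(sweep_dead_assignments(edges, ir_uses=set()), edges)  # 3 [{}, {}, {}]
```

Each round removes only the current tail (``c``, then ``b``, then ``a``), which is why the loop must run to a fixed point.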
…iptors before dead-iedge sweep Follow-up to 968763e. The first pass of the dead-iedge cleanup eliminated ``k_plus_1 = klev + 1`` only when no block referenced ``k_plus_1`` -- but ``SDFGState.free_symbols`` (state.py:709) pulls the array-shape symbols of an access node's data descriptor *into* the block's free-symbol set. So as long as any state in the SDFG accessed an array sized ``[k_plus_1, ...]``, the cleanup still saw ``k_plus_1`` as live and the iedge survived. Cloudsc has 321 such bindings (after the first cleanup reduced it from 346): every ``kfdia_plus_1_X = kfdia + 1`` is preserved because the per-state writes go to arrays whose shape lists ``kfdia_plus_1_X``.

Restructured ``_eliminate_round`` into the following steps:
1. Gather propagatable bindings -- a symbol is safe to fold if every iedge binding it agrees on the same RHS (no per-edge disagreement) and the RHS does not self-reference (a self-reference like ``i = i + 1`` marks a loop-carried iterator, not foldable).
2. ``sd.replace_dict(safe_subs, replace_keys=False, replace_in_graph=False)`` substitutes the symbol into the SDFG's array descriptors (shapes / strides / offsets). Now ``block.free_symbols`` no longer references the substituted symbol.
3. Sweep dead iedges -- the standard ``lhs not in used_in_ir`` check now correctly identifies the bindings as dead.
4. Drop the orphaned ``sd.symbols`` declarations so nested-SDFG validation doesn't demand the symbol from the outer scope ("missing symbol on nested SDFG").

Test updates:
* ``test_dead_iedge_with_array_shape_substituted_into_descriptor`` -- the cloudsc-pattern minimal repro: an array sized by the shorthand symbol; the pass now substitutes ``klev + 1`` into the shape AND eliminates the binding.
* ``test_deeply_nested_sdfg`` assertions updated -- the previous test documented the OLD limitation (the pass never reached into NSDFG symbol mappings); the new behaviour correctly propagates ``v -> a`` through the NSDFG chain and sweeps the dead ``v = a`` + ``c = v+1`` bindings + their orphaned declarations.

All 12 SymbolPropagation tests pass. Broader 1048-test passes sweep clean (the lone ``s113_d_single`` vectorization failure is pre-existing and reproduces with this fix reverted).
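Steps 1 and 2 can be sketched with sympy on plain dicts. ``foldable_bindings`` and ``substitute_into_shapes`` are hypothetical helpers; the real pass uses ``sd.replace_dict`` on the SDFG's descriptors. Once the shorthand is substituted into the shape, nothing in the descriptor references it any more, so the dead-iedge sweep can remove the binding.

```python
from typing import Dict, List

import sympy


def foldable_bindings(edges: List[Dict[str, str]]) -> Dict[sympy.Symbol, sympy.Expr]:
    """Bindings every iedge agrees on and that do not reference themselves."""
    first: Dict[str, sympy.Expr] = {}
    rejected = set()
    for assignments in edges:
        for lhs, rhs in assignments.items():
            expr = sympy.sympify(rhs)
            if sympy.Symbol(lhs) in expr.free_symbols:
                rejected.add(lhs)  # e.g. i = i + 1: a loop-carried iterator
            elif lhs in first and first[lhs] != expr:
                rejected.add(lhs)  # edges disagree on the value
            else:
                first.setdefault(lhs, expr)
    return {sympy.Symbol(k): v for k, v in first.items() if k not in rejected}


def substitute_into_shapes(shapes, subs):
    return {name: tuple(sympy.sympify(d).subs(subs) for d in dims)
            for name, dims in shapes.items()}


edges = [{"k_plus_1": "klev + 1"}, {"k_plus_1": "klev + 1"}, {"i": "i + 1"}]
subs = foldable_bindings(edges)
shapes = substitute_into_shapes({"A": ("k_plus_1", 4)}, subs)
print(shapes)  # {'A': (klev + 1, 4)}
```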
…ymbol the current edge reassigns A loop-carried symbol such as ``k = j + 1`` carried in from the predecessor is STALE for the downstream block when the current edge reassigns ``j`` (e.g. ``j = k + 1``): the carried value was computed from the pre-edge ``j``, but the block past this edge sees the post-edge ``j``. Propagating the carried value would read the wrong value (e.g. ``c[k]`` becomes ``c[j + 1]`` against the reassigned ``j``, an off-by-two). Invalidate such carried entries to live before merging the resolved edge assignments into the table. Un-xfails ``test_interdependent_pair_loop_api`` in the hard-symprop suite (the fix in 86f28c1 -- simultaneous-RHS resolve -- already closed that race; the xfail marker was stale). Adds an idempotence regression-catcher for ``_format_float`` -- the serializer must give the same string under ``f -> str -> f -> str`` so SDFG save -> load -> save survives the framework's round-trip check (``tests/library/fft_test.py::test_ifft[backward]``). Extracted from yakup/dev ``aaed2278a`` (symbol-propagation slice only; the canonicalize-pipeline / LICM / SplitTasklets / MoveIfIntoLoop slices are not part of this PR).
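The per-edge table update described in the two preceding SymbolPropagation fixes (resolve RHSes against the pre-edge table, keep values that read a rebound symbol live, invalidate stale carried entries) can be sketched with sympy. ``apply_edge`` is a hypothetical helper, and ``None`` marks a live (unknown) symbol.

```python
from typing import Dict, Optional

import sympy

Table = Dict[str, Optional[sympy.Expr]]  # None marks a LIVE (unknown) symbol


def apply_edge(table: Table, assignments: Dict[str, str]) -> Table:
    assigned = {sympy.Symbol(k) for k in assignments}
    known = {sympy.Symbol(k): v for k, v in table.items() if v is not None}
    # 1. Resolve every RHS against the PRE-edge table (simultaneous semantics).
    resolved: Table = {}
    for lhs, rhs in assignments.items():
        value = sympy.sympify(rhs).subs(known, simultaneous=True)
        # A value that still reads a symbol rebound on this edge stays live.
        resolved[lhs] = None if value.free_symbols & assigned else value
    # 2. Invalidate carried entries computed from a symbol this edge reassigns.
    out: Table = {k: (None if v is not None and v.free_symbols & assigned else v)
                  for k, v in table.items()}
    # 3. Merge the resolved edge assignments.
    out.update(resolved)
    return out


one, two = sympy.Integer(1), sympy.Integer(2)
print(apply_edge({"x": one, "y": two}, {"x": "y", "y": "x"}))  # {'x': 2, 'y': 1}: a real swap
print(apply_edge({"m": None}, {"m": "m + 2"}))                 # {'m': None}: stays live
j = sympy.Symbol("j")
print(apply_edge({"j": None, "k": j + 1}, {"j": "k + 1"}))     # {'j': None, 'k': None}
```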
In AugAssignToWCR.apply, the View-edge probe used e.src.desc(sdfg) where sdfg came from the outer apply() parameter. When the transformation is applied inside a NestedSDFG, that outer sdfg is not the one e.src lives in -- desc() then looks the array up in the wrong descriptor repository and either raises KeyError or returns the wrong descriptor. Resolve via state.sdfg (the SDFG that owns the state we are mutating) so the View probe always queries the right repository. Extracted from yakup/dev 101d861 (wcr_conversion slice only; the canonicalize/pipeline.py slice is not part of this PR).
A loop-invariant-guarded loop body becomes a NestedSDFG; memlet
propagation then (correctly) widens that NestedSDFG's external write
connector to the whole-array union over the loop (b[i] -> b[0:N]).
LoopToMap's write-pattern check only inspected that external memlet, so
it could no longer prove each iteration writes a distinct location and
refused ('Write pattern check failed for b - dst_subset=0:N'). The
per-iteration write b[i] is not expressible on the external connector --
it lives structurally inside the NestedSDFG -- so no amount of (correct)
propagation can recover it; the check must look inside.
Add _nested_writes_iter_indexed: when the external check fails and the
writer is a NestedSDFG, walk its inner writes to the connector's array,
rewrite their subsets through the node's symbol_mapping into the outer
iteration symbol, and apply the same a*i+b independence check (recursing
through nested NestedSDFGs, composing symbol maps). Conservative: needs
>=1 inner write and all must pass; any WCR / missing subset -> false. It
only ever grants provable parallelization; no existing check is weakened.
Effect: the LoopToMap -> MapToForLoop -> LoopToMap round-trip recovers
the map (previously the loop stayed sequential forever after one
de-parallelization). The round-trip regression test flips from strict
xfail to passing. Sweep (canonicalize + condition/map-fusion + full
loop_to_map + disjoint/overlapping-writes): 107P / 1 xfailed / 0F.
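The composition step of `_nested_writes_iter_indexed` works as follows: rewrite each inner write index through every enclosing `symbol_mapping`, innermost first, then apply the `a*i+b` test. The sketch below is a toy model under stated assumptions. Each index is affine in a single symbol, and each mapping sends an inner symbol to an affine expression in the enclosing scope. `Affine`, `Write`, `Nested` and `compose` are hypothetical stand-ins for DaCe subsets and NestedSDFG nodes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Affine:
    """Toy index expression a * sym + b; constants use a == 0 and sym None."""
    a: int
    b: int
    sym: Optional[str] = None


@dataclass
class Write:
    index: Optional[Affine]  # None models a missing subset
    wcr: bool = False


@dataclass
class Nested:
    """Toy NestedSDFG: maps inner symbol -> Affine in the enclosing scope's symbols."""
    symbol_mapping: dict
    writes: list = field(default_factory=list)  # Write or Nested entries


def compose(expr: Affine, mapping: dict) -> Affine:
    """Rewrite expr through one symbol_mapping level."""
    if expr.sym is None or expr.sym not in mapping:
        return expr  # unmapped inner symbols stay inner and fail the check below
    m = mapping[expr.sym]
    # a * (c * t + d) + b == (a * c) * t + (a * d + b)
    return Affine(expr.a * m.a, expr.a * m.b + expr.b, m.sym)


def nested_writes_iter_indexed(node: Nested, itvar: str, outer_maps=()) -> bool:
    """True only if there is >= 1 inner write and every one is a*itvar+b with a != 0."""
    maps = (node.symbol_mapping,) + tuple(outer_maps)  # innermost mapping first
    found = False
    for w in node.writes:
        if isinstance(w, Nested):
            # Recurse; the child prepends its own mapping to the composed chain.
            if not nested_writes_iter_indexed(w, itvar, maps):
                return False
            found = True
            continue
        if w.wcr or w.index is None:
            return False  # conservative: WCR or an unknown subset refuses
        idx = w.index
        for mapping in maps:
            idx = compose(idx, mapping)
        if idx.sym != itvar or idx.a == 0:
            return False
        found = True
    return found
```

Composing through two levels (`t -> 2*i_in + 1`, then `i_in -> i`) yields `2*i + 1`, which still passes. A constant index, a WCR write, or an empty body refuses, matching the conservative rules stated above.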
Docstring-only: a concrete before/after of _nested_writes_iter_indexed on the guarded-loop round-trip (b[0:N] external vs the hidden inner b[i]).
A LoopRegion whose range expression (start/end/step) references a
symbol that the loop body itself defines via an interstate-edge
assignment cannot be soundly converted to a Map. ``LoopToMap``'s
``apply`` moves the loop body into a new ``loop_body`` NestedSDFG; the
body-internal assignment goes with it, but the new Map's range stays
at the OUTER scope. Result: the Map range references a symbol defined
only inside the new NSDFG, leaving the outer scope with no binding for
it -> ``Missing symbols on nested SDFG: ['<sym>']`` at validation time
downstream.
This is the shape canonicalize produces on cloudsc (interstate-edge
assignments like ``kfdia_plus_1_N = kfdia + 1`` end up inside a loop
whose condition reads ``kfdia_plus_1_N``).
Add a ``can_be_applied`` check that requires
``{symbols in loop range}`` and
``{interstate-edge assignment keys inside the loop body}`` to be disjoint. Strictly
additive refusal -- no previously-accepted loop becomes rejected
unless it has this body-assigns-loop-range-symbol shape, and in that
case conversion was producing an invalid SDFG. Refused loops stay as
LoopRegions; sequential codegen still handles them cleanly.
Reproducer test ``test_refuse_when_body_assigns_loop_range_symbol``
hand-builds the minimal shape (loop ``j < KP1`` with an interstate
edge ``KP1 = K + 1`` inside the body) and asserts the refusal. Fails
without the fix (LoopToMap applies and produces an invalid SDFG); passes
with the fix.
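The refusal condition is a set-disjointness test. Below is a minimal sketch of it. It assumes range expressions are plain strings and that each interstate edge's assignments are a dict. `range_symbols` and `loop_to_map_allowed` are hypothetical helpers, and the reproducer's `j < KP1` is modelled as an end expression `KP1 - 1`.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def range_symbols(start: str, end: str, step: str) -> set:
    """Identifiers read by the loop range (toy: treats every identifier as a symbol)."""
    return {name for expr in (start, end, step) for name in IDENT.findall(expr)}


def loop_to_map_allowed(start, end, step, body_edge_assignments) -> bool:
    """Refuse when a range symbol is assigned on an interstate edge inside the body."""
    assigned = {key for assignments in body_edge_assignments for key in assignments}
    return range_symbols(start, end, step).isdisjoint(assigned)
```

For the reproducer shape, `loop_to_map_allowed('0', 'KP1 - 1', '1', [{'KP1': 'K + 1'}])` is `False`. The loop stays a `LoopRegion`.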
…eration disjoint

When two writes index a point dimension by the same injective function of the iteration variable (same a*i+b, a != 0), any collision forces the two iterations to be equal. The writes therefore only coincide within a single iteration (ordered by program order in the map body) and never race across iterations. _writes_may_overlap now recognizes this, so loops with scatter writes that share the iteration variable in one dimension (e.g. A[0, idx, i] and A[idx, 0, i]) are parallelizable. Adds disjoint-writes unit tests (accept shared-iteration-dim; reject shared-constant-dim).
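The injectivity argument can be checked on small cases. The sketch below uses a per-dimension predicate for "same `a*i+b` with `a != 0`" and a brute-force search for cross-iteration collisions on concrete scatter indices. `IterIndex`, `shares_injective_iteration_dim` and `cross_iteration_collisions` are hypothetical names. Non-iteration dimensions are modelled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterIndex:
    """Toy per-dimension index a * i + b in the loop iteration variable i."""
    a: int
    b: int

    def __call__(self, i: int) -> int:
        return self.a * i + self.b


def shares_injective_iteration_dim(w1, w2) -> bool:
    """True if some dimension indexes both writes by the same a*i+b with a != 0.

    Non-iteration dimensions are plain strings (e.g. '0' or 'idx') and never qualify.
    """
    return any(isinstance(d1, IterIndex) and d1 == d2 and d1.a != 0
               for d1, d2 in zip(w1, w2))


def cross_iteration_collisions(f1, f2, n):
    """Brute force: pairs (i1, i2), i1 != i2, where write 1 of i1 hits write 2 of i2."""
    return [(i1, i2) for i1 in range(n) for i2 in range(n)
            if i1 != i2 and f1(i1) == f2(i2)]
```

With `idx = [3, 0, 2, 0]`, the writes `A[0, idx[i], i]` and `A[idx[i], 0, i]` have no cross-iteration collisions. Dropping the shared `i` dimension (`A[0, idx[i]]` vs `A[idx[i], 0]`) produces collisions between iterations 1 and 3.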
Two fixes so that the textbook fixed-read loop ``a[i] = a[1] + b[i]`` (1-indexed) parallelizes after peeling iteration 1 (0-indexed: peel iteration 0, the carried RMW of a[0]):

- LoopToMap.test_read_memlet: a read that does not move with the iteration was conservatively treated as a conflict. A loop-INVARIANT read (no iteration symbol) is only a conflict if it actually overlaps a write. For example, ``a[0]`` is disjoint from the loop's ``a[1:N]`` writes (the post-peel remainder) but overlaps ``a[0:N]`` (the un-peeled loop, a real carried dependence, still refused). Defer loop-invariant reads to the existing propagated-overlap check instead of bailing.
- LoopInvariantCodeMotion._hoist_map_scope: variant_data only scanned AccessNodes between the map entry and exit, missing the arrays the map WRITES (its outputs flow through the map exit to AccessNodes outside that range). As a result, the invariant ``a[0]`` read was wrongly hoisted into a malformed whole-array-to-scalar copy (``a_index = a[0:N]`` -> a pointer/value codegen error). Seed variant_data with the map's output arrays too, matching the documented "a written container makes its reads variant" criterion.

Adds a canonicalize knob test for the peeled fixed-read pattern (default stays sequential; peel_limit>0 parallelizes and runs, value-preserving).
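The two claims in the LoopToMap fix can be checked on concrete data. A loop-invariant read conflicts only when it overlaps a write. Once iteration 0 is peeled, the remainder no longer depends on iteration order. This sketch assumes half-open integer ranges and the 0-indexed form `a[i] = a[0] + b[i]`. All function names are hypothetical.

```python
def ranges_overlap(r1, r2):
    """Half-open integer ranges [lo, hi)."""
    return r1[0] < r2[1] and r2[0] < r1[1]


def invariant_read_conflicts(read, writes):
    """A loop-invariant read is a conflict only if it overlaps some write."""
    return any(ranges_overlap(read, w) for w in writes)


def run_sequential(a, b):
    a = list(a)
    for i in range(len(a)):
        a[i] = a[0] + b[i]  # iteration 0 writes a[0], which every later iteration reads
    return a


def run_peeled_any_order(a, b):
    a = list(a)
    a[0] = a[0] + b[0]  # peeled iteration 0: the carried read-modify-write of a[0]
    # Remainder: reads only a[0] and writes only a[1:N], so iteration order is irrelevant.
    for i in reversed(range(1, len(a))):
        a[i] = a[0] + b[i]
    return a
```

With `N = 8`, the read `a[0]` (`(0, 1)`) does not overlap the remainder's writes `(1, N)` but does overlap the un-peeled `(0, N)`. Running the peeled remainder in reverse gives the same result as the sequential loop.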
…symbolic

_serialize_symbolic_uncached dispatched the float branch through sympy.printing.str.sstr, which uses sympy's default 15-significant-digit mpmath formatting. The neighbouring sympy.Basic branch uses DaceSympySerializer -> _print_Float -> _format_float -> repr(f), which is the shortest round-trip form (at most 17 sig digits for fp64). Any SymbolicProperty whose value reaches serialize_symbolic as a Python float (vs as a sympy.Float) therefore produced a 15-digit string on save 1 and a 17-digit string on save 2 once the load round-trip rebuilt the value as sympy.Float. This broke the SDFG save -> load -> save equality check on tests/library/fft_test.py::test_ifft[backward] (factor = 1/21).

Route the float branch through the shared _format_float so both branches emit the same shortest-round-trip form unconditionally. Adds a parametrised idempotence regression test.
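The sketch below shows why two formatters for the same value break a save/load/save equality check. A 15-significant-digit string does not parse back to the same fp64 value, while `repr` does and is idempotent under `f -> str -> f -> str`. It is a standalone illustration that uses `"%.15g"` as a stand-in for sympy's 15-digit printing, which is an assumption since sstr's exact output form may differ. The function names are hypothetical.

```python
def fmt_15_digits(f: float) -> str:
    """Stand-in for a 15-significant-digit formatter (the old float branch)."""
    return "%.15g" % f


def fmt_shortest_round_trip(f: float) -> str:
    """Shortest string that parses back to the same float (what repr gives)."""
    return repr(f)


def round_trips(fmt, f: float) -> bool:
    return float(fmt(f)) == f


def idempotent(fmt, f: float) -> bool:
    s1 = fmt(f)
    return fmt(float(s1)) == s1
```

For `f = 1/21`, the two formatters disagree, and only the shortest-round-trip one recovers `f` exactly. Routing every branch through a single formatter is what makes repeated saves emit identical strings.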
Bugfixes
While working on the canonicalization pipeline and ensuring that we can automatically parallelize CloudSC, I have encountered many bugs and fixed many of them.
Core SDFG Files
`symbols_defined_at` does not include enclosing `LoopRegion` vars

- Where: `dace/sdfg/state.py::ControlFlowBlock.symbols_defined_at`.
- Reproducer: call `symbols_defined_at` on any node inside a `LoopRegion` body.
- Before: the enclosing loop's variable (e.g. `i`) is missing from the returned set.
- Impact: memlet propagation relies on `symbols_defined_at`; with `i` absent it treats per-iteration subsets like `arr[*, jk-1, *]` as un-parameterised and widens them to whole-array unions, which then blocks `LoopToMap`.
- Test: `tests/sdfg/symbols_defined_at_test.py`.

Output-array arg dropped when outgoing memlet from `MapExit` is source-relative
- Where: `dace/sdfg/state.py::DataflowGraphView.unordered_arglist`, the `AccessNode/CodeNode -> ExitNode` branch.
- Reproducer: an edge from a `MapExit` to an outer AccessNode (`D`) carries a memlet whose `data` field still names an inner transient (`tmp`) rather than `D`. (This can happen because of `other_subset` copies, among other things.)
- Before: `D` is missing from the arglist. The code took `oedge.data.data == 'tmp'`; `tmp` is already in `descs` (the AccessNode `tmp` is in scope), so nothing reached `additional_descs`.
- After: `D` (and its stride/shape symbols) is in the arglist.
- Fix: resolve the written array from the terminal node of the memlet path, `memlet_path(oedge)[-1].dst` (equivalently `memlet_tree(oedge).root().edge.dst`), and use that node's `.data`, with `oedge.data.data` as fallback when the path does not terminate at an `AccessNode`.
- Test: `tests/sdfg/exit_arglist_test.py::test_arglist_resolves_outer_destination_from_source_relative_outgoing_memlet`. End-to-end GPU compile of the same shape: `tests/codegen/argument_signature_test.py::test_argument_signature_test`.

Veclen lookup crashed on non-AccessNode endpoints
- Where: `dace/sdfg/validation.py::validate_state`, the edge dimensionality check and the two View-exception branches.
- Reproducer: build an edge with `other_subset` set where one endpoint is a scope node (`NestedSDFG`/`MapEntry`/`MapExit`/`ConsumeEntry`/`ConsumeExit`), then call `sdfg.validate()`.
- Before: `AttributeError: 'NestedSDFG' object has no attribute 'data'` is raised from `sdfg.arrays[src_node.data].veclen`. The pre-fix code assumed both endpoints were `AccessNode`s; scope nodes route data through connectors and have no `.data` field.
- Fix: read each side's `veclen` only when its endpoint is an `AccessNode`; default to `1` for scope nodes. The same guard is applied to the two View-exception branches downstream.
- Test: `tests/sdfg/validation/subset_size_test.py::test_veclen_lookup_guarded_on_non_accessnode_endpoint`.

Unused transient's shape symbol leaked into signature (#2382)
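- Where: `dace/sdfg/sdfg.py::SDFG._used_symbols_internal`.
- Reproducer: declare a transient that is never accessed and whose shape uses a symbol (e.g. `x_shape`). Compare `used_symbols(all_symbols=False)` and `arglist()`.
- Before: the SDFG has `x_shape` added to both its used-symbols set and its arglist, so merely declaring an unused array changes the SDFG's signature.
- After: `x_shape` appears in neither `used_symbols` nor `arglist`.
- Fix: base the descriptor-symbol collection on the arrays in `read_and_write_sets()` rather than on every declared array.
- Test: `tests/sdfg/free_symbols_test.py::test_unused_array_does_not_leak_shape_symbol`.

Symbolic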
`sympy_numeric_fix` demoted integer-valued floats to int

The previous merge to main fixed the error on round-trip, but it still appeared on CloudSC, coming from the Python frontend.
- Where: `dace/symbolic.py::sympy_numeric_fix` (plus a consolidation of `_print_Float`/`_format_float`).
- Reproducer: call `sympy_numeric_fix` on a finite Python / numpy / sympy float like `1.0`; then construct a `sympy.Min` of that value and a symbol.
- Before: `1.0` is returned as Python `int(1)`; `Min(zanew, 1)` is a type-mismatched int-Min that SymPy re-canonicalises through the integer branch and silently truncates downstream.
- After: `1.0` is returned as `sympy.Float(1.0)`; `Min(zanew, 1.0)` preserves float typing end-to-end.
- Fix: keep finite floats as `sympy.Float`. This also fixes the `inf.0` and 17-digit Float print bugs surfaced by the same path.
- Test: `tests/symbolic_print_test.py` + `tests/symbolic_roundtrip_test.py`.

`_print_floor` emitted a literal `floor(...)` instead of integer division in `cpp_mode`

I guess users should never use `//` and should use `int_floor` instead, but this fix should keep an unaware user from getting incorrect C++ without any warning or error.

- Where: `dace/symbolic.py::DaceSympyPrinter._print_floor`.
- Reproducer: build `(LEN - 1) // 8` on an integer symbol `LEN` (SymPy normalises this to `floor(LEN/8 - 1/8)`), then print with `DaceSympyPrinter(cpp_mode=True)`.
- Before: the output is `floor(LEN/8 - 1/8)`. C++ integer-divides `1 / 8` to `0`, so the bound effectively collapses to `LEN / 8` and the loop overshoots by one whenever `LEN` is a multiple of 8.
- After: `((LEN - 1) / (8))`, a single integer division.
- Fix: recombine the floor argument via `arg.together().as_numer_denom()` and emit a single integer-division pair.
- Test: `tests/symbolic_print_test.py::test_cpp_floor_of_fraction_difference_recombines_to_integer_division` asserts that the cpp output contains no `floor(` and no `1/8`.

Transformations
`TrivialTaskletElimination` misoriented the surviving memlet when the source was a `MapEntry`
- Where: `dace/transformation/dataflow/trivial_tasklet_elimination.py`, the `expr_index=1` (MapEntry-source) branch.
- Context: the transformation collapses `read --in_edge--> tasklet --out_edge--> write` into a single replacement edge from `read` to `write`. The replacement memlet must describe the side that owns the connector the edge actually leaves through.
- Reproducer: a trivial tasklet whose source is a `MapEntry` reading an offset slice (e.g. `a[i + 1]`) and whose destination is a transient scalar (`a_idx[0]`). Apply `TrivialTaskletElimination`; inspect the surviving edge's `.data` and `.subset`.
- Before: `data='a_idx', subset='0', other_subset='i + 1'`. The pass reused the write-side memlet and stuffed the read offset into `other_subset`. The SDFG validates and runs correctly because the dataflow is still consistent across the two subsets.
- After: `data='a', subset='i + 1', other_subset='0'`.
- Fix: for `expr_index=1` (source is a `MapEntry`, per the transformation's pattern), keep `in_edge.data` (the read-side memlet) as the surviving memlet, with the write subset moved into `other_subset`.
- Test: `tests/transformations/trivial_tasklet_elimination_test.py::test_trivial_tasklet_map_source_preserves_offset_subset` asserts that the surviving edge's `.data == 'a'` and `.subset == 'i + 1'`. Pre-fix, the test fails with `AssertionError: surviving edge must describe the read data 'a', got 'a_idx'`.

`MapFusionVertical` InOut split + matching `ConditionFusion` parent-fixup
- Where: `dace/transformation/dataflow/map_fusion_vertical.py` (`_split_inout_for_intermediate`) + `dace/transformation/interstate/condition_fusion.py` (`fuse_consecutive_conditions`).
- Before: handling a `NestedSDFG` `InOut` connector for the intermediate name merged the two sides into one connector and clobbered the upstream feedthrough; the matching parent-block bookkeeping in `ConditionFusion.fuse_consecutive_conditions` had the analogous clobber.
- Fix: when the intermediate is bound to an `InOut` connector of a `NestedSDFG` in the producer's body, split the connector inside the NestedSDFG (rename the inner read-side accesses to a fresh array bound to a new input connector). The standard rename machinery then rewires the output-only connector without a mismatched-`InOut` validation error. Apply the same split-aware bookkeeping in `ConditionFusion`'s parent fixup.
- Test: `tests/transformations/map_fusion_vertical_test.py`.

`ConditionFusion`: use `isinstance(ControlFlowBlock)` instead of `hasattr(node, 'sdfg')`

- Where: `dace/transformation/interstate/condition_fusion.py`, the post-rewrite parent-fixup walk in `fuse_nested_conditions`.
- Before: the walk ran `if hasattr(node, 'sdfg'): node.sdfg = parent.sdfg` over `all_nodes_recursive()`. `hasattr(node, 'sdfg')` is `True` for `NestedSDFG` nodes too, but `NestedSDFG.sdfg` is the inner SDFG. The assignment overwrites the inner-SDFG slot with the outer container, producing a graph cycle that makes `all_nodes_recursive` recurse infinitely on the next walk.
- Fix: `isinstance(node, ControlFlowBlock)` matches `SDFGState`/`ControlFlowRegion`/`ConditionalBlock` (whose `.sdfg` legitimately names the containing SDFG) and skips `NestedSDFG`. This is a defensive type tightening matching the same pattern already corrected in `fuse_consecutive_conditions`.
- Test: `tests/transformations/interstate/condition_fusion_test.py` (existing suite; the specific NestedSDFG-inside-condition shape is not directly exercised here, since this is a defensive same-family tightening rather than a regression-catcher fix).

`RedundantArrayCopyingIn` folded chains it should have refused
- Reproducer (partial copy): a chain `A -> B -> C -> D` where one of the copies covers only part of the array (e.g. `[0:2]` of a size-4 array). Run the pass.
- Reproducer (extra consumer): `A -> B -> C -> D` plus a second consumer `C -> E`. Run the pass.
- Before (extra consumer): `C` is removed; `E` is left isolated; validation fails with `InvalidSDFGNodeError: Isolated node E`.
- After: the pass does not apply (`applied == 0`) and the SDFG stays correct.
- Fix: `can_be_applied` now requires `out_degree(med_array) == 1` (no second consumer) AND `_is_full_copy` on both `in -> med` and `med -> out` (each side's subset equals `Range.from_array(<that side's desc>)`). Either failure refuses the fold.
- Test: `tests/transformations/redundant_copy_test.py::test_in_failure_partial_copy` and `::test_in_failure_extra_consumer`. Both fail pre-fix with the symptoms above and pass post-fix.

`AugAssignToWCR` did not detect the copy-wrapped read-modify-write shape
- Before: `arr[S] = arr[S] + x` can appear as a 4-node chain `arr[S] -> copy_in -> tasklet -> copy_out -> arr[S]`, in which the accumulator slice is materialised into a scalar transient before the combining tasklet and copied back after it. The existing `AugAssignToWCR` matcher only recognised the direct shape and missed the copy-wrapped one, so loop-carried reductions stayed sequential and could not be parallelized via `LoopToMap`.
- Fix: a new `expr_index=2` matches the chain and rewrites it to a WCR write on `arr[S]`. It supports `+`, `*`, `min`, `max`, and left-sub.
- Test: `tests/transformations/wcr_conversion_test.py`.
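A toy matcher can illustrate the copy-wrapped shape that the new pattern targets. The matcher accepts a five-element chain whose two ends access the same slice and whose middle copies go through scalar transients. It also requires a tasklet that combines the accumulator with another input. This is a sketch, not the real `expr_index=2` pattern. The node classes, the connector names `acc`/`out` and the returned op labels are all hypothetical, and tasklet code is matched with simple regexes.

```python
import re
from dataclasses import dataclass


@dataclass
class Access:
    data: str
    subset: str
    transient_scalar: bool = False


@dataclass
class Tasklet:
    code: str  # e.g. "out = acc + x" or "out = min(acc, x)"


_BINOP = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*([+*-])\s*(\w+)\s*$")
_FUNC = re.compile(r"^\s*(\w+)\s*=\s*(min|max)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*$")


def match_copy_wrapped_rmw(chain, acc="acc", out="out"):
    """chain = [arr[S], copy_in, tasklet, copy_out, arr[S]]; return an op label or None."""
    if len(chain) != 5:
        return None
    src, cin, tasklet, cout, dst = chain
    if not (isinstance(src, Access) and isinstance(dst, Access) and isinstance(tasklet, Tasklet)):
        return None
    if (src.data, src.subset) != (dst.data, dst.subset):
        return None  # must read and write back the same slice
    if not all(isinstance(n, Access) and n.transient_scalar for n in (cin, cout)):
        return None  # the accumulator must be wrapped in scalar transient copies
    m = _BINOP.match(tasklet.code)
    if m and m.group(1) == out and m.group(2) == acc:
        # Accumulator on the left: acc + x, acc * x, acc - x ("left-sub").
        return {"+": "add", "*": "mul", "-": "left-sub"}[m.group(3)]
    m = _FUNC.match(tasklet.code)
    if m and m.group(1) == out and acc in (m.group(3), m.group(4)):
        return m.group(2)
    return None
```

Requiring the accumulator as the left operand keeps the toy matcher from accepting `x - acc`, which is not a left-sub update of `arr[S]`.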