NXP Backend: Adding the loader for eIQ Neutron SDK converter module. #16326
Closed
jirioc wants to merge 1,243 commits into pytorch:main from nxp-upstream:nxf96559/feature/EIEX-657-implement-loading-mechanism-for-eiq-neutron-sdk-converter-module
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16326. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit e5deb89 with merge base 0935300. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 6adb5b2 to 1d7b384
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai Signed-off-by: Sebastian Larsson <[email protected]>
Pass valid test_data to run_method_and_compare_outputs in PassPipeline. Signed-off-by: Oscar Andersson <[email protected]>
Differential Revision: D86785126 Pull Request resolved: pytorch#15757
Seeing an error when importing the flat tensor deserializer:
```
(executorch) [[email protected] /data/users/lfq/executorch (aff5086)]$ python debug-lora.py
Traceback (most recent call last):
  File "/data/users/lfq/executorch/debug-lora.py", line 1, in <module>
    from executorch.extension.flat_tensor.serialize.serialize import _deserialize_to_flat_tensor
  File "/data/users/lfq/executorch/src/executorch/extension/flat_tensor/serialize/serialize.py", line 20, in <module>
    from executorch.exir._serialize._cord import Cord
  File "/data/users/lfq/executorch/src/executorch/exir/__init__.py", line 9, in <module>
    from executorch.exir.capture import (
  File "/data/users/lfq/executorch/src/executorch/exir/capture/__init__.py", line 9, in <module>
    from executorch.exir.capture._capture import (
  File "/data/users/lfq/executorch/src/executorch/exir/capture/_capture.py", line 17, in <module>
    from executorch.exir.program import ExirExportedProgram
  File "/data/users/lfq/executorch/src/executorch/exir/program/__init__.py", line 10, in <module>
    from executorch.exir.program._program import (
  File "/data/users/lfq/executorch/src/executorch/exir/program/_program.py", line 82, in <module>
    from executorch.extension.flat_tensor.serialize.serialize import FlatTensorSerializer
ImportError: cannot import name 'FlatTensorSerializer' from partially initialized module 'executorch.extension.flat_tensor.serialize.serialize' (most likely due to a circular import) (/data/users/lfq/executorch/src/executorch/extension/flat_tensor/serialize/serialize.py)
```
Previously, the import happened at module load time, causing the circular dependency. Now the import happens at runtime, and we do not hit the circular dependency.
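A minimal sketch of the lazy-import pattern described above, assuming a hypothetical helper function; only the import target mirrors the one named in the traceback:
```python
# Sketch: deferring the import to call time breaks the import cycle.
# The helper name below is an assumption, not the actual ExecuTorch code.

def _get_flat_tensor_serializer():
    # Importing here (at runtime) instead of at module load time lets
    # executorch.exir.program._program finish initializing before the
    # flat_tensor serializer module is touched.
    from executorch.extension.flat_tensor.serialize.serialize import (
        FlatTensorSerializer,
    )

    return FlatTensorSerializer
```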
Arm backend: Support quantized while_loop
- Add annotation logic.
- Extend cond handling in q-dq folding to while
- Extend InsertCondRescale pass to handle while.
----------------------------------------------------
Arm backend: Add initial while_loop support.
- Refactor CondSupported to also test while; move it to its own file
  and split it into one check for submodule nodes and one for ops.
- Add node visitor
- Add tests
-----------------------------------------------------
Arm backend: Initial quantization support for conditional
The standard prepare/convert_pt2 does not seem to support
quantization out of the box. Instead, a quantization call
is introduced in the TOSAQuantizer that does the necessary
steps to get correct quantization on submodules. A custom
Quantize step is needed in the ArmTester to make this work
in testing.
Additionally, getting correct quantization parameters needs
some delicate handling. The model is calibrated twice,
once for each code path. Because of this, the observers outside
the if/else submodules see different data than the observers
inside the submodules. Rescales need to be inserted to handle
this. To get a correctly traced graph at all times, we
1. Fold the outermost quant ops in the submodules at the same
time as the cond is folded. Add qparam meta to folded
nodes inside submodule.
2. Use this meta in the InsertCondRescale pass to
insert a tosa.RESCALE to handle the different qparams.
3. After this, the submodule's q-dq nodes can be folded normally.
Signed-off-by: Erik Lundell <[email protected]>
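The rescale in step 2 can be illustrated with a small sketch. This is generic quantization arithmetic, not the InsertCondRescale pass itself; the float detour and missing clamp to the dtype range are simplifications:
```python
import torch


def rescale(q: torch.Tensor, scale_in: float, zp_in: int,
            scale_out: float, zp_out: int) -> torch.Tensor:
    # Dequantize with the outer observer's parameters, then requantize with
    # the submodule's parameters. A real tosa.RESCALE works in fixed point;
    # float keeps this sketch short.
    real = (q.to(torch.float32) - zp_in) * scale_in
    return torch.round(real / scale_out + zp_out).to(q.dtype)
```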
* Add *Pass suffix for passes missing it * Add missing _pass suffixes to python files containing passes * Correct pass name: DecomposeLinearVectorNormPass to DecomposeLinalgVectorNormPass * Rename ConvertIntPowToMuls to DecomposeIntPowPass * Rename QuantizeOperatorArguments to QuantizeClampArgumentsPass Signed-off-by: Martin Lindström <[email protected]>
A lightweight UndefinedBehaviorSanitizer runtime tailored for the ExecuTorch bare-metal examples. The goal is to provide basic memory-safety diagnostics while keeping the runtime self-contained. Signed-off-by: [email protected]
Currently only supports the NHWC memory format. Adds per-tensor weight quantization as the default quantization. Generalizes the CortexMPassManager to work with any pass that takes exported_program as an init arg, to support using both XNNPackPasses and ArmPasses. Refactors the QuantizedLinearFusionPass into a more general CovertToCortexMPass for replacing ATen ops with corresponding Cortex-M ops. Signed-off-by: Adrian Lundell <[email protected]>
Quick fix after pytorch#15896 cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai Signed-off-by: Adrian Lundell <[email protected]>
ATen clone ops can end up in the graph from a few sources. Since the graph is functional, we don't actually need them, and they are slow. This PR runs the no-op clone removal pass for XNNPACK. In addition, I ran into an issue where the XNNPACK delegate doesn't currently handle inputs being forwarded directly to partition outputs: there has to be at least one operator. To solve this, I updated the removal pass to leave these clone ops in and added copy support in the XNN delegate to copy directly to the output. In the long run, I want to remove these no-ops higher up as part of to_edge, but this requires alignment and changes with a few more backends. See pytorch#15838. But resolving this for XNNPACK will mitigate the issue for CPU models, at least. Differential Revision: D87405074
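A sketch of a no-op clone removal over a torch.fx graph, assuming a functional graph so clones carry no semantics. This is illustrative only, not the actual ExecuTorch pass:
```python
import torch


def remove_noop_clones(graph_module: torch.fx.GraphModule) -> torch.fx.GraphModule:
    graph = graph_module.graph
    for node in list(graph.nodes):
        if node.op == "call_function" and node.target is torch.ops.aten.clone.default:
            # Leave clones that feed a graph output directly, so the delegate
            # still has at least one operator between input and output.
            if any(user.op == "output" for user in node.users):
                continue
            node.replace_all_uses_with(node.args[0])
            graph.erase_node(node)
    graph.eliminate_dead_code()
    graph_module.recompile()
    return graph_module
```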
pytorch#15881) As title, addressed issues exposed from pytorch/pytorch#168098
```
FAILED exir/backend/test/test_debug_handle_map.py::TestBackendDebugHandle::test_lowered_the_whole_model - UnboundLocalError: local variable 'qnn_compile_spec_buffer' referenced before assignment
Falsifying example: test_lowered_the_whole_model(
    unlift=False,
    self=<test_debug_handle_map.TestBackendDebugHandle testMethod=test_lowered_the_whole_model>,
)
```
deserialize_pte_binary now returns an object with .program as a field, instead of returning the program directly.
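A hedged sketch of the call-site change this implies; the import path and variable names are assumptions, only the .program field comes from the description above:
```python
# Sketch only: the import path below is an assumption about where
# deserialize_pte_binary lives, not a verified API location.
from executorch.exir._serialize._program import deserialize_pte_binary

with open("model.pte", "rb") as f:
    pte_data = f.read()

# Before (sketch): program = deserialize_pte_binary(pte_data)
# After: the result wraps the program, which is exposed as a field.
result = deserialize_pte_binary(pte_data)
program = result.program
```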
Update TosaQuantizer to use get_module_name_filter from torchao. Adds new tests to validate that set_module_name works as intended. Fixes pytorch#15870 cc @freddan80 @per @zingo @digantdesai Signed-off-by: Oscar Andersson <[email protected]>
…ytorch#15887) cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai Signed-off-by: Sebastian Larsson <[email protected]> Co-authored-by: Zingo Andersen <[email protected]>
Move support_extension to TosaSpecification base class to avoid having to check whether the TosaSpecification is an instance of TosaSpecification_1_00. cc @freddan80 @per @zingo @digantdesai Signed-off-by: Oscar Andersson <[email protected]>
By default, outputs are re-ordered to the correct order during TOSA lowering. However, this is seen as a workaround, as it should not be needed. Furthermore, the output issue is not easily reproduced; rather, it seems to happen randomly. Therefore we add a test case without the workaround, which is currently passing. If at some point it stops passing without the workaround, the new changes might give some hints on why the workaround is needed and how to fix it. If it continues to pass, we may switch the default and potentially even remove the workaround. cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai --------- Co-authored-by: Zingo Andersen <[email protected]>
Differential Revision: D87252895 Pull Request resolved: pytorch#15885
Summary: # Summary This diff consolidates the backend functionality into a single target `//executorch/backends/aoti:aoti_backend` and simplifies the cuda backend target by making it dependent on the consolidated backend target. The following changes are made in this diff: * Creation of a new target `//executorch/backends/aoti:aoti_backend` in `fbcode/executorch/backends/aoti/targets.bzl` which includes the necessary dependencies for the AOTI backend. * Update of the `//executorch/backends/cuda:cuda_backend` target in `fbcode/executorch/backends/cuda/TARGETS` to depend on the new `//executorch/backends/aoti:aoti_backend` target instead of individual AOTI backend dependencies. * Creation of a new file `fbcode/executorch/backends/aoti/aoti_backend.py` which imports the necessary dependencies and passes for the AOTI backend. * Simplification of the `xplat/executorch/backends/cuda/cuda_backend.py` file by removing unnecessary imports and using the new `AotiBackend` class from the `aoti_backend.py` file. ghstack-source-id: 319556735 Reviewed By: larryliu0820 Differential Revision: D85704977 --------- Co-authored-by: Copilot <[email protected]>
Summary: Introduce a CUDA benchmark CI for monitoring CUDA backend performance. The CI will run in three situations: 1. it will run all possible models (voxtral, gemma and whisper) combined with all possible quantization schemas every day at 1am PST; 2. it will run a random model every time a PR gets merged; 3. it can be triggered manually by a user. Differential Revision: D87400561
The executor runner supports models both with and without bundled IO in the same path. To enable bundled IO, EXECUTORCH_BUILD_DEVTOOLS and EXECUTORCH_ENABLE_BUNDLE_IO are required. Adds tests in the Arm backend that exercise and depend on this. Besides enabling bundled IO for the VGF backend where applicable, some additional ResNet model tests are enabled as well. Avoids narrowing-conversion errors in the pte_to_header script by switching char to unsigned char. Signed-off-by: Måns Nilsson <[email protected]> Co-authored-by: Jacob Szwejbka <[email protected]>
Summary: Suspect the failure https://github.com/pytorch/pytorch/actions/runs/19547462483/job/55989739476 is due to using a different QnnBackend implementation. Rename this demo backend to a demo backend name. Differential Revision: D87586567
### Summary LoraLinears contain: 1. base weight (nn.Linear) 2. lora_a (nn.Linear) 3. lora_b (nn.Linear) (2) and (3) are caught by the filter, but (1) is not, as the weight and bias are pulled out of the nn.Linear and placed into nn.Parameters, and the linear is performed manually. This is for checkpoint compatibility - otherwise we'd have to map the weights for any lora model. See: https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37 This PR adds lora linears into the quantization filter. ### Test plan ``` python -m extension.llm.export.export_llm \ base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \ base.params="${DOWNLOADED_PATH}/params.json" \ base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \ base.adapter_config="../et_docs_7_epoch/adapter_config.json" \ base.tokenizer_path="../et_docs_7_epoch/" \ model.use_kv_cache=true \ model.use_sdpa_with_kv_cache=true \ ``` Confirm output model size is ~1.7GB instead of 5.1GB. ``` (executorch) [[email protected] /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte -rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte -rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte ```
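A rough sketch of the kind of filter change described; the filter signature follows torchao's filter_fn convention of (module, fully-qualified name), and the LoraLinear class name and matching rule are assumptions, not the actual diff:
```python
import torch.nn as nn


def quantization_filter(module: nn.Module, fqn: str) -> bool:
    # Plain nn.Linear modules (covers lora_a / lora_b) were already caught.
    if isinstance(module, nn.Linear):
        return True
    # The base weight of a LoRA linear is held as nn.Parameters and the
    # matmul is done manually, so match the containing module as well.
    # The class name check here is an assumption for illustration.
    return type(module).__name__ == "LoraLinear"
```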
Decompose aten.clamp.Tensor to a chain of minimum/maximum operations, since TOSA clamp only supports scalar min/max bounds. This change:
- Rename DecomposeInt32ClampPass to DecomposeTOSAUnsupportedClampPass
- Extend the pass to handle aten.clamp.Tensor variants already annotated by ArmQuantizer
- Mark aten.clamp.Tensor as supported in the TOSA operator support lists
- Fix clamp decomposition to apply max with the lower bound, then min with the upper bound (y_i = min(max(x_i, min_i), max_i)), matching torch.clamp semantics even when min > max
- Align the memory layout of lifted tensor constants in process_inputs_to_lifted_tensor_constants with the TOSA memory format
Signed-off-by: Yufeng Shi <[email protected]> Co-authored-by: Ryan O'Shea <[email protected]> Co-authored-by: Oscar Andersson <[email protected]>
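A minimal sketch of the decomposition the commit describes, written against plain torch ops; this is not the actual pass, only the arithmetic it targets:
```python
import torch


def decomposed_clamp_tensor(
    x: torch.Tensor,
    min_t: torch.Tensor | None,
    max_t: torch.Tensor | None,
) -> torch.Tensor:
    # Apply max with the lower bound first, then min with the upper bound:
    # y_i = min(max(x_i, min_i), max_i), matching torch.clamp semantics
    # even when min > max.
    if min_t is not None:
        x = torch.maximum(x, min_t)
    if max_t is not None:
        x = torch.minimum(x, max_t)
    return x
```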
- Support additional_args for while by normalizing them to carried_args in a pass. - Support not using all while outputs. Signed-off-by: Erik Lundell <[email protected]>
Add method to verify dtypes of TOSA-operators in ArmTester. Signed-off-by: Oscar Andersson <[email protected]>
Adds new checks to quantization annotation to not fuse conv+relu patterns when output_qspec is symmetric. Symmetric quantization would force the zero-point to be 0, and in order for us to fuse relu the zero-point must equal qmin. Also adds new tests to verify that it works. Signed-off-by: Oscar Andersson <[email protected]>
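A sketch of the fusion condition described above; output_qspec is assumed to expose a qscheme attribute, and this is not the ArmQuantizer code:
```python
import torch


def can_fuse_conv_relu(output_qspec) -> bool:
    # Fusing ReLU into a quantized conv output requires the zero-point to
    # equal qmin, so that clamping at the zero-point implements the ReLU.
    # Symmetric quantization forces the zero-point to 0, which for a signed
    # dtype is not qmin, so the fusion must be skipped in that case.
    symmetric = (torch.per_tensor_symmetric, torch.per_channel_symmetric)
    return getattr(output_qspec, "qscheme", None) not in symmetric
```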
Differential Revision: D89308070 Pull Request resolved: pytorch#16283
…er hit validation Differential Revision: D89322661 Pull Request resolved: pytorch#16299
Need docker builds to pass before we advance viable/strict
Differential Revision: D89093678 Pull Request resolved: pytorch#16301
Force-pushed from 1d7b384 to e5deb89
Labels
- CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
- module: nxp (Issues related to NXP Neutron NPU delegation and code under backends/nxp/)
- release notes: nxp (Changes to the NXP Neutron backend delegate)
Summary
Adding the loader for eIQ Neutron SDK converter module.
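The PR description does not include code; a minimal sketch of what a loader for an external converter module can look like is shown below, assuming the converter ships as a separately installed Python package. The module name eiq_neutron_converter and the method shape are placeholders, not the actual NXP implementation; only the NeutronConverterManager name is suggested by the test file referenced in the test plan.
```python
# Illustrative sketch of a dynamic loader for an optional converter module.
import importlib
from types import ModuleType


class NeutronConverterManager:
    _MODULE_NAME = "eiq_neutron_converter"  # assumed package name

    def __init__(self) -> None:
        self._converter: ModuleType | None = None

    def load(self) -> ModuleType:
        # Import lazily so ExecuTorch works without the eIQ Neutron SDK
        # installed, and fail with a clear message when it is required.
        if self._converter is None:
            try:
                self._converter = importlib.import_module(self._MODULE_NAME)
            except ImportError as e:
                raise RuntimeError(
                    f"Converter module '{self._MODULE_NAME}' is not installed; "
                    "install the eIQ Neutron SDK to enable NXP delegation."
                ) from e
        return self._converter
```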
Test plan
This was tested by running test_neutron_converter_manager.py
cc @robert-kalmar @JakeStevens @digantdesai