
Conversation


@jirioc jirioc commented Dec 18, 2025

Summary

Add a loader for the eIQ Neutron SDK converter module.
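
For reviewers unfamiliar with the converter flow, a minimal sketch of what a dynamic loader for an externally installed converter module could look like; the module-name pattern, class name, and entry point below are hypothetical illustrations, not the actual eIQ Neutron SDK API or the code in this PR.

```
import importlib


class NeutronConverterManagerSketch:
    """Hypothetical sketch: locate and load a converter module at runtime."""

    def convert(self, flavor: str, tflite_model: bytes) -> bytes:
        module_name = f"neutron_converter_{flavor}"  # hypothetical naming scheme
        try:
            converter_module = importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            raise RuntimeError(
                f"Converter module '{module_name}' is not installed."
            ) from exc
        # 'convert_model' is a placeholder for whatever entry point the SDK exposes.
        return converter_module.convert_model(tflite_model)
```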

Test plan

This was tested by running test_neutron_converter_manager.py

cc @robert-kalmar @JakeStevens @digantdesai


pytorch-bot bot commented Dec 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16326

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e5deb89 with merge base 0935300:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 18, 2025
@jirioc jirioc force-pushed the nxf96559/feature/EIEX-657-implement-loading-mechanism-for-eiq-neutron-sdk-converter-module branch from 6adb5b2 to 1d7b384 Compare December 18, 2025 22:36
@jirioc jirioc added the release notes: nxp Changes to the NXP Neutron backend delegate label Dec 18, 2025
@jirioc jirioc requested a review from MartinPavella December 18, 2025 22:38
Sebastian-Larsson and others added 24 commits December 19, 2025 09:31
Pass valid test_data to run_method_and_compare_outputs in PassPipeline.

Signed-off-by: Oscar Andersson <[email protected]>
Differential Revision: D86785126

Pull Request resolved: pytorch#15757
Seeing this error when importing the flat tensor deserializer:
```
(executorch) [[email protected] /data/users/lfq/executorch (aff5086)]$ python debug-lora.py 
Traceback (most recent call last):
  File "/data/users/lfq/executorch/debug-lora.py", line 1, in <module>
    from executorch.extension.flat_tensor.serialize.serialize import _deserialize_to_flat_tensor
  File "/data/users/lfq/executorch/src/executorch/extension/flat_tensor/serialize/serialize.py", line 20, in <module>
    from executorch.exir._serialize._cord import Cord
  File "/data/users/lfq/executorch/src/executorch/exir/__init__.py", line 9, in <module>
    from executorch.exir.capture import (
  File "/data/users/lfq/executorch/src/executorch/exir/capture/__init__.py", line 9, in <module>
    from executorch.exir.capture._capture import (
  File "/data/users/lfq/executorch/src/executorch/exir/capture/_capture.py", line 17, in <module>
    from executorch.exir.program import ExirExportedProgram
  File "/data/users/lfq/executorch/src/executorch/exir/program/__init__.py", line 10, in <module>
    from executorch.exir.program._program import (
  File "/data/users/lfq/executorch/src/executorch/exir/program/_program.py", line 82, in <module>
    from executorch.extension.flat_tensor.serialize.serialize import FlatTensorSerializer
ImportError: cannot import name 'FlatTensorSerializer' from partially initialized module 'executorch.extension.flat_tensor.serialize.serialize' (most likely due to a circular import) (/data/users/lfq/executorch/src/executorch/extension/flat_tensor/serialize/serialize.py)
```
Previously, the import happened at module load time, causing the
circular dependency. Now the import happens at runtime, so we no longer
hit the circular dependency.
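
For context, a minimal sketch of the deferred-import pattern that breaks the cycle (the helper name is illustrative; the actual change defers the import to the code path that needs the serializer):

```
# Before (at module load time) -- participates in the import cycle:
#   from executorch.extension.flat_tensor.serialize.serialize import FlatTensorSerializer


def _get_flat_tensor_serializer_cls():
    # After: import deferred to call time, so exir can finish initializing first.
    from executorch.extension.flat_tensor.serialize.serialize import (
        FlatTensorSerializer,
    )

    return FlatTensorSerializer
```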
Arm backend: Support quantized while_loop
    
    - Add annotation logic.
    - Extend cond handling in q-dq folding to while
    - Extend InsertCondRescale pass to handle while.
    
----------------------------------------------------
    Arm backend: Add initial while_loop support.
    
    - Refactor CondSupported to also test while, move to own file
      and split into one check for submodule nodes, and one for ops.
    - Add node visitor
    - Add tests
-----------------------------------------------------
    Arm backend: Initial quantization support for conditional
    
    The standard prepare/convert_pt2 does not seem to support
    quantization out of the box. Instead, a quantization call
    is introduced in the TOSAQuantizer that does the necessary
    steps to get correct quantization on submodules. A custom
    Quantize step is needed in the ArmTester to make this work
    in testing.
    
    Additionally, getting correct quantization parameters needs
    some delicate handling. The model is calibrated twice,
    once for each code path. Because of this, the observers outside
    the if/else submodules see different data than the observers
    inside the submodules. Rescales need to be inserted to handle
    this. To get a correctly traced graph at all times, we
      1. Fold the outermost quant ops in the submodules at the same
         time as the cond is folded. Add qparam meta to the folded
         nodes inside the submodule.
      2. Use this meta in the InsertCondRescale pass to
         insert a tosa.RESCALE to handle the different qparams
         (see the sketch after this list).
      3. After this, the submodule's q-dq nodes can be folded normally.
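
For reviewers, a rough Python illustration of the requantization that a rescale between two sets of qparams performs (the math only, not the tosa.RESCALE operator or its fixed-point multiplier encoding):

```
def rescale(q_in: int, in_scale: float, in_zp: int,
            out_scale: float, out_zp: int,
            qmin: int = -128, qmax: int = 127) -> int:
    """Re-express a quantized value under different (scale, zero_point)."""
    real = (q_in - in_zp) * in_scale           # dequantize with producer qparams
    q_out = round(real / out_scale) + out_zp   # requantize with consumer qparams
    return max(qmin, min(qmax, q_out))         # clamp to the integer range
```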


Signed-off-by: Erik Lundell <[email protected]>
* Add *Pass suffix for passes missing it
* Add missing _pass suffixes to python files containing passes
* Correct pass name: DecomposeLinearVectorNormPass to
DecomposeLinalgVectorNormPass
* Rename ConvertIntPowToMuls to DecomposeIntPowPass
* Rename QuantizeOperatorArguments to QuantizeClampArgumentsPass

Signed-off-by: Martin Lindström <[email protected]>
A lightweight UndefinedBehaviorSanitizer (UBSan) runtime tailored for the
ExecuTorch bare-metal examples. The goal is to provide basic memory-safety
diagnostics while keeping the runtime self-contained.

Signed-off-by: [email protected]
Currently only the NHWC memory format is supported.
Adds per-tensor weight quantization as the default quantization.

Generalizes the CortexMPassManager to work with any pass that takes
exported_program as an init arg, to support using both XNNPackPasses and
ArmPasses (see the sketch below).

Refactors the QuantizedLinearFusionPass into a more general
ConvertToCortexMPass for replacing ATen ops with the corresponding Cortex-M
ops.
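
A rough sketch of the kind of generalization described above (the helper and its construction logic are illustrative, not the actual CortexMPassManager code):

```
import inspect


def instantiate_passes(pass_classes, exported_program):
    # Hypothetical: construct each pass with exported_program only if its
    # __init__ accepts it, so XNNPACK-style and Arm-style passes can be
    # mixed in one pipeline.
    instances = []
    for cls in pass_classes:
        params = inspect.signature(cls.__init__).parameters
        if "exported_program" in params:
            instances.append(cls(exported_program))
        else:
            instances.append(cls())
    return instances
```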

Signed-off-by: Adrian Lundell <[email protected]>
ATen clone ops can end up in the graph from a few sources. Since the
graph is functional, we don't actually need these and they are slow.
This PR runs the no-op clone removal pass for XNNPACK.

In addition to this, I ran into an issue where XNNPACK delegate doesn't
currently handle inputs being directly forwarded to partition outputs.
There has to be at least one operator. To solve this, I updated the
removal pass to leave these clone ops in and added copy support in the
XNN delegate to copy directly to the output.

In the long run, I want to remove these no-ops higher up as part of
to_edge, but that requires alignment and changes across a few more
backends; see pytorch#15838. Resolving this for XNNPACK will mitigate the
issue for CPU models, at least.
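
A minimal sketch of what removing no-op clones from a functional FX graph looks like (illustrative only and not the actual pass, which operates on edge-dialect ops; the input-to-output special case described above is included for clarity):

```
import torch


def remove_noop_clones(graph_module: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in list(graph_module.graph.nodes):
        if node.op == "call_function" and node.target == torch.ops.aten.clone.default:
            src = node.args[0]
            # Keep clones that only forward a graph input straight to an output;
            # the XNNPACK delegate needs at least one operator in the partition.
            if src.op == "placeholder" and all(
                user.op == "output" for user in node.users
            ):
                continue
            node.replace_all_uses_with(src)
            graph_module.graph.erase_node(node)
    graph_module.graph.lint()
    graph_module.recompile()
    return graph_module
```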

Differential Revision: D87405074
pytorch#15881)

As title, addressed issues exposed from
pytorch/pytorch#168098
```
FAILED exir/backend/test/test_debug_handle_map.py::TestBackendDebugHandle::test_lowered_the_whole_model - UnboundLocalError: local variable 'qnn_compile_spec_buffer' referenced before assignment
Falsifying example: test_lowered_the_whole_model(
    unlift=False,
    self=<test_debug_handle_map.TestBackendDebugHandle testMethod=test_lowered_the_whole_model>,
)

```
deserialize_pte_binary now returns an object with .program as a field,
instead of returning the program directly.
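
Call sites need a small follow-up change, roughly (illustrative only; the import path is unchanged by this PR and elided here):

```
result = deserialize_pte_binary(pte_data)  # now returns a wrapper object
program = result.program                   # the deserialized Program lives here
```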
Update TosaQuantizer to use get_module_name_filter from torchao. Adds
new tests to validate that set_module_name works as intended.

Fixes pytorch#15870



cc @freddan80 @per @zingo @digantdesai

Signed-off-by: Oscar Andersson <[email protected]>
Move support_extension to TosaSpecification base class to avoid having
to check whether the TosaSpecification is an instance of
TosaSpecification_1_00.

cc @freddan80 @per @zingo @digantdesai

Signed-off-by: Oscar Andersson <[email protected]>
By default, outputs are re-ordered to the correct order during TOSA lowering.
However, this is seen as a workaround, as it should not be needed.
Furthermore, the output-ordering issue is not easily reproduced; rather, it
seems to happen randomly. Therefore we add a test case without the workaround,
which is currently passing.
If it stops passing at some point, the new changes might give some hints on
why the workaround is needed and how to fix it.
If it continues to pass, we may switch the default and potentially
even remove the workaround.

cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai

---------

Co-authored-by: Zingo Andersen <[email protected]>
Differential Revision: D87252895

Pull Request resolved: pytorch#15885
Summary:

This diff consolidates the backend functionality into a single target
`//executorch/backends/aoti:aoti_backend` and simplifies the cuda
backend target by making it dependent on the consolidated backend
target.

The following changes are made in this diff:

* Creation of a new target `//executorch/backends/aoti:aoti_backend` in
`fbcode/executorch/backends/aoti/targets.bzl` which includes the
necessary dependencies for the AOTI backend.
* Update of the `//executorch/backends/cuda:cuda_backend` target in
`fbcode/executorch/backends/cuda/TARGETS` to depend on the new
`//executorch/backends/aoti:aoti_backend` target instead of individual
AOTI backend dependencies.
* Creation of a new file
`fbcode/executorch/backends/aoti/aoti_backend.py` which imports the
necessary dependencies and passes for the AOTI backend.
* Simplification of the `xplat/executorch/backends/cuda/cuda_backend.py`
file by removing unnecessary imports and using the new `AotiBackend`
class from the `aoti_backend.py` file.
ghstack-source-id: 319556735

Reviewed By: larryliu0820

Differential Revision: D85704977

---------

Co-authored-by: Copilot <[email protected]>
Summary: Introduce a CUDA benchmark CI for monitoring CUDA backend
performance.

The CI will run in three situations:

1. It runs all possible models (voxtral, gemma, and whisper) combined
with all possible quantization schemas every day at 1am PST.
2. It runs a random model every time a PR is merged.
3. It can be manually triggered by a user.

Differential Revision: D87400561
The executor runner supports models both with and without bundled IO in the
same path. To enable bundled IO, EXECUTORCH_BUILD_DEVTOOLS and
EXECUTORCH_ENABLE_BUNDLE_IO are required.

Adds tests in the Arm backend that exercise and depend on this.
Besides enabling bundled IO for the VGF backend where applicable,
some additional ResNet model tests are enabled as well.

Avoids narrowing-conversion errors in the pte_to_header script by switching
char to unsigned char.

Signed-off-by: Måns Nilsson <[email protected]>
Co-authored-by: Jacob Szwejbka <[email protected]>
Summary: Suspect the failure
https://github.com/pytorch/pytorch/actions/runs/19547462483/job/55989739476
is due to using a different QnnBackend implementation. Rename this demo
backend so it uses a distinct demo backend name.

Differential Revision: D87586567
### Summary
LoraLinears contain:
1. base weight (nn.Linear)
2. lora_a (nn.Linear)
3. lora_b (nn.Linear) 

(2) and (3) are caught by the filter, but (1) is not, as the weight and
bias are pulled out of the nn.Linear and placed into nn.Parameters, and
the linear is performed manually. This is for checkpoint compatibility -
otherwise we'd have to map the weights for any lora model.

See:

https://github.com/pytorch/executorch/blob/b4d72f1e271915e9c0e1d313753a1eec840fbdee/examples/models/llama/lora.py#L31-L37

This PR adds lora linears into the quantization filter.
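
A rough sketch of the shape of such a filter, assuming a torchao-style filter_fn(module, fqn) callback; LoraLinear here stands in for the module linked above, and the predicate is illustrative rather than the actual filter in this PR:

```
import torch.nn as nn


def quantize_linear_filter(module: nn.Module, fqn: str) -> bool:
    # Match plain linears plus the LoRA wrapper, which keeps its base weight
    # and bias as nn.Parameters instead of an inner nn.Linear.
    return isinstance(module, nn.Linear) or type(module).__name__ == "LoraLinear"
```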

### Test plan
```
python -m extension.llm.export.export_llm \
    base.checkpoint="${DOWNLOADED_PATH}/consolidated.00.pth" \
    base.params="${DOWNLOADED_PATH}/params.json" \
    base.adapter_checkpoint="../et_docs_7_epoch/adapter_model.safetensors" \
    base.adapter_config="../et_docs_7_epoch/adapter_config.json" \
    base.tokenizer_path="../et_docs_7_epoch/" \
    model.use_kv_cache=true \
    model.use_sdpa_with_kv_cache=true \
```

Confirm output model size is ~1.7GB instead of 5.1GB. 
```
(executorch) [[email protected] /data/users/lfq/executorch (lfq.quantize-lora-linears)]$ ls -la *.pte
-rw-r--r-- 1 lfq users 5106135168 Nov 20 15:59 et_lora.pte
-rw-r--r-- 1 lfq users 1733835776 Nov 20 17:07 et_lora_fix.pte
```
YufengShi-dudu and others added 11 commits December 19, 2025 09:31
Decompose aten.clamp.Tensor to a chain of minimum/maximum operations,
since TOSA clamp only supports scalar min/max bounds.

This change:
- Rename DecomposeInt32ClampPass to DecomposeTOSAUnsupportedClampPass
- Extend the pass to handle aten.clamp.Tensor variants already annotated
by ArmQuantizer
- Mark aten.clamp.Tensor as supported in the TOSA operator support lists
- Fix clamp decomposition to apply max with the lower bound, then min
with the upper bound (y_i = min(max(x_i, min_i), max_i)), matching
torch.clamp semantics even when min > max (see the sketch after this list)
- Align the memory layout of lifted tensor constants in
process_inputs_to_lifted_tensor_constants with the TOSA memory format
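
A minimal sketch of the decomposition for the tensor-bound variant, in plain PyTorch to illustrate the bound ordering (the pass itself emits the corresponding TOSA MINIMUM/MAXIMUM ops):

```
from typing import Optional

import torch


def decompose_clamp_tensor(
    x: torch.Tensor,
    min_t: Optional[torch.Tensor],
    max_t: Optional[torch.Tensor],
) -> torch.Tensor:
    # y_i = min(max(x_i, min_i), max_i): apply the lower bound first, then the
    # upper bound, which matches torch.clamp even when min > max.
    y = x
    if min_t is not None:
        y = torch.maximum(y, min_t)
    if max_t is not None:
        y = torch.minimum(y, max_t)
    return y
```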

Signed-off-by: Yufeng Shi <[email protected]>
Co-authored-by: Ryan O'Shea <[email protected]>
Co-authored-by: Oscar Andersson <[email protected]>
- Support additional_args for while by normalizing them to carried_args
in a pass.
- Support not using all while outputs.


Signed-off-by: Erik Lundell <[email protected]>
Add method to verify dtypes of TOSA-operators in ArmTester.

Signed-off-by: Oscar Andersson <[email protected]>
Adds new checks to quantization annotation to not fuse conv+relu
patterns when output_qspec is symmetric. Symmetric quantization would
force the zero-point to be 0, and in order for us to fuse relu the
zero-point must equal qmin. Also adds new tests to verify that it works.
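
The condition boils down to something like the following predicate (illustrative only, not the actual annotator code):

```
def can_fuse_conv_relu(output_zero_point: int, qmin: int) -> bool:
    # Fusing conv+relu is only sound if saturating at qmin is the same as
    # clamping at the output zero-point (real 0). A symmetric output qspec
    # forces zero_point == 0, which for int8 (qmin == -128) breaks this.
    return output_zero_point == qmin
```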

Signed-off-by: Oscar Andersson <[email protected]>
Differential Revision: D89308070

Pull Request resolved: pytorch#16283
…er hit validation

Differential Revision: D89322661

Pull Request resolved: pytorch#16299
Need docker builds to pass before we advance viable/strict
Differential Revision: D89093678

Pull Request resolved: pytorch#16301
@jirioc jirioc force-pushed the nxf96559/feature/EIEX-657-implement-loading-mechanism-for-eiq-neutron-sdk-converter-module branch from 1d7b384 to e5deb89 Compare December 19, 2025 08:38
@jirioc jirioc closed this Dec 19, 2025
@jirioc jirioc deleted the nxf96559/feature/EIEX-657-implement-loading-mechanism-for-eiq-neutron-sdk-converter-module branch December 19, 2025 08:39
@robert-kalmar robert-kalmar added the module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ label Dec 19, 2025