Skip to content

Conversation

@DannyYuyang-quic
Copy link
Contributor

@DannyYuyang-quic DannyYuyang-quic commented Dec 17, 2025

Summary:

  • add support for multimodal AOT and calibration
  • AOT support for VLMs:
    • SmolVLM-500M
    • InternVL3-1B
  • add quantization recipe for VLMs and vision encoders
  • Refactor LLM AOT and integrated it into multimodal AOT

Test plan

Test lowering of the quantized SmolVLM-500M

python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_smolvlm_500m_instruct -b build-android --executorch_root . -a . -m SM8750

Test lowering of the quantized InternVL3-1B

python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_internvl3_1b -b build-android --executorch_root . -a . -m SM8750

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16292

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 08db58c with merge base cf8496a (image):

NEW FAILURE - The following job has failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 17, 2025
@DannyYuyang-quic
Copy link
Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Dec 17, 2025
@DannyYuyang-quic
Copy link
Contributor Author

Hi @cccclai ,

This PR adds multimodal AOT support.
Because the full end-to-end multimodal support PR is pretty large and also requires refactoring the existing ./examples/qualcomm/oss_scripts/llama/llama.py to make it compatible with both multimodal and LLM AOT.
To make the review easier, we’re splitting it into two PRs (AOT and Runtime). This one only covers the AOT part.

Currently, the new models SmolVLM and InternVL3 can be fully delegated to the HTP backend and reuse the static LLaMA flow.
The runtime changes have already passed internal testing, and that PR will follow soon.
Please have a look!

Thanks!!
cc: @haowhsu-quic

Summary:
 - add support for multimodal AOT and calibration
 - AOT support for VLMs:
   - SmolVLM-500M
   - InternVL3-1B
 - add quantization recipe for VLMs and vision encoders
 - Refactor LLM AOT and integrated it into the multimodal AOT flow
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/AOT_multimodal_support branch from aaa3a5a to afe647e Compare December 18, 2025 00:46
@luffy-yu
Copy link
Contributor

luffy-yu commented Dec 18, 2025

@DannyYuyang-quic Thank you for this PR. It is just in time as I am working on a VLM project. I tested this PR on Ubuntu 24.04 with an Nvidia RTX 5090 GPU.

  • python=3.11.14
  • executorch==1.1.0a0+28da6a8
  • torch==2.8.0+cu128
  • torchaudio==2.8.0+cu128
  • torchvision==0.23.0+cu128

Here is the result.

Test Status Details
SmolVLM-500M ✅ PASSED Completed in 218.1s
InternVL3-1B ❌ FAILED Completed in 661.8s

InternVL3-1B Failure Details:
The test failed due to an assertion error in test_qnn_delegate.py:6631:

======================================================================
FAIL: test_internvl3_1b (__main__.TestExampleMultimodalityScript.test_internvl3_1b)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/n10288/Documents/Code/executorch/backends/qualcomm/tests/test_qnn_delegate.py", line 6631, in test_internvl3_1b
    self.assertLessEqual(encoder_pte_size, 390_000_000)  # 390MB
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 390300948 not less than or equal to 390000000

----------------------------------------------------------------------

@DannyYuyang-quic
Copy link
Contributor Author

DannyYuyang-quic commented Dec 19, 2025

Hi @luffy-yu, thanks for running the tests! Just wondering, did you rebuild the codebase from this commit or a newer one?
On my side, the encoder size has consistently been around 387 MB.

By the way, which QNN version and SoC are you using? We’re currently on QNN 2.37.
Note that different QNN versions or SoCs can also lead to variations in PTE size

Due to this, I've increased the PTE size tolerance to around 10% for CI checks.

@luffy-yu
Copy link
Contributor

Hi @luffy-yu, thanks for running the tests! Just wondering, did you rebuild the codebase from this commit or a newer one? On my side, the encoder size has consistently been around 387 MB.

By the way, which QNN version and SoC are you using? We’re currently on QNN 2.37. Note that different QNN versions or SoCs can also lead to variations in PTE size

Due to this, I've increased the PTE size tolerance to around 10% for CI checks.

Hi @DannyYuyang-quic, it was built on this PR commit.

I am also using QNN 2.37(2.37.0.250724), but the SoC is SM8550.

Here is the new output after applying the size changes.

  • test_smolvlm_500m_instruct
Encoder PTE Size: 102143252 bytes
Text Embedding PTE Size: 94655764 bytes
Decoder PTE Size: 370013204 bytes
.
----------------------------------------------------------------------
Ran 1 test in 219.404s
  • test_internvl3_1b
Encoder PTE Size: 390300948 bytes
Text Embedding PTE Size: 271840532 bytes
Decoder PTE Size: 504288532 bytes
.
----------------------------------------------------------------------
Ran 1 test in 582.499s

BTW, when will the Runtime PR be out? I was trying to run the inference on Android SoC, but it did not work.

@meta-codesync
Copy link

meta-codesync bot commented Dec 19, 2025

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D89567155.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants