Qualcomm AI Engine Direct - Support Multimodal VLMs #16292

DannyYuyang-quic · 2025-12-17T13:05:36Z

Summary:

add support for multimodal AOT and calibration
AOT support for VLMs:
- SmolVLM-500M
- InternVL3-1B
add quantization recipe for VLMs and vision encoders
Refactor LLM AOT and integrated it into multimodal AOT

Test plan

Test lowering of the quantized SmolVLM-500M

python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_smolvlm_500m_instruct -b build-android --executorch_root . -a . -m SM8750

Test lowering of the quantized InternVL3-1B

python -m backends.qualcomm.tests.test_qnn_delegate TestExampleMultimodalityScript.test_internvl3_1b -b build-android --executorch_root . -a . -m SM8750

pytorch-bot · 2025-12-17T13:05:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16292

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 08db58c with merge base cf8496a ():

NEW FAILURE - The following job has failed:

pull / test-openvino-linux / linux-job (gh)
RuntimeError: Command docker exec -t d6bf3eae17324fd1cdc941276b827165f6d2a46909025865ec81f5a13c04723c /exec failed with exit code 1

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / android / run-emulator (gh) (#16137)
Timeout waiting for emulator to boot.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

DannyYuyang-quic · 2025-12-17T13:06:10Z

@pytorchbot label "release notes: qualcomm"

DannyYuyang-quic · 2025-12-17T13:15:55Z

Hi @cccclai ,

This PR adds multimodal AOT support.
Because the full end-to-end multimodal support PR is pretty large and also requires refactoring the existing ./examples/qualcomm/oss_scripts/llama/llama.py to make it compatible with both multimodal and LLM AOT.
To make the review easier, we’re splitting it into two PRs (AOT and Runtime). This one only covers the AOT part.

Currently, the new models SmolVLM and InternVL3 can be fully delegated to the HTP backend and reuse the static LLaMA flow.
The runtime changes have already passed internal testing, and that PR will follow soon.
Please have a look!

Thanks!!
cc: @haowhsu-quic

Summary: - add support for multimodal AOT and calibration - AOT support for VLMs: - SmolVLM-500M - InternVL3-1B - add quantization recipe for VLMs and vision encoders - Refactor LLM AOT and integrated it into the multimodal AOT flow

luffy-yu · 2025-12-18T17:45:28Z

@DannyYuyang-quic Thank you for this PR. It is just in time as I am working on a VLM project. I tested this PR on Ubuntu 24.04 with an Nvidia RTX 5090 GPU.

python=3.11.14
executorch==1.1.0a0+28da6a8
torch==2.8.0+cu128
torchaudio==2.8.0+cu128
torchvision==0.23.0+cu128

Here is the result.

Test	Status	Details
SmolVLM-500M	✅ PASSED	Completed in 218.1s
InternVL3-1B	❌ FAILED	Completed in 661.8s

InternVL3-1B Failure Details:
The test failed due to an assertion error in test_qnn_delegate.py:6631:

======================================================================
FAIL: test_internvl3_1b (__main__.TestExampleMultimodalityScript.test_internvl3_1b)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/n10288/Documents/Code/executorch/backends/qualcomm/tests/test_qnn_delegate.py", line 6631, in test_internvl3_1b
    self.assertLessEqual(encoder_pte_size, 390_000_000)  # 390MB
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 390300948 not less than or equal to 390000000

----------------------------------------------------------------------

DannyYuyang-quic · 2025-12-19T06:04:26Z

Hi @luffy-yu, thanks for running the tests! Just wondering, did you rebuild the codebase from this commit or a newer one?
On my side, the encoder size has consistently been around 387 MB.

By the way, which QNN version and SoC are you using? We’re currently on QNN 2.37.
Note that different QNN versions or SoCs can also lead to variations in PTE size

Due to this, I've increased the PTE size tolerance to around 10% for CI checks.

luffy-yu · 2025-12-19T16:56:57Z

Hi @luffy-yu, thanks for running the tests! Just wondering, did you rebuild the codebase from this commit or a newer one? On my side, the encoder size has consistently been around 387 MB.

By the way, which QNN version and SoC are you using? We’re currently on QNN 2.37. Note that different QNN versions or SoCs can also lead to variations in PTE size

Due to this, I've increased the PTE size tolerance to around 10% for CI checks.

Hi @DannyYuyang-quic, it was built on this PR commit.

I am also using QNN 2.37(2.37.0.250724), but the SoC is SM8550.

Here is the new output after applying the size changes.

test_smolvlm_500m_instruct

Encoder PTE Size: 102143252 bytes
Text Embedding PTE Size: 94655764 bytes
Decoder PTE Size: 370013204 bytes
.
----------------------------------------------------------------------
Ran 1 test in 219.404s

test_internvl3_1b

Encoder PTE Size: 390300948 bytes
Text Embedding PTE Size: 271840532 bytes
Decoder PTE Size: 504288532 bytes
.
----------------------------------------------------------------------
Ran 1 test in 582.499s

BTW, when will the Runtime PR be out? I was trying to run the inference on Android SoC, but it did not work.

meta-codesync · 2025-12-19T20:10:08Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D89567155.

DannyYuyang-quic requested review from cccclai, jackzhxng and lucylq as code owners December 17, 2025 13:05

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 17, 2025

pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Dec 17, 2025

Qualcomm AI Engine Direct - Support Multimodal VLMs

afe647e

Summary: - add support for multimodal AOT and calibration - AOT support for VLMs: - SmolVLM-500M - InternVL3-1B - add quantization recipe for VLMs and vision encoders - Refactor LLM AOT and integrated it into the multimodal AOT flow

DannyYuyang-quic force-pushed the dev1/danny/AOT_multimodal_support branch from aaa3a5a to afe647e Compare December 18, 2025 00:46

Increase PTE size tolerance to ~10% for CI checks

08db58c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Qualcomm AI Engine Direct - Support Multimodal VLMs #16292

Qualcomm AI Engine Direct - Support Multimodal VLMs #16292

DannyYuyang-quic commented Dec 17, 2025 •

edited

Loading

Uh oh!

pytorch-bot bot commented Dec 17, 2025 •

edited

Loading

Uh oh!

DannyYuyang-quic commented Dec 17, 2025

Uh oh!

DannyYuyang-quic commented Dec 17, 2025

Uh oh!

luffy-yu commented Dec 18, 2025 •

edited

Loading

Uh oh!

DannyYuyang-quic commented Dec 19, 2025 •

edited

Loading

Uh oh!

luffy-yu commented Dec 19, 2025

Uh oh!

meta-codesync bot commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Qualcomm AI Engine Direct - Support Multimodal VLMs #16292

Are you sure you want to change the base?

Qualcomm AI Engine Direct - Support Multimodal VLMs #16292

Conversation

DannyYuyang-quic commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

pytorch-bot bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16292

❌ 1 New Failure, 1 Unrelated Failure

Uh oh!

DannyYuyang-quic commented Dec 17, 2025

Uh oh!

DannyYuyang-quic commented Dec 17, 2025

Uh oh!

luffy-yu commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DannyYuyang-quic commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luffy-yu commented Dec 19, 2025

Uh oh!

meta-codesync bot commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

DannyYuyang-quic commented Dec 17, 2025 •

edited

Loading

pytorch-bot bot commented Dec 17, 2025 •

edited

Loading

luffy-yu commented Dec 18, 2025 •

edited

Loading

DannyYuyang-quic commented Dec 19, 2025 •

edited

Loading