Qualcomm AI Engine Direct - Support Multimodal VLMs #16292
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16292

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure as of commit 08db58c with merge base cf8496a: one job has a new failure, and one job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: qualcomm"
Hi @cccclai, this PR adds multimodal AOT support. Currently, the new models … Thanks!!
Summary:
- Add support for multimodal AOT and calibration
- AOT support for VLMs:
  - SmolVLM-500M
  - InternVL3-1B
- Add quantization recipe for VLMs and vision encoders
- Refactor LLM AOT and integrate it into the multimodal AOT flow
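As a rough illustration of the flow the summary describes (splitting a VLM into a vision encoder and a language model, each lowered with its own quantization recipe), here is a minimal sketch. All names, recipes, and bit-widths below are hypothetical placeholders, not the PR's actual ExecuTorch/QNN API:

```python
# Hypothetical sketch of a multimodal AOT lowering flow. The VLM is split
# into a vision encoder and a language model, and each component is
# calibrated/quantized with its own recipe before lowering. Names are
# illustrative only; they do not mirror the real ExecuTorch QNN code.

from dataclasses import dataclass


@dataclass
class QuantRecipe:
    name: str
    act_bits: int
    weight_bits: int


# Per-component recipes: a vision encoder may tolerate a different
# precision mix than the LLM decoder (assumed values, for illustration).
RECIPES = {
    "vision_encoder": QuantRecipe("vision_16a8w", act_bits=16, weight_bits=8),
    "language_model": QuantRecipe("llm_16a4w", act_bits=16, weight_bits=4),
}


def lower_component(component: str, recipe: QuantRecipe) -> dict:
    """Pretend to calibrate, quantize, and lower one submodule."""
    return {
        "component": component,
        "recipe": recipe.name,
        "act_bits": recipe.act_bits,
        "weight_bits": recipe.weight_bits,
        "lowered": True,
    }


def lower_vlm(components=("vision_encoder", "language_model")) -> list:
    """Lower every component of the multimodal model in sequence."""
    return [lower_component(c, RECIPES[c]) for c in components]


for artifact in lower_vlm():
    print(artifact["component"], artifact["recipe"], artifact["lowered"])
```

The point of the sketch is only the structure: per-component recipes plus a shared lowering entry point, which is the shape the "refactor LLM AOT and integrate it into the multimodal AOT flow" item suggests.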
Force-pushed from aaa3a5a to afe647e.
@DannyYuyang-quic Thank you for this PR. It arrives just in time, as I am working on a VLM project. I tested this PR on Ubuntu 24.04 with an NVIDIA RTX 5090 GPU.
Here is the result.
InternVL3-1B failure details:
Hi @luffy-yu, thanks for running the tests! Just wondering, did you rebuild the codebase from this commit or a newer one? Also, which QNN version and SoC are you using? We're currently on QNN 2.37; because of this, I've increased the PTE size tolerance to around 10% for the CI checks.
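For context, a relative size check like the PTE tolerance mentioned above (around 10%) can be sketched as follows. The function name and error handling are illustrative, not the actual CI code from this PR:

```python
def pte_size_within_tolerance(actual_bytes: int, expected_bytes: int,
                              tolerance: float = 0.10) -> bool:
    """Return True if the generated .pte file size is within `tolerance`
    (relative) of the expected size.

    Illustrative only; the real CI check in the PR may differ.
    """
    if expected_bytes <= 0:
        raise ValueError("expected_bytes must be positive")
    return abs(actual_bytes - expected_bytes) / expected_bytes <= tolerance


# Example: a 1.05 GB artifact against a 1.00 GB expectation passes at 10%.
print(pte_size_within_tolerance(1_050_000_000, 1_000_000_000))  # True
```

A relative (rather than absolute) tolerance is the natural choice here, since quantization-recipe changes scale the serialized program size roughly proportionally to the model.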
Hi @DannyYuyang-quic, it was built on this PR's commit. I am also using QNN 2.37 (2.37.0.250724), but the SoC is SM8550. Here is the new output after applying the size changes.
BTW, when will the runtime PR be out? I tried to run inference on the Android SoC, but it did not work.
Summary:
Test plan:
- Test lowering of the quantized SmolVLM-500M
- Test lowering of the quantized InternVL3-1B