Prerequisites
Bug Description
Built example/qwen2_qnn_aot following the official AOT documentation and ran it on a phone. After a minor modification to aot_run.cpp, decoding produces garbled output.
Device:
SM8850 chip (V81)
OnePlus 15
Model:
Qwen2.5 1.5B
Is this a bug in aot_run itself, or did I make a mistake while reproducing? Is there a recommended debugging path?
Steps to Reproduce
Modified aot_run:
#include <iostream>
#include <fmt/core.h>
#include <mllm/mllm.hpp>
#include <string>

#include "mllm/backends/qnn/aot_rt/QnnAOTRuntime.hpp"
#include "mllm/models/qwen3/configuration_qwen3.hpp"
#include "mllm/models/qwen3/tokenization_qwen3.hpp"

using mllm::Argparse;
using namespace mllm::qnn::aot;  // NOLINT

MLLM_MAIN({
  auto& help = Argparse::add<bool>("-h|--help").help("Show help message");
  auto& model_path = Argparse::add<std::string>("-m|--model").help("Model path").def("qwen2_qnn.mllm");
  auto& tokenizer_path = Argparse::add<std::string>("-t|--tokenizer").help("Tokenizer path").def("tokenizer.json");
  auto& config_path = Argparse::add<std::string>("-c|--config").help("Config path").required(true);
  auto& prompt_text = Argparse::add<std::string>("-p|--prompt").help("Prompt text").def("hello");
  auto& ar_len = Argparse::add<int>("--ar_len").help("Autoregressive length (chunk size)").def(128);
  auto& gen_len = Argparse::add<int>("--gen_len").help("Generate token length").def(32);

  Argparse::parse(argc, argv);

  if (help.isSet()) {
    Argparse::printHelp();
    return 0;
  }

  mllm::initQnnBackend(model_path.get());

  auto qwen2_cfg = mllm::models::qwen3::Qwen3Config(config_path.get());
  RunnerConfig config;
  config.num_layers = qwen2_cfg.num_hidden_layers;
  config.num_heads = qwen2_cfg.num_key_value_heads;
  config.head_dim = qwen2_cfg.head_dim;
  config.vocab_size = qwen2_cfg.vocab_size;
  config.context_len = 1024;
  config.ar_len = ar_len.get();

  auto tokenizer = mllm::models::qwen3::Qwen3Tokenizer(tokenizer_path.get());

  // Qwen2.5 chat models expect ChatML-style prompts. Avoid injecting <think> tags here:
  // the tokenizer used with the current qwen2.5 assets does not have them in vocab and
  // would map them to token id 0, which corrupts the prompt and leads to garbled output.
  const std::string prompt =
      "<|im_start|>user\n" + prompt_text.get() + "<|im_end|>\n<|im_start|>assistant\n";
  auto input_tensor = mllm::models::ARGenerationOutputPast{
      {"sequence", tokenizer.convert2Ids(tokenizer.tokenize(prompt))},
  };

  // DBG:
  mllm::print(input_tensor["sequence"].shape());
  mllm::print(input_tensor["sequence"]);

  Runner runner(config, &tokenizer);
  if (!runner.load()) {
    std::cerr << "Failed to load model\n";
    return 1;
  }

  runner.generate(
      input_tensor["sequence"], gen_len.get(),
      [](const std::string& token) { std::cout << token << std::flush; }, true);
  std::cout << "\n";
  return 0;
});
Expected Behavior
Normal (non-garbled) output
Operating System
Android
Device
OnePlus 15
MLLM Framework Version
latest
Model Information
No response
Additional Context
No response