[Bug] v0.12.2-cu12.8: when serving Qwen3.5, any request fails with an illegal memory error #4436

@huangtao2999

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

With the latest-cu128 image, Qwen3.5-35B-A3B deploys successfully on 8 × 2080 Ti GPUs (22 GB VRAM per card). However, sending a test request produces an illegal memory error.

Reproduction

docker run -d \
  --runtime nvidia \
  --gpus '"device=0,1,2,3,4,5,6,7"' \
  --name lmdeploy_ht \
  -v /data/model/hugginface:/models \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 23333:23333 \
  --ipc=host \
  --shm-size=128g \
  docker.1ms.run/openmmlab/lmdeploy:v0.12.2-cu12.8 \
  tail -f /dev/null

docker exec -it lmdeploy_ht bash

lmdeploy serve api_server /models/Qwen/Qwen3.5-35B-A3B \
  --tp 8 \
  --cache-max-entry-count 0.8 \
  --log-level INFO \
  --max-concurrent-requests 4 \
  --model-name pkumlm_txt \
  --backend turbomind \
  --max_batch_size 64 \
  --cache-block-seq-len 32

Environment

v0.12.2-cu12.8
8 × 2080 Ti (22 GB VRAM per card)
Qwen3.5-35B-A3B

Error traceback

curl  -H "Accept: application/json"  -H "Content-type: application/json"   -X POST  -d '{"model": "pkumlm_txt","messages":[{"role":"user","content":"你是谁"} ]}'   http://192.168.1.36:23333/v1/chat/completions
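The request above does not set a generation limit, so the server chooses one. As a diagnostic sketch only, the same request can be sent with an explicit `max_tokens` (the standard OpenAI-compatible field) to see whether the failure depends on the generation budget. The value `64` and the helper name `send_chat_request` are illustrative assumptions, not part of the original report, and this is not a confirmed workaround.

```shell
# Hypothetical variant of the request above with an explicit generation cap.
# MAX_TOKENS=64 is an arbitrary illustrative value, not taken from the issue.
MAX_TOKENS=64
PAYLOAD=$(printf '{"model": "pkumlm_txt", "max_tokens": %d, "messages": [{"role": "user", "content": "你是谁"}]}' "$MAX_TOKENS")

# Hypothetical helper: POST the payload to the chat-completions URL given as $1.
send_chat_request() {
  curl -H "Accept: application/json" -H "Content-type: application/json" \
       -X POST -d "$PAYLOAD" "$1"
}

# Usage (same endpoint as in the report), left commented out on purpose:
# send_chat_request http://192.168.1.36:23333/v1/chat/completions
```

If the capped request succeeds while the uncapped one fails, that would point at the very large default generation length rather than at the prompt itself.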

Error output:
2026-03-19 12:52:50,954 - lmdeploy - INFO - turbomind.py:687 - [async_stream_infer] session 1 start
[TM][INFO] [SeqMgr][Create] ID 1
[TM][WARNING] [ProcessInferRequests] [1] total sequence length (11 + 262133) exceeds `session_len` (258368), `max_new_tokens` is truncated to 258357
invalid argument
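The numbers in the `[TM][WARNING]` line above can be checked by hand. The sketch below reproduces them under the assumption that the generation budget is clipped to `session_len` minus the prompt length; that rule is inferred from the logged values, not taken from the TurboMind source. The variable names are illustrative.

```shell
# Values copied from the [TM][WARNING] line in the log.
PROMPT_TOKENS=11
REQUESTED_NEW_TOKENS=262133
SESSION_LEN=258368

# Total sequence length the request would need.
TOTAL=$((PROMPT_TOKENS + REQUESTED_NEW_TOKENS))

if [ "$TOTAL" -gt "$SESSION_LEN" ]; then
  # Assumed rule: clip the generation budget to what still fits in the session.
  MAX_NEW_TOKENS=$((SESSION_LEN - PROMPT_TOKENS))
else
  MAX_NEW_TOKENS=$REQUESTED_NEW_TOKENS
fi

echo "total=$TOTAL max_new_tokens=$MAX_NEW_TOKENS"
# prints: total=262144 max_new_tokens=258357
```

The truncated value matches the log (258357), so the warning itself is consistent; the `invalid argument` error appears only after this truncation has been applied.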
