How does the Qwen3.5 model disable thinking? #4444

@SongXiaoMao

lmdeploy serve api_server /home/cheng/model/Qwen3.5-27B-AWQ \
    --tp 4 \
    --cache-max-entry-count 0.8 \
    --log-level INFO \
    --max-concurrent-requests 4 \
    --model-name Qwen3.5-27B-AWQ \
    --backend turbomind \
    --max_batch_size 64 \
    --api-key abc123 \
    --server-port 8000 \
    --cache-block-seq-len 32
