Describe the bug
Running mlx-community/MiniCPM-V-4.6-8bit with mlx 0.32.0.dev20260524+2165dc08 produces gibberish (yes111111111). With mlx-0.31.2 and mlx-metal-0.31.2, the same prompt ("Hi") produces the expected reply: Hello! How can I help you today?
The culprit seems to be the locally built mlx.metallib, not the Python wheel, and probably not the post-v0.31.2 source changes by themselves.
The checks:
- PyPI mlx + local mlx-metal: yes111111111
- local mlx + PyPI mlx-metal: Hello! How can I help you today?
- PyPI backend install, but with only mlx.metallib swapped to the local build: yes111111111
- restored PyPI mlx.metallib: Hello! How can I help you today?
So the bad output follows the Metal shader library specifically.
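To repeat the swap test without guessing which metallib is actually being picked up, one option is to hash every mlx.metallib under the installed mlx package in each environment and compare the digests. Below is a minimal sketch that uses only the standard library. It assumes the metallib sits somewhere under the directory that `import mlx` resolves to. That is an assumption and has not been checked against the editable-install layout. If the editable build keeps its metallib elsewhere (e.g. in a build directory), pass that path to `find_metallibs` directly. `sha256_of`, `find_metallibs` and `mlx_package_dirs` are hypothetical helpers for illustration, not part of mlx.

```python
import hashlib
import importlib.util
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_metallibs(root: Path) -> list[Path]:
    """Recursively collect every regular file named mlx.metallib under root."""
    return sorted(p for p in root.rglob("mlx.metallib") if p.is_file())


def mlx_package_dirs() -> list[Path]:
    """Directories the importable `mlx` package resolves to (empty if mlx is not installed).

    Assumption: the metallib lives under one of these directories; editable
    builds may place it elsewhere.
    """
    spec = importlib.util.find_spec("mlx")
    if spec is None or spec.submodule_search_locations is None:
        return []
    return [Path(p) for p in spec.submodule_search_locations]


if __name__ == "__main__":
    # Print "<sha256>  <path>" for each metallib found, so two environments can be diffed.
    for pkg_dir in mlx_package_dirs():
        for lib in find_metallibs(pkg_dir):
            print(f"{sha256_of(lib)}  {lib}")
```

Running this in the "PyPI backend + swapped metallib" environment and in the "restored PyPI metallib" environment should show two different digests, matching the two different outputs.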
My bot's best guess: Xcode/Metal toolchain drift. The PyPI 0.31.2 backend artifact was built against SDK 26.4; the local artifact was built with Xcode 26.5 / SDK 26.5 / metalfe-32023.883. That smells like either a Metal compiler regression or an MLX Metal kernel that depends on behavior the newer compiler now handles differently.
Given that this is an 8-bit MiniCPM path, I’d look first at the quantized matmul/GEMM kernels. A small numerical or codegen error in the logits under greedy decoding could easily turn into the repeated yes111111111 pattern.
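As an illustration of why a small logit error matters under greedy decoding, here is a toy sketch. The logit values and the error size are made up and are not taken from MiniCPM. It only shows that when the top two candidates are close, a perturbation of a few hundredths flips the argmax. In real autoregressive decoding, the wrong token is then fed back as context, so later steps are conditioned on it. This toy does not model that feedback or the repetition itself.

```python
def greedy_pick(logits: dict[str, float]) -> str:
    """Greedy decoding step: return the token with the highest logit."""
    return max(logits, key=logits.get)


# Hypothetical first-step logits (made-up numbers, illustration only).
clean = {"Hello": 12.40, "yes": 12.35, "1": 9.0}

# Hypothetical small per-token error, standing in for a miscompiled kernel's output drift.
error = {"Hello": -0.04, "yes": +0.04, "1": 0.0}
perturbed = {tok: clean[tok] + error[tok] for tok in clean}

print(greedy_pick(clean))      # Hello
print(greedy_pick(perturbed))  # yes
```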
The local mlx build follows mlx’s current dev install guidance: pip install -e ".[dev]"
To Reproduce
% pip list | grep mlx
check_models 0.7.2 /Users/jrp/Documents/AI/mlx/check_models/src
mlx 0.32.0.dev20260524+2165dc08 /Users/jrp/Documents/AI/mlx/mlx
mlx-audio 0.4.3
mlx-lm 0.31.3 /Users/jrp/Documents/AI/mlx/mlx-lm
mlx-vlm 0.5.0 /Users/jrp/Documents/AI/mlx/mlx-vlm
(mlx-vlm) jrp@Johns-MBP-2 src % python -m mlx_vlm generate \
--model mlx-community/MiniCPM-V-4.6-8bit \
--prompt "Hi" --max-tokens 10
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 14064.36it/s]
Download complete: : 0.00B [00:00, ?B/s] | 0/9 [00:00<?, ?it/s]
==========
Files: []
Prompt: <|im_start|>user
Hi<|im_end|>
<|im_start|>assistant
<think>
</think>
yes111111111
==========
Prompt: 13 tokens, 509.350 tokens-per-sec
Generation: 10 tokens, 293.992 tokens-per-sec
Peak memory: 2.378 GB
(mlx-vlm) src % pip uninstall mlx
Found existing installation: mlx 0.32.0.dev20260524+2165dc08
Uninstalling mlx-0.32.0.dev20260524+2165dc08:
Would remove:
/Users/jrp/miniconda3/envs/mlx-vlm/bin/mlx.distributed_config
/Users/jrp/miniconda3/envs/mlx-vlm/bin/mlx.launch
/Users/jrp/miniconda3/envs/mlx-vlm/lib/python3.13/site-packages/__editable__.mlx-0.32.0.dev20260524+2165dc08.pth
/Users/jrp/miniconda3/envs/mlx-vlm/lib/python3.13/site-packages/mlx-0.32.0.dev20260524+2165dc08.dist-info/*
Proceed (Y/n)? y
Successfully uninstalled mlx-0.32.0.dev20260524+2165dc08
% pip install mlx
Collecting mlx
Using cached mlx-0.31.2-cp313-cp313-macosx_26_0_arm64.whl.metadata (5.9 kB)
Collecting mlx-metal==0.31.2 (from mlx)
Using cached mlx_metal-0.31.2-py3-none-macosx_26_0_arm64.whl.metadata (5.1 kB)
Using cached mlx-0.31.2-cp313-cp313-macosx_26_0_arm64.whl (584 kB)
Using cached mlx_metal-0.31.2-py3-none-macosx_26_0_arm64.whl (55.8 MB)
Installing collected packages: mlx-metal, mlx
Successfully installed mlx-0.31.2 mlx-metal-0.31.2
(mlx-vlm) src % python -m mlx_vlm generate \
--model mlx-community/MiniCPM-V-4.6-8bit \
--prompt "Hi" --max-tokens 10
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 20729.67it/s]
Download complete: : 0.00B [00:00, ?B/s] | 0/9 [00:00<?, ?it/s]
==========
Files: []
Prompt: <|im_start|>user
Hi<|im_end|>
<|im_start|>assistant
<think>
</think>
Hello! How can I help you today?
==========
Prompt: 13 tokens, 508.680 tokens-per-sec
Generation: 10 tokens, 290.371 tokens-per-sec
Peak memory: 2.383 GB
Expected behavior
The expected behaviour is the reply produced by the released mlx-0.31.2 / mlx-metal-0.31.2 in the trace above (Hello! How can I help you today?), not the yes111111111 produced by the local build.
Desktop