[BUG] mlx-0.31.2+mlx-metal-0.31.2 works for inference, but locally built mlx-0.32.0.dev20260524+2165dc08 produces gibberish. Metal compiler regression? #3586

@jrp2014

Describe the bug
Running mlx-community/MiniCPM-V-4.6-8bit with a locally built mlx 0.32.0.dev20260524+2165dc08 produces gibberish (yes111111111). With mlx-0.31.2 and mlx-metal-0.31.2 from PyPI, the same prompt (Hi) produces the expected kind of response: Hello! How can I help you today?

The culprit seems to be the locally built mlx.metallib, not the Python wheel and probably not the post-v0.31.2 source changes by themselves.

The checks:

  • PyPI mlx + local mlx-metal: yes111111111
  • local mlx + PyPI mlx-metal: Hello! How can I help you today?
  • PyPI backend install, but with only mlx.metallib swapped to the local build: yes111111111
  • restored PyPI mlx.metallib: Hello! How can I help you today?

So the bad output follows the Metal shader library specifically.
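
For anyone repeating the swap test, a minimal sketch like the following may help confirm which mlx.metallib files are present in an environment and whether two of them actually differ. It uses only the Python standard library. The helper names find_metallibs and sha256_of are hypothetical, and it assumes the shader library is a file named mlx.metallib somewhere under the directory passed in (for example the environment's site-packages).

```python
import hashlib
from pathlib import Path


def find_metallibs(root: Path) -> list[Path]:
    """Return every file named mlx.metallib under root, sorted for stable output."""
    return sorted(p for p in root.rglob("mlx.metallib") if p.is_file())


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large metallib is not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of on each path returned by find_metallibs, once with the PyPI backend installed and once with the local build, gives two digests to compare and record alongside each test result.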

My bot's best guess: Xcode/Metal toolchain drift. The PyPI 0.31.2 backend artifact was built against SDK 26.4; the local artifact was built with Xcode 26.5 / SDK 26.5 / metalfe-32023.883. That smells like either a Metal compiler regression or an MLX Metal kernel that depends on behavior the newer compiler now handles differently.

Given this is an 8-bit MiniCPM path, I’d look first at quantized matmul/GEMM kernels. A small numerical/codegen error in logits under greedy decoding would easily turn into the repeated yes111111111 pattern.
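
To illustrate why a small logit error matters under greedy decoding, here is a toy sketch in plain Python. The logit values and the error vector are made up for illustration and are not taken from MiniCPM or MLX. The only point is that argmax has no tolerance when two candidates are close.

```python
def greedy_pick(logits: list[float]) -> int:
    """Greedy decoding picks the index of the largest logit (first index wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a 4-token vocabulary; tokens 1 and 2 are nearly tied.
reference = [0.10, 2.50, 2.48, -1.00]

# Hypothetical kernel error: a small per-element offset of the kind a
# miscompiled matmul might introduce. Values are illustrative only.
error = [0.00, -0.02, 0.03, 0.00]
perturbed = [r + e for r, e in zip(reference, error)]

print(greedy_pick(reference))  # 1
print(greedy_pick(perturbed))  # 2
```

Once a wrong token is chosen it is fed back as context, so later steps can drift further from the reference output. That would be consistent with, though not proof of, a stuck pattern such as yes111111111.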

The local mlx build follows mlx’s current dev install guidance: pip install -e ".[dev]"

To Reproduce

% pip list | grep mlx
check_models              0.7.2                       /Users/jrp/Documents/AI/mlx/check_models/src
mlx                       0.32.0.dev20260524+2165dc08 /Users/jrp/Documents/AI/mlx/mlx
mlx-audio                 0.4.3
mlx-lm                    0.31.3                      /Users/jrp/Documents/AI/mlx/mlx-lm
mlx-vlm                   0.5.0                       /Users/jrp/Documents/AI/mlx/mlx-vlm
(mlx-vlm) jrp@Johns-MBP-2 src % python -m mlx_vlm generate \
  --model mlx-community/MiniCPM-V-4.6-8bit \
  --prompt "Hi" --max-tokens 10
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 14064.36it/s]
Download complete: : 0.00B [00:00, ?B/s]    | 0/9 [00:00<?, ?it/s]
==========
Files: [] 

Prompt: <|im_start|>user
Hi<|im_end|>
<|im_start|>assistant
<think>

</think>


yes111111111
==========
Prompt: 13 tokens, 509.350 tokens-per-sec
Generation: 10 tokens, 293.992 tokens-per-sec
Peak memory: 2.378 GB
(mlx-vlm) src % pip uninstall mlx
Found existing installation: mlx 0.32.0.dev20260524+2165dc08
Uninstalling mlx-0.32.0.dev20260524+2165dc08:
  Would remove:
    /Users/jrp/miniconda3/envs/mlx-vlm/bin/mlx.distributed_config
    /Users/jrp/miniconda3/envs/mlx-vlm/bin/mlx.launch
    /Users/jrp/miniconda3/envs/mlx-vlm/lib/python3.13/site-packages/__editable__.mlx-0.32.0.dev20260524+2165dc08.pth
    /Users/jrp/miniconda3/envs/mlx-vlm/lib/python3.13/site-packages/mlx-0.32.0.dev20260524+2165dc08.dist-info/*
Proceed (Y/n)? y
  Successfully uninstalled mlx-0.32.0.dev20260524+2165dc08
% pip install mlx  
Collecting mlx
  Using cached mlx-0.31.2-cp313-cp313-macosx_26_0_arm64.whl.metadata (5.9 kB)
Collecting mlx-metal==0.31.2 (from mlx)
  Using cached mlx_metal-0.31.2-py3-none-macosx_26_0_arm64.whl.metadata (5.1 kB)
Using cached mlx-0.31.2-cp313-cp313-macosx_26_0_arm64.whl (584 kB)
Using cached mlx_metal-0.31.2-py3-none-macosx_26_0_arm64.whl (55.8 MB)
Installing collected packages: mlx-metal, mlx
Successfully installed mlx-0.31.2 mlx-metal-0.31.2
(mlx-vlm) src % python -m mlx_vlm generate \
  --model mlx-community/MiniCPM-V-4.6-8bit \
  --prompt "Hi" --max-tokens 10
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 20729.67it/s]
Download complete: : 0.00B [00:00, ?B/s]    | 0/9 [00:00<?, ?it/s]
==========
Files: [] 

Prompt: <|im_start|>user
Hi<|im_end|>
<|im_start|>assistant
<think>

</think>


Hello! How can I help you today?
==========
Prompt: 13 tokens, 508.680 tokens-per-sec
Generation: 10 tokens, 290.371 tokens-per-sec
Peak memory: 2.383 GB

Expected behavior
The local build should behave like the released mlx 0.31.2, which answers the prompt with Hello! How can I help you today? (second trace above) instead of yes111111111.

Desktop

  • OS Version: macOS 26.5
