
[BUG] crash on qr followed by diag #3583

@hughjonesd

Description


Describe the bug

MLX crashes on my M2 MacBook Air when running the following code:

import mlx.core as mx

x = mx.random.normal((10000, 10000))
q, r = mx.linalg.qr(x, stream=mx.cpu)
mx.eval(mx.diag(r))

which fails with:

libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)

The following variant does not crash:

import mlx.core as mx

x = mx.random.normal((10000, 10000))
q, r = mx.linalg.qr(x, stream=mx.cpu)
mx.eval(r)
mx.eval(mx.diag(r))

Expected behavior
I'd expect the operation not to fail, given that adding an intermediate evaluation of r lets it succeed. But if this is simply an out-of-memory (OOM) error, then sorry for the false report.

Desktop:

  • OS Version: macOS Tahoe 26.4.1
  • MLX Version: 0.31.1
