forked from ggml-org/llama.cpp
Pull requests: TheTom/llama-cpp-turboquant
cuda-memset-sync-and-cuda-event-sync-fixes-for-sm_100-sm_120
ggml
Nvidia GPU
#154
opened May 22, 2026 by
aggroed
fix(delta-net): fix GGML_ASSERT crash in gated_delta_net with n_rs_seq > 0
Apple Metal
build
devops
documentation
examples
ggml
Hexagon
model
Nvidia GPU
OpenCL
python
script
server/ui
server
SYCL
testing
Vulkan
WebGPU
#152
opened May 21, 2026 by
JEF1056
3 tasks
fix(cuda): MTP + sm_89 compatibility for GCC 12 host compiler
ggml
Nvidia GPU
#150
opened May 20, 2026 by
altifilmperisi
spec: avoid all-token outputs during MTP prefill
examples
model
server
#149
opened May 20, 2026 by
claude-eric-steiner
vulkan: add TurboQuant KV cache support and optimized turbo mat-vec paths
ggml
Vulkan
#140
opened May 10, 2026 by
Fenix46
fix(qwen35): support Qwen3.5:9B loading from Ollama GGUF
model
#135
opened May 8, 2026 by
Jordan-HS
vendor: bump cpp-httplib to 0.43.2 (openssl 4.0.0 fix)
python
script
#121
opened May 4, 2026 by
TheTom
Owner
1 of 3 tasks
HIP mixed TurboQuant vec FA on gfx900/gfx906
build
ggml
Nvidia GPU
#99
opened Apr 21, 2026 by
2bigO
perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch
ggml
Nvidia GPU
script
#53
opened Apr 4, 2026 by
signalnine
7 tasks done
fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard …
ggml
Nvidia GPU
#41
opened Apr 1, 2026 by
terrysimons
Draft