TheTom / llama-cpp-turboquant Public

forked from ggml-org/llama.cpp

Fork 283
Star 1.6k

Code
Issues 28
Pull requests 13
Actions
Projects
Security and quality
Insights

Pull requests: TheTom/llama-cpp-turboquant

Labels 35 Milestones 0

13 Open 70 Closed

Pull requests list

cuda-memset-sync-and-cuda-event-sync-fixes-for-sm_100-sm_120 ggml Nvidia GPU

#154 opened May 22, 2026 by aggroed

fix(delta-net): fix GGML_ASSERT crash in gated_delta_net with n_rs_seq > 0 Apple Metal build devops documentation examples ggml Hexagon model Nvidia GPU OpenCL python script server/ui server SYCL testing Vulkan WebGPU

#152 opened May 21, 2026 by JEF1056

3 tasks

fix(cuda): MTP + sm_89 compatibility for GCC 12 host compiler ggml Nvidia GPU

#150 opened May 20, 2026 by altifilmperisi

spec: avoid all-token outputs during MTP prefill examples model server

#149 opened May 20, 2026 by claude-eric-steiner

SYCL Turboquant implementation attempt ggml SYCL

#144 opened May 13, 2026 by cclecle

vulkan: add TurboQuant KV cache support and optimized turbo mat-vec paths ggml Vulkan

#140 opened May 10, 2026 by Fenix46

fix(qwen35): support Qwen3.5:9B loading from Ollama GGUF model

#135 opened May 8, 2026 by Jordan-HS

vendor: bump cpp-httplib to 0.43.2 (openssl 4.0.0 fix) python script

#121 opened May 4, 2026 by TheTom Owner

1 of 3 tasks

Turbo dflash examples model python server

#103 opened Apr 23, 2026 by aminya • Draft

HIP mixed TurboQuant vec FA on gfx900/gfx906 build ggml Nvidia GPU

#99 opened Apr 21, 2026 by 2bigO

fix QJL turboquant implementation ggml testing

#77 opened Apr 15, 2026 by zhangsipeng

perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch ggml Nvidia GPU script

#53 opened Apr 4, 2026 by signalnine

7 tasks done

fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard … ggml Nvidia GPU

#41 opened Apr 1, 2026 by terrysimons • Draft
