chore: bump llama.cpp to b9699 by github-actions[bot] · Pull Request #17 · leehack/llama-web-bridge

github-actions · 2026-05-19T13:32:23Z

llama.cpp update

Previous pin: b9165
New pin: b9699
Upstream release: https://github.com/ggml-org/llama.cpp/releases/tag/b9699
Compare: ggml-org/llama.cpp@b9165...b9699

Upstream changelog

Release notes for b9699

sycl : support MUL_MAT and OUT_PROD with Q1_0 (#24721)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI

Commit range

Commits from b9165 to b9699 (first 80)

common : support manually triggering the reasoning budget end sequence (#23949) (5254a79)
vulkan: Removed unused functions (#23175) (f8c0a19)
vulkan: Block-load Q3_K/Q6_K block data and subtract on 32b ints (#23056) (1962000)
model: Add EXAONE 4.5 implementations (#21733) (48b88c3)
security : disable private disclosures (#23963) (02a5701)
TP: quantized KV cache support (#23792) (8e6fff8)
vocab: add normalizer.lowercase support to WPM (#23899) (5aba536)
vulkan: reduce host memory lock contention (#23376) (bef69f1)
vulkan: don't hold the device mutex while compiling pipelines (#23641) (55ac090)
metal: template GLU kernels to support f16/f32 (#23882) (95b8b8e)
llama: limit max outputs of llama_context (#23861) (de6f727)
vendor : update cpp-httplib to 0.46.1 (#23980) (335abed)
opencl: add basic support for q5_0 and q5_1 (#23548) (27d9ed8)
nix : add nix-nodejs facilities to build Web UI (#23846) (5aa3a64)
speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988) (5dcb711)
revert to using global_invocation_id for cpy shader (#23955) (b8275a8)
opencl: fix compiler warnings for non-adreno path (#23922) (210a657)
clean up unused variables warnings (#23975) (1fd5f48)
server: real-time reasoning interruption via control endpoint (#23971) (354ebac)
hexagon: add gelu_quick (#24007) (d178a11)
hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models (#23989) (8f7f3bf)
llama : deprecate llama_set_warmup (#24009) (4f3a4be)
convert : support Step3.7-Flash (#23845) (f7a0777)
kv-cache : SWA checkpoints store only non-masked cells (#23981) (2365315)
ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI (#23434) (f8e67fc)
ui: simplify network error handling (#23431) (69cea5b)
docs : update HOWTO-add-model.md (#23883) (d5ab083)
ci : reduce self-hosted server workflow jobs (#24012) (a468b89)
server: add SSE ping interval (#24013) (60130d1)
common : fix state save in common_prompt_batch_decode (#23468) (0b71540)
StepFun 3.5 MTP (#23274) (2187e00)
model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716) (bfb4308)
model: add Mellum architecture (#23966) (4fb16ec)
hexagon: profiler output fix and script updates (#24042) (5c394fd)
opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006) (63e66fd)
arg : removed unecesary mmproj download when users pass --no-mmproj (#23425) (e366626)
ci : disable ccache for msvc windows release jobs (#23911) (4da6370)
update BoringSSL to 0.20260526.0 (#23794) (d545a2a)
tests : add support for qwen3 SSM archs (#24031) (06938ac)
cuda: reserve space for quantize kv-cache at startup (#23907) (f8f0a47)
ggml-cpu: use runtime SVE width in FWHT (#24059) (3571fa5)
Avoid PDL race conditions by disabling restrict when PDL is used (#24030) (9e58d4d)
ui: Mermaid Diagrams in chat + interactive preview (#24032) (ee4cf70)
mtmd, model: allow skip build_vit() (#24077) (a731805)
mtmd: enable non-causal vision for gemma 4 unified (#24082) (c8d6a00)
qwen35: use post-norm hidden state for MTP (#24025) (166fe29)
mtmd: fix Gemma 4 unified FPE (#24088) (94a220c)
sycl : Improve SYCL doc (#23025) (f478f1b)
ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754) (3c7450c)
ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834) (e8c5489)
metal : reduce rset heartbeat from 500ms -> 5ms (#24074) (3d19986)
tests : refactor test-save-load-state to accept token input (#24073) (65ef50a)
readme : add status badges (#24104) (6ddc943)
fix(mtmd): handle Gemma 4 audio projector embedding size (#24091) (e3ba22d)
cmake: skip cvector-generator and export-lora when CPU backend is disabled (#24053) (7ac5a42)
server : add header to tools/server/server-http.h (#24089) (0066404)
build : use umbrella Headers directory for XCFramework module map (#23974) (4d74287)
webui: fix tool selector toggle/counter, key tools by stable identity (#24065) (4586479)
agents: refactor, include more guidelines (#24111) (a121232)
server: avoid unnecessary checkpoint restore when new tokens are present (#24110) (6f3a9f3)
ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209) (4c51309)
convert: Fix Gemma 4 Unified conversion (#24118) (e802356)
return filter to save memory (#24125) (0dbfa66)
ui: added single line reasoning preview (#23601) (5269770)
ui: Fixed packages (#24119) (21444c8)
Move duplicated imatrix code into single common imatrix-loader.cpp (#22445) (e7bcf1c)
webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (#23132) (42b2d60)
arg: fix double mtp downloads (#24128) (260862b)
server : disable on-device spec checkpoints (#24108) (7c158fb)
sycl : port multi-column MMVQ from CUDA backend (#21845) (7fe2ae4)
ci : build-msys job slimming [no ci] (#24157) (46fa662)
CUDA: enroll mul_mat_vec_q_moe into pdl (#24087) (2154a0f)
kleidiai : dynamic chunck-based scheduling for hybrid execution (#23819) (3ecfb15)
hparams : refactor hparams.n_layer (#24060) (7acb4e8)
minor : fix lint issues (#24165) (59917d3)
docs: Update quantization readme (#24133) (ad1b88c)
ui: add ignore-scripts=true to npmrc (#24149) (cc7bef3)
Fix link to available UI settings (#24169) (9c955c4)
ui: run npm install when package-lock.json is newer than node_modules (#24171) (2016bf2)
model : fix llama_model::n_gpu_layers() (#24188) (96fbe00)

Web bridge review focus

Please pay extra attention to upstream changes touching:

WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
ggml backend APIs used by the bridge
model loading, tokenizer, chat template, context/state persistence, or cache semantics
CMake/build flags that can affect the generated JS/WASM artifacts
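
One way a reviewer might triage the commit range against the focus areas above is a simple keyword filter over the commit titles. This is only a sketch: the keyword list (`FOCUS_PATTERNS`) and the helper `flag_commits` are illustrative assumptions, not part of the bridge's tooling, and a keyword match is a prompt to read the upstream diff, not a verdict.

```python
import re

# Hypothetical keyword list loosely derived from the review-focus bullets.
# It is an assumption for illustration; tune it to the bridge's real surface area.
FOCUS_PATTERNS = [
    r"webgpu", r"wasm", r"emscripten", r"pthread", r"memory64",
    r"cmake", r"kv-cache", r"vocab", r"\bstate\b", r"chat template",
]
FOCUS_RE = re.compile("|".join(FOCUS_PATTERNS), re.IGNORECASE)


def flag_commits(titles):
    """Return the commit titles that mention any focus keyword, in input order."""
    return [title for title in titles if FOCUS_RE.search(title)]


# A few titles copied from the commit range in this PR.
SAMPLE = [
    "ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834) (e8c5489)",
    "ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209) (4c51309)",
    "readme : add status badges (#24104) (6ddc943)",
    "common : fix state save in common_prompt_batch_decode (#23468) (0b71540)",
]

if __name__ == "__main__":
    for title in flag_commits(SAMPLE):
        print(title)
```

With the sample above, the WebGPU, WASM, and state-save commits are flagged and the readme commit is not.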

Validation

Emscripten build passed
Browser WebGPU/state-persistence smoke passed
Generated bridge artifacts include wasm32 and memory64 outputs
No stale hard-coded llama.cpp tag remains in CI/publish defaults
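
The stale-tag check in the last item could be approximated by scanning configuration text for `bNNNN`-style tags that differ from the new pin. The sketch below is an assumption about how such a check might look, not the workflow's actual implementation; the file names and the `LLAMA_CPP_TAG` key in the example are hypothetical.

```python
import re

# llama.cpp release tags in this PR look like "b9165" / "b9699".
# The word boundaries keep short commit hashes such as "b8275a8" from matching.
TAG_RE = re.compile(r"\bb\d{4,5}\b")


def find_stale_tags(files, current_pin):
    """Return (path, line_number, tag) for every tag that is not current_pin.

    `files` maps a path to its text content, so the check can run on
    in-memory strings as well as on files read from disk.
    """
    stale = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for tag in TAG_RE.findall(line):
                if tag != current_pin:
                    stale.append((path, lineno, tag))
    return stale


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    example = {
        "ci.yml": "env:\n  LLAMA_CPP_TAG: b9165\n",
        "publish.yml": "tag: b9699\n",
    }
    for hit in find_stale_tags(example, "b9699"):
        print(hit)
```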

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
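
The behavior described above amounts to a three-way decision. A minimal sketch, assuming the workflow can list open PRs along with their head branch and whether they change `llama_cpp.version`; the `OpenPR` type and `plan_action` function are hypothetical names, not taken from the actual workflow:

```python
from dataclasses import dataclass

AUTOMATION_BRANCH = "automation/bump-llama-cpp"


@dataclass
class OpenPR:
    """Hypothetical summary of an open pull request."""
    head_branch: str
    touches_pin: bool  # True if the PR changes llama_cpp.version


def plan_action(open_prs):
    """Decide what the scheduled bump workflow should do.

    Returns "skip" if a non-automation PR already changes the pin,
    "update" if the automation PR is already open (reuse it),
    and "open" otherwise.
    """
    if any(pr.touches_pin and pr.head_branch != AUTOMATION_BRANCH for pr in open_prs):
        return "skip"
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update"
    return "open"


if __name__ == "__main__":
    print(plan_action([OpenPR(AUTOMATION_BRANCH, True)]))
```

Checking the manual-PR case first means a human-authored pin change takes precedence even while the automation PR is still open.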

github-actions bot force-pushed the automation/bump-llama-cpp branch from c374d7d to b0e1e3f on May 19, 2026 13:32

github-actions bot added the dependencies and automated labels on May 19, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from b0e1e3f to dcacf23 on May 20, 2026 12:39

github-actions bot changed the title ~~chore: bump llama.cpp to b9222~~ chore: bump llama.cpp to b9247 on May 20, 2026

github-actions bot changed the title ~~chore: bump llama.cpp to b9247~~ chore: bump llama.cpp to b9264 on May 21, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from dcacf23 to d82afc2 on May 21, 2026 13:32

github-actions bot changed the title ~~chore: bump llama.cpp to b9264~~ chore: bump llama.cpp to b9279 on May 22, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from d82afc2 to 74a6dbd on May 22, 2026 12:35

github-actions bot changed the title ~~chore: bump llama.cpp to b9279~~ chore: bump llama.cpp to b9310 on May 25, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 74a6dbd to 56845d4 on May 25, 2026 13:43

github-actions bot changed the title ~~chore: bump llama.cpp to b9310~~ chore: bump llama.cpp to b9360 on May 27, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 56845d4 to a8ccf0f on May 27, 2026 13:49

github-actions bot changed the title ~~chore: bump llama.cpp to b9360~~ chore: bump llama.cpp to b9374 on May 28, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from a8ccf0f to c6e61ba on May 28, 2026 14:06

github-actions bot changed the title ~~chore: bump llama.cpp to b9374~~ chore: bump llama.cpp to b9406 on May 29, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from c6e61ba to d5f6ea3 on May 29, 2026 13:33

github-actions bot changed the title ~~chore: bump llama.cpp to b9406~~ chore: bump llama.cpp to b9453 on Jun 1, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from d5f6ea3 to 7dd05aa on June 1, 2026 16:20

github-actions bot changed the title ~~chore: bump llama.cpp to b9453~~ chore: bump llama.cpp to b9479 on Jun 2, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch 2 times, most recently from df4139e to 414160e on June 3, 2026 15:05

github-actions bot changed the title ~~chore: bump llama.cpp to b9479~~ chore: bump llama.cpp to b9491 on Jun 3, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 414160e to 4f6bee8 on June 4, 2026 13:32

github-actions bot changed the title ~~chore: bump llama.cpp to b9491~~ chore: bump llama.cpp to b9505 on Jun 4, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 4f6bee8 to 91ddc1e on June 5, 2026 13:25

github-actions bot changed the title ~~chore: bump llama.cpp to b9505~~ chore: bump llama.cpp to b9528 on Jun 5, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 91ddc1e to cdb5b75 on June 8, 2026 14:31

github-actions bot changed the title ~~chore: bump llama.cpp to b9528~~ chore: bump llama.cpp to b9557 on Jun 8, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from cdb5b75 to b0e93e0 on June 9, 2026 13:17

github-actions bot changed the title ~~chore: bump llama.cpp to b9557~~ chore: bump llama.cpp to b9580 on Jun 9, 2026

github-actions bot changed the title ~~chore: bump llama.cpp to b9580~~ chore: bump llama.cpp to b9587 on Jun 10, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch 2 times, most recently from 6102386 to 223ddce on June 11, 2026 14:12

github-actions bot changed the title ~~chore: bump llama.cpp to b9587~~ chore: bump llama.cpp to b9596 on Jun 11, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 223ddce to 2d6039b on June 12, 2026 13:43

github-actions bot changed the title ~~chore: bump llama.cpp to b9596~~ chore: bump llama.cpp to b9610 on Jun 12, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 2d6039b to 05804d5 on June 15, 2026 16:06

github-actions bot changed the title ~~chore: bump llama.cpp to b9610~~ chore: bump llama.cpp to b9647 on Jun 15, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 05804d5 to 039f9a4 on June 16, 2026 15:23

github-actions bot changed the title ~~chore: bump llama.cpp to b9647~~ chore: bump llama.cpp to b9670 on Jun 16, 2026

github-actions bot force-pushed the automation/bump-llama-cpp branch from 039f9a4 to aa70aba on June 17, 2026 14:01

github-actions bot changed the title ~~chore: bump llama.cpp to b9670~~ chore: bump llama.cpp to b9682 on Jun 17, 2026

chore: bump llama.cpp to b9699

029df9c

github-actions bot force-pushed the automation/bump-llama-cpp branch from aa70aba to 029df9c on June 18, 2026 10:09

github-actions bot changed the title ~~chore: bump llama.cpp to b9682~~ chore: bump llama.cpp to b9699 on Jun 18, 2026

fix: adapt mtmd helper bitmap wrappers

6957ee6

leehack merged commit 9c57382 into main Jun 18, 2026
2 checks passed

leehack deleted the automation/bump-llama-cpp branch on June 18, 2026 10:27

leehack merged 2 commits into main from automation/bump-llama-cpp
