docs: add Fedora 43 to FLM NPU Linux setup guide #2473
Open
bong-water-water-bong wants to merge 12 commits into
…r OpenAI compat

Closes lemonade-sdk#1370: OpenCode / @ai-sdk/openai-compatible streaming crash

Three fixes for reasoning model streaming:

1. Streaming proxy normalization (streaming_proxy.cpp):
   - Intercepts each SSE data: {...} line in forward_sse_stream()
   - Injects content: "" when reasoning_content is present without content
   - Injects role: "assistant" when null/missing on assistant deltas
   - Only applies to chat.completion.chunk objects (non-chat passthrough)
2. Non-streaming response (server.cpp):
   - Same content injection for REST chat completions response
3. thinking: false passthrough (server.cpp):
   - Replaced strip_handled_thinking_fields() (which erased enable_thinking/thinking before forwarding) with normalize_thinking_fields(), which renames thinking → enable_thinking and keeps it in the forwarded request. FLM/vLLM/cloud backends now see enable_thinking.
   - /no_think prefix retained for llama.cpp compatibility

Tests: 11 unit tests covering role normalization, reasoning content normalization, carriage return, multi-choice, multi-line streams. 7/7 C++ tests pass (100%).
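
The streaming normalization described in this commit can be sketched roughly as follows. This is a Python illustration only, not the C++ code in streaming_proxy.cpp; the function name `normalize_sse_line` is hypothetical, and the sketch assumes every delta inside a chat.completion.chunk counts as an assistant delta.

```python
import json


def normalize_sse_line(line: str) -> str:
    """Normalize one SSE line; anything that is not a chat chunk passes through."""
    stripped = line.rstrip("\r\n")
    ending = line[len(stripped):]  # preserve the original \n or \r\n
    if not stripped.startswith("data:"):
        return line  # comments, event: lines, blank separators
    payload = stripped[len("data:"):].strip()
    if payload == "[DONE]":
        return line
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return line  # not JSON: leave untouched
    if not isinstance(chunk, dict) or chunk.get("object") != "chat.completion.chunk":
        return line  # non-chat passthrough
    for choice in chunk.get("choices") or []:
        delta = choice.get("delta") if isinstance(choice, dict) else None
        if not isinstance(delta, dict):
            continue
        # Assumption: clients expect a string content field next to reasoning_content.
        if "reasoning_content" in delta and delta.get("content") is None:
            delta["content"] = ""
        # Assumption: a null or missing role on a delta means "assistant".
        if delta.get("role") is None:
            delta["role"] = "assistant"
    return "data: " + json.dumps(chunk, separators=(",", ":")) + ending
```

Lines that fail to parse are returned unchanged, so a malformed chunk is never made worse by the proxy.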
Closes lemonade-sdk#2371: multi-GPU systems only got ROCm for the first GPU arch

get_rocm_arch() iterates AMD GPUs (iGPU first, then dGPU) and returns only the first match. On systems with both an iGPU and dGPU with different architectures, TheRock was only installed for the iGPU.

Fix:
- Add get_rocm_arches() returning ALL detected AMD GPU architectures (deduplicated, iGPU-first ordering preserved)
- Update install_therock_if_needed() to install TheRock for every arch
- Keep get_rocm_arch() for backward compat (rocm_channel, display)
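
The order-preserving deduplication behind get_rocm_arches() might look like the sketch below. It is a Python analog rather than the project's C++ code, and it assumes the caller already has the detected architecture strings in iGPU-first order; the gfx values in the test are example inputs only.

```python
def get_rocm_arches(detected):
    """Return every detected arch once, keeping first-seen (iGPU-first) order."""
    seen = set()
    arches = []
    for arch in detected:
        if arch and arch not in seen:  # skip empty entries and repeats
            seen.add(arch)
            arches.append(arch)
    return arches


def get_rocm_arch(detected):
    """Backward-compatible single-arch view: the first detected arch, or ""."""
    arches = get_rocm_arches(detected)
    return arches[0] if arches else ""
```

Keeping the single-arch function as a thin wrapper means callers that only need one value, such as display code, see the same result as before.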
…dk#1364, lemonade-sdk#1546)

lemonade-sdk#1364: Large Prompts Timing Out
- Add SSE keepalive heartbeat thread in forward_sse_stream()
- Sends : keepalive\n\n every 10s during prefill while waiting for first token
- Prevents client-side read timeouts on long-running prompt processing
- Thread-safe via shared mutex with the libcurl write callback

lemonade-sdk#1546: Model Download Resilience
- Add .completed sentinel written after all files are verified in download
- is_checkpoint_path_complete() checks for .completed as authoritative marker
- Prevents corrupt partially-downloaded files from appearing complete
- Hardened recursive_directory_iterator with skip_permission_denied + error_code
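
The .completed sentinel idea can be illustrated with a short Python sketch. The helper `mark_download_complete` is hypothetical, and "verified" is simplified here to "exists and is non-empty"; the real verification step is not described in the commit message.

```python
from pathlib import Path
import tempfile

SENTINEL = ".completed"


def mark_download_complete(model_dir: Path, expected_files) -> bool:
    """Write the sentinel only after every expected file passes the check."""
    for name in expected_files:
        path = model_dir / name
        # Assumption: a file counts as verified if it exists and is non-empty.
        if not path.is_file() or path.stat().st_size == 0:
            return False
    (model_dir / SENTINEL).touch()
    return True


def is_checkpoint_path_complete(model_dir: Path) -> bool:
    """Treat the sentinel, not the presence of model files, as authoritative."""
    return (model_dir / SENTINEL).is_file()
```

Because the sentinel is written last, a crash midway through a download leaves the directory without it, and the partial files are not mistaken for a complete model.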
Add a pre-load memory check in Router::load_model() that compares model_info.size (file size in GB) against get_available_memory_gb() for the target device. Logs a warning when the model may not fit.

Chose warn-only (not block) because:
1. GGUF file size != load-time memory (mmap'd, paged)
2. Auto-tune ctx_size resolver will reduce context to fit
3. A hard block would frustrate users who know their setup
Completes fixes for lemonade-sdk#1364, lemonade-sdk#1546, lemonade-sdk#1804:

## lemonade-sdk#1364: SSE heartbeat during long prefill
Injects : keepalive\n\n every 10s during prefill to prevent client-side read timeouts on long prompts (15k tokens → 5 min prefill).

## #1546b: .completed sentinel for download verification
Written after all files are verified in download_from_huggingface(). is_checkpoint_path_complete() checks for it, preventing corrupt partially-downloaded files from appearing complete after a crash.

## #1546a: Model-level download resume fast-path
download_from_huggingface() now accepts do_not_upgrade flag. When set and .completed sentinel exists, skips the HF API call entirely.

## #1546c: Directory iterator hardening
discover_extra_models() uses skip_permission_denied + error_code handling to prevent crashes on temp files from interrupted downloads.

## lemonade-sdk#1804: Pre-load OOM guard
Upgraded the pre-load memory check from warning to hard block when model size exceeds 2x available memory headroom. Prevents OOM killer crashes with a clear error message instead.

## CI: PR-Agent + Qodo dual review
Added pr-agent-review.yml (DeepSeek) and qodo-merge.yml workflows.
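
One possible reading of the pre-load OOM guard is sketched below in Python. The commit message does not define "2x available memory headroom" precisely, so this sketch assumes it means the model file size exceeds twice the available memory; `ModelTooLargeError`, `preload_memory_check`, and `block_factor` are hypothetical names.

```python
import logging

logger = logging.getLogger("oom_guard_sketch")


class ModelTooLargeError(RuntimeError):
    """Raised instead of letting the OS OOM killer end the process."""


def preload_memory_check(model_size_gb: float, available_gb: float,
                         block_factor: float = 2.0) -> None:
    # Assumption: hard block only when the file is far larger than free memory,
    # since file size is an imperfect proxy for load-time memory (mmap, paging).
    if model_size_gb > block_factor * available_gb:
        raise ModelTooLargeError(
            f"model is {model_size_gb:.1f} GB but only {available_gb:.1f} GB "
            f"is available; refusing to load"
        )
    # Between 1x and 2x: keep the earlier warn-only behavior.
    if model_size_gb > available_gb:
        logger.warning(
            "model (%.1f GB) may not fit in available memory (%.1f GB)",
            model_size_gb, available_gb,
        )
```

The two thresholds keep the warn-only path for borderline cases while turning clearly hopeless loads into an explicit error.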
Before fetching the HF API and rebuilding the file list, check for an existing .download_manifest.json with incomplete files. If found, resume downloading from the partial state instead of starting over. This avoids re-downloading already-completed files after a network interruption or Ctrl-C during model pull.
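
A rough Python sketch of the resume check follows. The manifest schema is not given in the commit message, so the layout used here (a top-level "files" list whose entries carry "name" and "completed") is purely an assumption for illustration, as is the function name `files_to_resume`.

```python
import json
import tempfile
from pathlib import Path

MANIFEST = ".download_manifest.json"


def files_to_resume(model_dir: Path) -> list:
    """Return names of files the manifest marks incomplete, or [] if none."""
    manifest_path = model_dir / MANIFEST
    if not manifest_path.is_file():
        return []  # no partial state: caller falls back to a fresh HF API fetch
    try:
        manifest = json.loads(manifest_path.read_text())
    except (OSError, json.JSONDecodeError):
        return []  # unreadable manifest is treated as absent
    if not isinstance(manifest, dict):
        return []
    # Hypothetical schema: {"files": [{"name": str, "completed": bool}, ...]}
    return [
        entry["name"]
        for entry in manifest.get("files", [])
        if isinstance(entry, dict) and "name" in entry
        and not entry.get("completed", False)
    ]
```

A non-empty result means the download can continue from the partial state instead of rebuilding the file list.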
This reverts commit 11991bc.
…ection (lemonade-sdk#2414)

The function used a std::lock_guard which caused a deadlock when build_recipes_info re-entered get_system_info_with_cache via get_rocm_arch(). Fix by switching to std::unique_lock, marking s_recipes_computed early, and unlocking during recipe computation. On failure the flag resets so the next call retries.
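
The locking pattern in this fix can be shown with a small Python analog. Python's threading.Lock, like std::mutex, is not reentrant, so holding it across a computation that calls back into the same function would deadlock. The class `RecipesCache` is hypothetical and is not the project's code.

```python
import threading


class RecipesCache:
    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like std::mutex
        self._computed = False
        self._value = None

    def get(self, compute):
        with self._lock:
            if self._computed:
                # A re-entrant call during computation lands here and returns
                # whatever is cached so far (possibly None) instead of blocking.
                return self._value
            self._computed = True  # mark early, before releasing the lock
        try:
            value = compute()  # runs WITHOUT the lock held
        except Exception:
            with self._lock:
                self._computed = False  # reset so the next call retries
            raise
        with self._lock:
            self._value = value
        return value
```

The trade-off is that a caller arriving while the computation is still running sees the not-yet-filled value rather than waiting for it.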
…ete (lemonade-sdk#2435)

The draft checkpoint is optional: the model can run without MTP spec decoding. But are_required_checkpoints_complete() iterates over ALL checkpoint types, and if the draft model hasn't been downloaded yet the entire model gets marked as not-downloaded / 'unreadable'. Fix by skipping the draft type alongside npu_cache.
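
In outline, the completeness check after this fix behaves like the Python sketch below. The dictionary input and the "main" type name in the tests are assumptions made for illustration; only npu_cache and draft are named in the commit message.

```python
# Checkpoint types that do not gate whether a model counts as downloaded.
OPTIONAL_TYPES = {"npu_cache", "draft"}


def are_required_checkpoints_complete(checkpoints: dict) -> bool:
    """checkpoints maps checkpoint type -> whether its files are complete."""
    return all(
        complete
        for ctype, complete in checkpoints.items()
        if ctype not in OPTIONAL_TYPES  # optional types are skipped entirely
    )
```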
Add Fedora 43 as a distribution option in the FLM NPU Linux dropdown selector with COPR-based install instructions using the abn/amd-npu and abn/lemonade repositories.

Closes lemonade-sdk#1315

Co-Authored-By: Claude <noreply@anthropic.com>
superm1
reviewed
Jun 28, 2026
superm1
left a comment
Member
Please split out everything except the doc changes into its own PR. The doc changes at least look OK.
Summary
Add Fedora 43 as a distribution option in the FLM NPU Linux dropdown selector with COPR-based install instructions. Uses the abn/amd-npu and abn/lemonade COPR repos for pre-built XRT and FLM packages, so no source builds are needed.

Closes #1315.
Changes
Notes
This supersedes the stale PR #1320, which had merge conflicts after the docs restructure in #1776. The original PR's dev-getting-started.md changes are no longer needed because that file was intentionally removed in the restructure.

🤖 Generated with Claude Code