docs: add Fedora 43 to FLM NPU Linux setup guide #2473

Open
bong-water-water-bong wants to merge 12 commits into lemonade-sdk:main from bong-water-water-bong:fix/1315-fedora-docs

@bong-water-water-bong

Contributor

Summary

Add Fedora 43 as a distribution option in the FLM NPU Linux dropdown selector with COPR-based install instructions. Uses the abn/amd-npu and abn/lemonade COPR repos for pre-built XRT and FLM packages — no source builds needed.

Closes #1315.

Changes

  • Add Fedora 43 to the distro selector dropdown
  • Add Fedora 43 install instructions (COPR enable, dnf install, systemd drop-in, flm validate)
  • Register fedora-43 in the JavaScript distro selector logic

Notes

This supersedes the stale PR #1320, which had merge conflicts after the docs restructure in #1776. The original PR's dev-getting-started.md changes are no longer needed because that file was intentionally removed in the restructure.

🤖 Generated with Claude Code

bong-water-water-bong and others added 12 commits June 27, 2026 11:59
…r OpenAI compat

Closes lemonade-sdk#1370 — OpenCode / @ai-sdk/openai-compatible streaming crash

Three fixes for reasoning model streaming:

1. Streaming proxy normalization (streaming_proxy.cpp):
   - Intercepts each SSE data: {...} line in forward_sse_stream()
   - Injects content: "" when reasoning_content is present without content
   - Injects role: "assistant" when null/missing on assistant deltas
   - Only applies to chat.completion.chunk objects (non-chat passthrough)

2. Non-streaming response (server.cpp):
   - Same content injection for REST chat completions response

3. thinking: false passthrough (server.cpp):
   - Replaced strip_handled_thinking_fields() (which erased enable_thinking/
     thinking before forwarding) with normalize_thinking_fields() which
     renames thinking → enable_thinking and keeps it in the forwarded
     request. FLM/vLLM/cloud backends now see enable_thinking.
   - /no_think prefix retained for llama.cpp compatibility

Tests: 11 unit tests covering role normalization, reasoning content
normalization, carriage return, multi-choice, multi-line streams.
7/7 C++ tests pass (100%).
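The delta normalization from item 1 can be sketched as follows. This is an illustration only, not the code in streaming_proxy.cpp: the `Delta` type is a hypothetical stand-in for a parsed JSON object, `normalize_delta` is an invented name, and treating a null `content` the same as a missing one is an assumption.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed JSON delta object:
// a missing key means the field is absent, std::nullopt means JSON null.
using Delta = std::map<std::string, std::optional<std::string>>;

// Applies the two streaming fixes to one assistant delta, in place.
void normalize_delta(Delta& delta) {
    auto content = delta.find("content");
    const bool has_content =
        content != delta.end() && content->second.has_value();

    auto reasoning = delta.find("reasoning_content");
    const bool has_reasoning =
        reasoning != delta.end() && reasoning->second.has_value();

    // Fix 1: reasoning_content without content gets an empty content string.
    if (has_reasoning && !has_content) {
        delta["content"] = std::string();
    }

    // Fix 2: a null or missing role becomes "assistant".
    auto role = delta.find("role");
    if (role == delta.end() || !role->second.has_value()) {
        delta["role"] = std::string("assistant");
    }
}
```

A delta that already carries both `content` and `role` passes through unchanged, which matches the intent of only patching what strict clients reject.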
Closes lemonade-sdk#2371 — multi-GPU systems only got ROCm for the first GPU arch

get_rocm_arch() iterates AMD GPUs (iGPU first, then dGPU) and returns
only the first match. On systems with both an iGPU and dGPU with
different architectures, TheRock was only installed for the iGPU.

Fix:
- Add get_rocm_arches() returning ALL detected AMD GPU architectures
  (deduplicated, iGPU-first ordering preserved)
- Update install_therock_if_needed() to install TheRock for every arch
- Keep get_rocm_arch() for backward compat (rocm_channel, display)
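The "deduplicated, iGPU-first ordering preserved" behaviour can be sketched as an order-preserving dedupe. The helper name `unique_arches` and the input shape (one architecture string per detected GPU, iGPU entries first, empty string for an unrecognized GPU) are assumptions for illustration, not the signature of get_rocm_arches().

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: collapse the per-GPU architecture list into unique
// entries while keeping first-seen order (iGPU entries are assumed first).
std::vector<std::string> unique_arches(const std::vector<std::string>& detected) {
    std::vector<std::string> result;
    std::unordered_set<std::string> seen;
    for (const auto& arch : detected) {
        if (arch.empty()) continue;  // GPU with no recognized architecture
        // insert() reports whether the value was new; only new values are kept.
        if (seen.insert(arch).second) result.push_back(arch);
    }
    return result;
}
```

An installer loop would then run once per element of the returned vector instead of once for the first match. The architecture strings used to exercise it (for example "gfx1151") are placeholders.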
…dk#1364, lemonade-sdk#1546)

lemonade-sdk#1364 — Large Prompts Timing Out
- Add SSE keepalive heartbeat thread in forward_sse_stream()
- Sends : keepalive\n\n every 10s during prefill while waiting for first token
- Prevents client-side read timeouts on long-running prompt processing
- Thread-safe via shared mutex with the libcurl write callback
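The heartbeat described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in forward_sse_stream(): the class name `SseHeartbeat`, the `Sink` callback, and the configurable interval are assumptions (the commit uses a fixed 10 s interval), and `write_mutex` stands in for the mutex shared with the libcurl write callback.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical heartbeat: emits an SSE comment line at a fixed interval
// until stop() is called (for example, when the first token arrives).
class SseHeartbeat {
public:
    using Sink = std::function<void(const std::string&)>;

    SseHeartbeat(Sink sink, std::mutex& write_mutex,
                 std::chrono::milliseconds interval)
        : sink_(std::move(sink)), write_mutex_(write_mutex),
          interval_(interval), worker_([this] { run(); }) {}

    ~SseHeartbeat() { stop(); }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(state_mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(state_mutex_);
        while (!stopped_) {
            // Returns true if stop() was requested before the interval elapsed.
            if (cv_.wait_for(lock, interval_, [this] { return stopped_; })) break;
            lock.unlock();
            {
                // Serialize with the regular stream writer.
                std::lock_guard<std::mutex> write_lock(write_mutex_);
                sink_(": keepalive\n\n");  // SSE comment line, ignored by clients
            }
            lock.lock();
        }
    }

    Sink sink_;
    std::mutex& write_mutex_;
    std::chrono::milliseconds interval_;
    std::mutex state_mutex_;
    std::condition_variable cv_;
    bool stopped_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Waiting on a condition variable rather than sleeping lets stop() return promptly instead of blocking for up to a full interval. A line beginning with a colon is a comment in the SSE format, so clients that follow the format drop it without surfacing an event.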

lemonade-sdk#1546 — Model Download Resilience
- Add .completed sentinel written after all files are verified in download
- is_checkpoint_path_complete() checks for .completed as authoritative marker
- Prevents corrupt partially-downloaded files from appearing complete
- Hardened recursive_directory_iterator with skip_permission_denied + error_code
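The sentinel idea can be sketched with std::filesystem. The function names below are hypothetical, and "verified" is reduced here to "exists and is non-empty"; whatever verification the real download path performs is not shown in this commit message.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Writes the ".completed" marker only after every expected file passes the
// (simplified) check. Returns false and writes nothing if any file fails.
bool mark_download_complete(const fs::path& dir,
                            const std::vector<std::string>& expected) {
    for (const auto& name : expected) {
        std::error_code ec;
        const auto size = fs::file_size(dir / name, ec);
        if (ec || size == 0) return false;  // missing, unreadable, or empty
    }
    std::ofstream marker(dir / ".completed");
    return marker.good();
}

// The marker is the authoritative signal; partial files alone never count.
bool is_download_complete(const fs::path& dir) {
    std::error_code ec;  // error_code overload: no exception on odd paths
    return fs::is_regular_file(dir / ".completed", ec);
}
```

Because the marker is written last, a crash at any earlier point leaves the directory without it, so the next run treats the download as incomplete.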
Add a pre-load memory check in Router::load_model() that compares
model_info.size (file size in GB) against get_available_memory_gb()
for the target device. Logs a warning when the model may not fit.

Chose warn-only (not block) because:
1. GGUF file size != load-time memory (mmap'd, paged)
2. Auto-tune ctx_size resolver will reduce context to fit
3. A hard block would frustrate users who know their setup

Completes fixes for lemonade-sdk#1364, lemonade-sdk#1546, lemonade-sdk#1804:

## lemonade-sdk#1364 — SSE heartbeat during long prefill
Injects : keepalive\n\n every 10s during prefill to prevent client-side
read timeouts on long prompts (15k tokens → 5 min prefill).

## #1546b — .completed sentinel for download verification
Written after all files are verified in download_from_huggingface().
is_checkpoint_path_complete() checks for it, preventing corrupt partially-
downloaded files from appearing complete after a crash.

## #1546a — Model-level download resume fast-path
download_from_huggingface() now accepts do_not_upgrade flag. When set and
.completed sentinel exists, skips the HF API call entirely.

## #1546c — Directory iterator hardening
discover_extra_models() uses skip_permission_denied + error_code handling
to prevent crashes on temp files from interrupted downloads.

## lemonade-sdk#1804 — Pre-load OOM guard
Upgraded the pre-load memory check from warning to hard block when model
size exceeds 2x available memory headroom. Prevents OOM killer crashes
with a clear error message instead.
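One way to read the guard is as a three-way decision. This sketch assumes that "2x available memory headroom" means twice the value returned by get_available_memory_gb(), and that sizes between 1x and 2x still produce the earlier warning; both are interpretations, and the names below are hypothetical.

```cpp
// Hypothetical outcome of the pre-load check.
enum class LoadDecision { Proceed, Warn, Block };

// Compares the model's file size against available memory for the device.
// Assumption: block only above 2x available memory, warn between 1x and 2x.
LoadDecision check_model_fits(double model_size_gb, double available_gb) {
    if (model_size_gb <= available_gb) return LoadDecision::Proceed;
    if (model_size_gb <= 2.0 * available_gb) return LoadDecision::Warn;
    return LoadDecision::Block;
}
```

Keeping a warn-only band reflects the earlier commit's point that GGUF file size is not the same as load-time memory, so only a large overshoot is treated as fatal.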

## CI — PR-Agent + Qodo dual review
Added pr-agent-review.yml (DeepSeek) and qodo-merge.yml workflows.

Before fetching the HF API and rebuilding the file list, check for an
existing .download_manifest.json with incomplete files. If found, resume
downloading from the partial state instead of starting over.

This avoids re-downloading already-completed files after a network
interruption or Ctrl-C during model pull.

…ection (lemonade-sdk#2414)

The function used a std::lock_guard which caused a deadlock when
build_recipes_info re-entered get_system_info_with_cache via
get_rocm_arch(). Fix by switching to std::unique_lock, marking
s_recipes_computed early, and unlocking during recipe computation.
On failure the flag resets so the next call retries.
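The locking pattern described above can be sketched in isolation. `RecipesCache` and its `int` payload are hypothetical stand-ins for the cached system info; the point is only the order of operations: mark the flag early, release the mutex while computing, and reset the flag on failure.

```cpp
#include <functional>
#include <mutex>

// Hypothetical cache whose compute step may call back into get().
class RecipesCache {
public:
    int get(const std::function<int()>& compute) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (computed_) return value_;  // also the exit taken by a re-entrant call
        computed_ = true;              // mark early, before running compute
        lock.unlock();                 // never hold the mutex across compute
        int result = 0;
        try {
            result = compute();
        } catch (...) {
            lock.lock();
            computed_ = false;         // reset so the next call retries
            throw;
        }
        lock.lock();
        value_ = result;
        return value_;
    }

private:
    std::mutex mutex_;
    bool computed_ = false;
    int value_ = 0;  // placeholder seen by re-entrant or concurrent callers
};
```

With std::lock_guard the mutex would stay held for the whole call, so a re-entrant get() on the same thread would try to lock a non-recursive mutex it already owns. The trade-off in this sketch is that a caller arriving mid-computation sees the placeholder value rather than waiting.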
…ete (lemonade-sdk#2435)

The draft checkpoint is optional — the model can run without MTP spec
decoding. But are_required_checkpoints_complete() iterates over ALL
checkpoint types, and if the draft model hasn't been downloaded yet
the entire model gets marked as not-downloaded / 'unreadable'.

Fix by skipping the draft type alongside npu_cache.
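The skip logic can be sketched as below. The map from checkpoint type to a "fully downloaded" flag is a hypothetical simplification of the model's checkpoint table, and the function name is invented; only the "draft" and "npu_cache" type names come from the commit message.

```cpp
#include <map>
#include <string>

// Hypothetical view of a model's checkpoints: type name -> fully downloaded?
bool required_checkpoints_complete(
        const std::map<std::string, bool>& downloaded_by_type) {
    for (const auto& [type, downloaded] : downloaded_by_type) {
        // Optional checkpoint types never block the "downloaded" status.
        if (type == "draft" || type == "npu_cache") continue;
        if (!downloaded) return false;
    }
    return true;
}
```

A model with a complete main checkpoint and a missing draft checkpoint is then reported as downloaded, and the draft can be fetched later if speculative decoding is wanted.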
Add Fedora 43 as a distribution option in the FLM NPU Linux
dropdown selector with COPR-based install instructions using the
abn/amd-npu and abn/lemonade repositories.

Closes lemonade-sdk#1315

Co-Authored-By: Claude <noreply@anthropic.com>

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Member

seems unrelated

Member

seems unrelated

@superm1 left a comment

Member

please split out anything but doc changes to their own PR. The doc changes at least look ok.

Labels

documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Docs: Fedora 43 XDNA2 FLM beta setup is working but needs explicit Linux guidance
