Merged
24 changes: 13 additions & 11 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -2,19 +2,21 @@
* @Paulescu

# Domain owners
-/docs/fine-tuning/ @Liquid4All/fine-tuning-team
+/customization/ @Liquid4All/fine-tuning-team

-/docs/inference @Liquid4All/inference-team
-/docs/inference/*-deployment.mdx @tuliren
+/deployment/gpu-inference/ @Liquid4All/inference-team
+/deployment/gpu-inference/baseten.mdx @tuliren
+/deployment/gpu-inference/fal.mdx @tuliren
+/deployment/gpu-inference/modal.mdx @tuliren
+/deployment/on-device/ @Liquid4All/inference-team

-/docs/key-concepts/ @mlabonne
-/docs/models/audio-models.mdx @haerski
-/docs/models/vision-models.mdx @ankke
-/docs/models/ @mlabonne
+/lfm/key-concepts/ @mlabonne
+/lfm/models/audio-models.mdx @haerski
+/lfm/models/vision-models.mdx @ankke
+/lfm/models/ @mlabonne

-/leap/ @dbhathena
-/leap/edge-sdk/ @iamstuffed
-/leap/leap-bundle/ @tuliren
-/leap/finetuning.mdx @Liquid4All/fine-tuning-team
+/deployment/on-device/ios/ @iamstuffed
+/deployment/on-device/android/ @iamstuffed
+/deployment/tools/model-bundling/ @tuliren

/.github/workflows/ @tuliren
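A note on the ordering shown in this CODEOWNERS diff: the last matching pattern takes precedence, so narrower file-level rules must appear after the directory-level rules they override. A minimal sketch of that precedence (paths copied from the diff above):

```
# Broad directory rule first...
/deployment/gpu-inference/              @Liquid4All/inference-team
# ...then the narrower override; being later, it wins for this file.
/deployment/gpu-inference/baseten.mdx   @tuliren
```

If the two rules were swapped, the directory rule would match last and `@tuliren` would never be requested for `baseten.mdx`.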
@@ -20,10 +20,10 @@ LEAP Finetune will provide:
While LEAP Finetune is in development, you can fine-tune models using:

<CardGroup cols={2}>
-<Card title="TRL" icon="graduation-cap" href="/docs/fine-tuning/trl">
+<Card title="TRL" icon="graduation-cap" href="/customization/finetuning-frameworks/trl">
Hugging Face's training library with LoRA/QLoRA support
</Card>
-<Card title="Unsloth" icon="zap" href="/docs/fine-tuning/unsloth">
+<Card title="Unsloth" icon="zap" href="/customization/finetuning-frameworks/unsloth">
Memory-efficient fine-tuning with 2x faster training
</Card>
</CardGroup>
@@ -33,8 +33,8 @@ While LEAP Finetune is in development, you can fine-tune models using:
After fine-tuning with TRL or Unsloth, prepare your model for edge deployment:

1. **Fine-tune** your model using TRL or Unsloth
-2. **Convert** to edge-optimized format using the [Model Bundling Service](/leap/leap-bundle/quick-start)
-3. **Deploy** to mobile devices using the [LEAP SDK](/leap/edge-sdk/overview)
+2. **Convert** to edge-optimized format using the [Model Bundling Service](/deployment/tools/model-bundling/quick-start)
+3. **Deploy** to mobile devices using the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide)

```bash
# Example: Bundle a fine-tuned model for edge deployment
# …
```
@@ -9,7 +9,7 @@ description: "TRL (Transformer Reinforcement Learning) is a library for fine-tun

LFM models work out-of-the-box with TRL without requiring any custom integration.

-Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements.
+Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements.

## Installation[​](#installation "Direct link to Installation")

@@ -27,7 +27,7 @@ pip install trl>=0.9.0 transformers>=4.55.0 torch>=2.6 peft accelerate

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_SFT_with_TRL.ipynb)

-The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/docs/fine-tuning/datasets#instruction-datasets-sft).
+The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/customization/finetuning-frameworks/datasets#instruction-datasets-sft).

### LoRA Fine-Tuning (Recommended)[​](#lora-fine-tuning-recommended "Direct link to LoRA Fine-Tuning (Recommended)")

@@ -132,7 +132,7 @@ trainer.train()

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_VL_SFT_with_TRL.ipynb)

-The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/docs/fine-tuning/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT:
+The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/customization/finetuning-frameworks/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT:

* Uses `AutoModelForImageTextToText` instead of `AutoModelForCausalLM`
* Uses `AutoProcessor` instead of just a tokenizer
@@ -290,7 +290,7 @@ trainer.train()

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_DPO_with_TRL.ipynb)

-The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/docs/fine-tuning/datasets#preference-datasets-dpo) with chosen and rejected response pairs.
+The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/customization/finetuning-frameworks/datasets#preference-datasets-dpo) with chosen and rejected response pairs.

### DPO with LoRA (Recommended)[​](#dpo-with-lora-recommended "Direct link to DPO with LoRA (Recommended)")

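The dataset pages linked in this file gate each trainer on a specific record shape. As a rough sketch (field names follow TRL's common conventions and are assumptions, not copied verbatim from the Liquid docs), an SFT instruction record and a DPO preference record look like this:

```python
# Sketch of the record shapes TRL trainers expect. Field names follow
# TRL's standard conventions and are assumptions, not quoted from the docs.

# Instruction/conversational record for SFTTrainer: a list of chat turns.
sft_record = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Preference record for DPOTrainer: one prompt with a chosen and a
# rejected completion.
dpo_record = {
    "prompt": "Explain photosynthesis in one sentence.",
    "chosen": "Plants convert sunlight, water, and CO2 into sugar and oxygen.",
    "rejected": "Photosynthesis is when plants sleep at night.",
}

# Both trainers consume datasets of such records, e.g. built with
# datasets.Dataset.from_list([...]).
roles = [m["role"] for m in sft_record["messages"]]
print(roles)                      # ['user', 'assistant']
print(sorted(dpo_record.keys()))  # ['chosen', 'prompt', 'rejected']
```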
@@ -7,9 +7,9 @@ description: "Unsloth makes fine-tuning LLMs 2-5x faster with 70% less memory th
Use Unsloth for faster training with optimized kernels, reduced memory usage, and built-in quantization support.
</Tip>

-LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5).
+LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/lfm/models/tutorials/lfm2.5).

-Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements for [SFT](/docs/fine-tuning/datasets#instruction-datasets-sft) and [GRPO](/docs/fine-tuning/datasets#prompt-only-datasets-grpo).
+Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements for [SFT](/customization/finetuning-frameworks/datasets#instruction-datasets-sft) and [GRPO](/customization/finetuning-frameworks/datasets#prompt-only-datasets-grpo).

## Notebooks

@@ -85,5 +85,5 @@ FastLanguageModel.for_inference(model)
## Resources

* [Unsloth Documentation](https://unsloth.ai/docs)
-* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5)
+* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/lfm/models/tutorials/lfm2.5)
* [Liquid AI Cookbook](https://github.com/Liquid4All/cookbook)
8 changes: 8 additions & 0 deletions customization/getting-started/connect-ai-tools.mdx
@@ -0,0 +1,8 @@
---
title: "Connect AI Tools"
description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation"
---

import ConnectAiTools from "/snippets/connect-ai-tools.mdx";

<ConnectAiTools></ConnectAiTools>
23 changes: 23 additions & 0 deletions customization/getting-started/welcome.mdx
@@ -0,0 +1,23 @@
---
title: "Customization Options"
description: "Fine-tune and customize Liquid Foundation Models for your specific use cases."
---

LFM models support fine-tuning with popular frameworks and tools. Whether you need to adapt models for domain-specific tasks, improve accuracy on your data, or optimize for production workflows, these guides will help you get started.

## Get Started

<CardGroup cols={2}>
<Card title="Workbench" icon="wrench" href="/customization/tools/workbench">
Evaluate and iterate on prompts with Liquid's no-code Workbench tool
</Card>
<Card title="Finetuning Datasets" icon="database" href="/customization/finetuning-frameworks/datasets">
Prepare datasets in the right format for SFT, DPO, and GRPO training
</Card>
<Card title="TRL" icon="sliders" href="/customization/finetuning-frameworks/trl">
Fine-tune LFM models using Hugging Face's TRL library
</Card>
<Card title="Unsloth" icon="bolt" href="/customization/finetuning-frameworks/unsloth">
Fast and memory-efficient fine-tuning with Unsloth
</Card>
</CardGroup>
File renamed without changes.
8 changes: 8 additions & 0 deletions deployment/getting-started/connect-ai-tools.mdx
@@ -0,0 +1,8 @@
---
title: "Connect AI Tools"
description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation"
---

import ConnectAiTools from "/snippets/connect-ai-tools.mdx";

<ConnectAiTools></ConnectAiTools>
60 changes: 60 additions & 0 deletions deployment/getting-started/welcome.mdx
@@ -0,0 +1,60 @@
---
title: "Deployment Options"
description: "Deploy Liquid Foundation Models on any platform — from mobile devices to GPU clusters."
---

LFM models are designed for efficient deployment across a wide range of platforms. Run models on-device for privacy and low latency, or scale up with GPU inference for production workloads.

## On-Device

<CardGroup cols={3}>
<Card title="iOS SDK" icon="apple" href="/deployment/on-device/ios/ios-quick-start-guide">
Deploy models natively on iPhone and iPad
</Card>
<Card title="Android SDK" icon="robot" href="/deployment/on-device/android/android-quick-start-guide">
Deploy models natively on Android devices
</Card>
<Card title="llama.cpp" icon="terminal" href="/deployment/on-device/llama-cpp">
CPU-first inference with cross-platform support
</Card>
<Card title="MLX" icon="microchip" href="/deployment/on-device/mlx">
Optimized inference on Apple Silicon
</Card>
<Card title="ONNX" icon="cube" href="/deployment/on-device/onnx">
Cross-platform inference with ONNX Runtime
</Card>
<Card title="Ollama" icon="download" href="/deployment/on-device/ollama">
Easy local deployment and model management
</Card>
</CardGroup>

## GPU Inference

<CardGroup cols={3}>
<Card title="Transformers" icon="code" href="/deployment/gpu-inference/transformers">
Flexible inference with Hugging Face Transformers
</Card>
<Card title="vLLM" icon="bolt" href="/deployment/gpu-inference/vllm">
High-throughput production serving
</Card>
<Card title="SGLang" icon="server" href="/deployment/gpu-inference/sglang">
Structured generation and fast serving
</Card>
<Card title="Modal" icon="cloud" href="/deployment/gpu-inference/modal">
Serverless GPU deployment
</Card>
<Card title="Baseten" icon="cloud" href="/deployment/gpu-inference/baseten">
Production model inference platform
</Card>
<Card title="Fal" icon="cloud" href="/deployment/gpu-inference/fal">
Fast inference API platform
</Card>
</CardGroup>

## Tools

<CardGroup cols={1}>
<Card title="Model Bundling Services" icon="box" href="/deployment/tools/model-bundling/quick-start">
Package and distribute optimized model bundles for edge deployment
</Card>
</CardGroup>
File renamed without changes.
@@ -7,7 +7,7 @@ description: "SGLang is a fast serving framework for large language models. It f
Use SGLang for ultra-low latency, high-throughput production serving with many concurrent requests.
</Tip>

-SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead.
+SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.

## Supported Models

@@ -18,7 +18,7 @@ SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using
| Vision models | Not yet supported | LFM2-VL |

<Note>
-MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/docs/inference/transformers) for vision workloads.
+MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/deployment/gpu-inference/transformers) for vision workloads.
</Note>

## Installation
@@ -119,7 +119,7 @@ response = client.chat.completions.create(
print(response.choices[0].message)
```

-For more details on tool use with LFM models, see [Tool Use](/docs/key-concepts/tool-use).
+For more details on tool use with LFM models, see [Tool Use](/lfm/key-concepts/tool-use).
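The truncated `client.chat.completions.create(...)` call in this hunk follows the OpenAI chat-completions schema that SGLang serves. The request body it sends can be sketched as plain data (the model name and the weather tool are illustrative assumptions, not taken from the docs):

```python
import json

# Illustrative request body for an OpenAI-compatible /v1/chat/completions
# endpoint such as the one SGLang exposes. Model name and tool definition
# are assumptions for the sketch.
payload = {
    "model": "LiquidAI/LFM2.5-1.2B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The client library serializes this to JSON before sending it.
body = json.dumps(payload)
print(json.loads(body)["tools"][0]["function"]["name"])  # get_weather
```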

<Accordion title="Curl request example">
```bash
# …
```
@@ -7,7 +7,7 @@ description: "Transformers is a library for inference and training of pretrained
Use Transformers for simple inference without extra dependencies, research and experimentation, or integration with the Hugging Face ecosystem.
</Tip>

-Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/docs/inference/vllm).
+Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/deployment/gpu-inference/vllm).

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_Transformers.ipynb" target="_blank">
@@ -165,7 +165,7 @@ output = model.generate(input_ids, streamer=streamer, max_new_tokens=512)
Process multiple prompts in a single batch for efficiency. See the [batching documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#batch-generation) for more details:

<Note>
-Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/docs/inference/vllm).
+Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/deployment/gpu-inference/vllm).
</Note>

```python
# …
```
@@ -7,7 +7,7 @@ description: "vLLM is a high-throughput and memory-efficient inference engine fo
Use vLLM for high-throughput production deployments, batch processing, or serving models via an API.
</Tip>

-vLLM offers significantly higher throughput than [Transformers](/docs/inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead.
+vLLM offers significantly higher throughput than [Transformers](/deployment/gpu-inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_vLLM.ipynb" target="_blank">
@@ -450,4 +450,4 @@ In this pattern:

See [LeapSDK-Examples](https://github.com/Liquid4All/LeapSDK-Examples) for complete example apps using LeapSDK.

-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/android/android-quick-start-guide.mdx)
+[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/android/android-quick-start-guide.mdx)
@@ -323,10 +323,10 @@ conversation = current.modelRunner.createConversationFromHistory(

## Next steps[​](#next-steps "Direct link to Next steps")

-* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/leap/edge-sdk/ios/constrained-generation).
-* Wire up tools and external APIs with [Function Calling](/leap/edge-sdk/ios/function-calling).
-* Compare on-device and cloud behaviour in [Cloud AI Comparison](/leap/edge-sdk/ios/cloud-ai-comparison).
+* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/deployment/on-device/ios/constrained-generation).
+* Wire up tools and external APIs with [Function Calling](/deployment/on-device/ios/function-calling).
+* Compare on-device and cloud behaviour in [Cloud AI Comparison](/deployment/on-device/ios/cloud-ai-comparison).

You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use.

-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/ios/ios-quick-start-guide.mdx)
+[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx)
@@ -7,7 +7,7 @@ description: "llama.cpp is a C++ library for efficient LLM inference with minima
Use llama.cpp for CPU-only environments, local development, or edge deployment and on-device inference.
</Tip>

-For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vllm) instead.
+For GPU-accelerated inference at scale, consider using [vLLM](/deployment/gpu-inference/vllm) instead.

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_llama_cpp.ipynb" target="_blank">
@@ -100,7 +100,7 @@ For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vl

## Downloading GGUF Models

-llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/docs/models/complete-library) for all available GGUF models.
+llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/lfm/models/complete-library) for all available GGUF models.

You can download LFM models in GGUF format from Hugging Face as follows:

@@ -18,7 +18,7 @@ Download and install LM Studio directly from [lmstudio.ai](https://lmstudio.ai/d
3. Select a model and quantization level (`Q4_K_M` recommended)
4. Click **Download**

-See the [Models page](/docs/models/complete-library) for all available GGUF models.
+See the [Models page](/lfm/models/complete-library) for all available GGUF models.

## Using the Chat Interface

2 changes: 1 addition & 1 deletion docs/inference/mlx.mdx → deployment/on-device/mlx.mdx
@@ -21,7 +21,7 @@ pip install mlx-lm

The `mlx-lm` package provides a simple interface for text generation with MLX models.

-See the [Models page](/docs/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2).
+See the [Models page](/lfm/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2).

```python
from mlx_lm import load, generate
# …
```
@@ -68,7 +68,7 @@ You can run LFM2 models directly from Hugging Face:
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```

-See the [Models page](/docs/models/complete-library) for all available GGUF repositories.
+See the [Models page](/lfm/models/complete-library) for all available GGUF repositories.

To use a local GGUF file, first download a model from Hugging Face:

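Once a GGUF file has been downloaded locally, Ollama is typically pointed at it with a Modelfile. A minimal sketch (the filename below is illustrative, not taken from the docs):

```
# Modelfile: tells Ollama to build a model from a local GGUF file.
# The path is illustrative; use whatever file you downloaded.
FROM ./LFM2.5-1.2B-Instruct-Q4_K_M.gguf
```

The model can then be registered and run with `ollama create lfm-local -f Modelfile` followed by `ollama run lfm-local` (model name illustrative).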