Merged
24 changes: 13 additions & 11 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -2,19 +2,21 @@
* @Paulescu

# Domain owners
-/docs/fine-tuning/ @Liquid4All/fine-tuning-team
+/customization/ @Liquid4All/fine-tuning-team

-/docs/inference @Liquid4All/inference-team
-/docs/inference/*-deployment.mdx @tuliren
+/deployment/gpu-inference/ @Liquid4All/inference-team
+/deployment/gpu-inference/baseten.mdx @tuliren
+/deployment/gpu-inference/fal.mdx @tuliren
+/deployment/gpu-inference/modal.mdx @tuliren
+/deployment/on-device/ @Liquid4All/inference-team

-/docs/key-concepts/ @mlabonne
-/docs/models/audio-models.mdx @haerski
-/docs/models/vision-models.mdx @ankke
-/docs/models/ @mlabonne
+/lfm/key-concepts/ @mlabonne
+/lfm/models/audio-models.mdx @haerski
+/lfm/models/vision-models.mdx @ankke
+/lfm/models/ @mlabonne

-/leap/ @dbhathena
-/leap/edge-sdk/ @iamstuffed
-/leap/leap-bundle/ @tuliren
-/leap/finetuning.mdx @Liquid4All/fine-tuning-team
+/deployment/on-device/ios/ @iamstuffed
+/deployment/on-device/android/ @iamstuffed
+/deployment/tools/model-bundling/ @tuliren

/.github/workflows/ @tuliren
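A note on the ordering shown in this CODEOWNERS diff: the last matching pattern takes precedence, so narrower file-level rules must appear after the directory-level rules they override. A minimal sketch of that precedence (paths copied from the diff above):

```
# Broad directory rule first...
/deployment/gpu-inference/              @Liquid4All/inference-team
# ...then the narrower override; being later, it wins for this file.
/deployment/gpu-inference/baseten.mdx   @tuliren
```

If the two rules were swapped, the directory rule would match last and `@tuliren` would never be requested for `baseten.mdx`.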
@@ -20,10 +20,10 @@ LEAP Finetune will provide:
While LEAP Finetune is in development, you can fine-tune models using:

<CardGroup cols={2}>
-<Card title="TRL" icon="graduation-cap" href="/docs/fine-tuning/trl">
+<Card title="TRL" icon="graduation-cap" href="/customization/finetuning-frameworks/trl">
Hugging Face's training library with LoRA/QLoRA support
</Card>
-<Card title="Unsloth" icon="zap" href="/docs/fine-tuning/unsloth">
+<Card title="Unsloth" icon="zap" href="/customization/finetuning-frameworks/unsloth">
Memory-efficient fine-tuning with 2x faster training
</Card>
</CardGroup>
@@ -33,8 +33,8 @@ While LEAP Finetune is in development, you can fine-tune models using:
After fine-tuning with TRL or Unsloth, prepare your model for edge deployment:

1. **Fine-tune** your model using TRL or Unsloth
-2. **Convert** to edge-optimized format using the [Model Bundling Service](/leap/leap-bundle/quick-start)
-3. **Deploy** to mobile devices using the [LEAP SDK](/leap/edge-sdk/overview)
+2. **Convert** to edge-optimized format using the [Model Bundling Service](/deployment/tools/model-bundling/quick-start)
+3. **Deploy** to mobile devices using the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide)

```bash
# Example: Bundle a fine-tuned model for edge deployment
# …
```
@@ -9,7 +9,7 @@ description: "TRL (Transformer Reinforcement Learning) is a library for fine-tun

LFM models work out-of-the-box with TRL without requiring any custom integration.

-Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements.
+Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements.

## Installation[​](#installation "Direct link to Installation")

@@ -27,7 +27,7 @@ pip install trl>=0.9.0 transformers>=4.55.0 torch>=2.6 peft accelerate

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_SFT_with_TRL.ipynb)

-The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/docs/fine-tuning/datasets#instruction-datasets-sft).
+The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/customization/finetuning-frameworks/datasets#instruction-datasets-sft).

### LoRA Fine-Tuning (Recommended)[​](#lora-fine-tuning-recommended "Direct link to LoRA Fine-Tuning (Recommended)")

@@ -132,7 +132,7 @@ trainer.train()

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_VL_SFT_with_TRL.ipynb)

-The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/docs/fine-tuning/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT:
+The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/customization/finetuning-frameworks/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT:

* Uses `AutoModelForImageTextToText` instead of `AutoModelForCausalLM`
* Uses `AutoProcessor` instead of just a tokenizer
@@ -290,7 +290,7 @@ trainer.train()

[![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_DPO_with_TRL.ipynb)

-The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/docs/fine-tuning/datasets#preference-datasets-dpo) with chosen and rejected response pairs.
+The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/customization/finetuning-frameworks/datasets#preference-datasets-dpo) with chosen and rejected response pairs.

### DPO with LoRA (Recommended)[​](#dpo-with-lora-recommended "Direct link to DPO with LoRA (Recommended)")

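The dataset pages linked in this file gate each trainer on a specific record shape. As a rough sketch (field names follow TRL's common conventions and are assumptions, not copied verbatim from the Liquid docs), an SFT instruction record and a DPO preference record look like this:

```python
# Sketch of the record shapes TRL trainers expect. Field names follow
# TRL's standard conventions and are assumptions, not quoted from the docs.

# Instruction/conversational record for SFTTrainer: a list of chat turns.
sft_record = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Preference record for DPOTrainer: one prompt with a chosen and a
# rejected completion.
dpo_record = {
    "prompt": "Explain photosynthesis in one sentence.",
    "chosen": "Plants convert sunlight, water, and CO2 into sugar and oxygen.",
    "rejected": "Photosynthesis is when plants sleep at night.",
}

# Both trainers consume datasets of such records, e.g. built with
# datasets.Dataset.from_list([...]).
roles = [m["role"] for m in sft_record["messages"]]
print(roles)                      # ['user', 'assistant']
print(sorted(dpo_record.keys()))  # ['chosen', 'prompt', 'rejected']
```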
@@ -7,9 +7,9 @@ description: "Unsloth makes fine-tuning LLMs 2-5x faster with 70% less memory th
Use Unsloth for faster training with optimized kernels, reduced memory usage, and built-in quantization support.
</Tip>

-LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5).
+LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/lfm/models/tutorials/lfm2.5).

-Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements for [SFT](/docs/fine-tuning/datasets#instruction-datasets-sft) and [GRPO](/docs/fine-tuning/datasets#prompt-only-datasets-grpo).
+Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements for [SFT](/customization/finetuning-frameworks/datasets#instruction-datasets-sft) and [GRPO](/customization/finetuning-frameworks/datasets#prompt-only-datasets-grpo).

## Notebooks

@@ -85,5 +85,5 @@ FastLanguageModel.for_inference(model)
## Resources

* [Unsloth Documentation](https://unsloth.ai/docs)
-* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5)
+* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/lfm/models/tutorials/lfm2.5)
* [Liquid AI Cookbook](https://github.com/Liquid4All/cookbook)
8 changes: 8 additions & 0 deletions customization/getting-started/connect-ai-tools.mdx
@@ -0,0 +1,8 @@
---
title: "Connect AI Tools"
description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation"
---

import ConnectAiTools from "/snippets/connect-ai-tools.mdx";

<ConnectAiTools></ConnectAiTools>
23 changes: 23 additions & 0 deletions customization/getting-started/welcome.mdx
@@ -0,0 +1,23 @@
---
title: "Customization Options"
description: "Fine-tune and customize Liquid Foundation Models for your specific use cases."
---

LFM models support fine-tuning with popular frameworks and tools. Whether you need to adapt models for domain-specific tasks, improve accuracy on your data, or optimize for production workflows, these guides will help you get started.

## Get Started

<CardGroup cols={2}>
<Card title="Workbench" icon="wrench" href="/customization/tools/workbench">
Evaluate and iterate on prompts with Liquid's no-code Workbench tool
</Card>
<Card title="Finetuning Datasets" icon="database" href="/customization/finetuning-frameworks/datasets">
Prepare datasets in the right format for SFT, DPO, and GRPO training
</Card>
<Card title="TRL" icon="sliders" href="/customization/finetuning-frameworks/trl">
Fine-tune LFM models using Hugging Face's TRL library
</Card>
<Card title="Unsloth" icon="bolt" href="/customization/finetuning-frameworks/unsloth">
Fast and memory-efficient fine-tuning with Unsloth
</Card>
</CardGroup>
File renamed without changes.
8 changes: 8 additions & 0 deletions deployment/getting-started/connect-ai-tools.mdx
@@ -0,0 +1,8 @@
---
title: "Connect AI Tools"
description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation"
---

import ConnectAiTools from "/snippets/connect-ai-tools.mdx";

<ConnectAiTools></ConnectAiTools>
60 changes: 60 additions & 0 deletions deployment/getting-started/welcome.mdx
@@ -0,0 +1,60 @@
---
title: "Deployment Options"
description: "Deploy Liquid Foundation Models on any platform — from mobile devices to GPU clusters."
---

LFM models are designed for efficient deployment across a wide range of platforms. Run models on-device for privacy and low latency, or scale up with GPU inference for production workloads.

## On-Device

<CardGroup cols={3}>
<Card title="iOS SDK" icon="apple" href="/deployment/on-device/ios/ios-quick-start-guide">
Deploy models natively on iPhone and iPad
</Card>
<Card title="Android SDK" icon="robot" href="/deployment/on-device/android/android-quick-start-guide">
Deploy models natively on Android devices
</Card>
<Card title="llama.cpp" icon="terminal" href="/deployment/on-device/llama-cpp">
CPU-first inference with cross-platform support
</Card>
<Card title="MLX" icon="microchip" href="/deployment/on-device/mlx">
Optimized inference on Apple Silicon
</Card>
<Card title="ONNX" icon="cube" href="/deployment/on-device/onnx">
Cross-platform inference with ONNX Runtime
</Card>
<Card title="Ollama" icon="download" href="/deployment/on-device/ollama">
Easy local deployment and model management
</Card>
</CardGroup>

## GPU Inference

<CardGroup cols={3}>
<Card title="Transformers" icon="code" href="/deployment/gpu-inference/transformers">
Flexible inference with Hugging Face Transformers
</Card>
<Card title="vLLM" icon="bolt" href="/deployment/gpu-inference/vllm">
High-throughput production serving
</Card>
<Card title="SGLang" icon="server" href="/deployment/gpu-inference/sglang">
Structured generation and fast serving
</Card>
<Card title="Modal" icon="cloud" href="/deployment/gpu-inference/modal">
Serverless GPU deployment
</Card>
<Card title="Baseten" icon="cloud" href="/deployment/gpu-inference/baseten">
Production model inference platform
</Card>
<Card title="Fal" icon="cloud" href="/deployment/gpu-inference/fal">
Fast inference API platform
</Card>
</CardGroup>

## Tools

<CardGroup cols={1}>
<Card title="Model Bundling Services" icon="box" href="/deployment/tools/model-bundling/quick-start">
Package and distribute optimized model bundles for edge deployment
</Card>
</CardGroup>
File renamed without changes.
@@ -7,7 +7,7 @@ description: "SGLang is a fast serving framework for large language models. It f
Use SGLang for ultra-low latency, high-throughput production serving with many concurrent requests.
</Tip>

-SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead.
+SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.

## Supported Models

@@ -18,7 +18,7 @@ SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using
| Vision models | Not yet supported | LFM2-VL |

<Note>
-MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/docs/inference/transformers) for vision workloads.
+MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/deployment/gpu-inference/transformers) for vision workloads.
</Note>

## Installation
@@ -119,7 +119,7 @@ response = client.chat.completions.create(
print(response.choices[0].message)
```

-For more details on tool use with LFM models, see [Tool Use](/docs/key-concepts/tool-use).
+For more details on tool use with LFM models, see [Tool Use](/lfm/key-concepts/tool-use).
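The truncated `client.chat.completions.create(...)` call in this hunk follows the OpenAI chat-completions schema that SGLang serves. The request body it sends can be sketched as plain data (the model name and the weather tool are illustrative assumptions, not taken from the docs):

```python
import json

# Illustrative request body for an OpenAI-compatible /v1/chat/completions
# endpoint such as the one SGLang exposes. Model name and tool definition
# are assumptions for the sketch.
payload = {
    "model": "LiquidAI/LFM2.5-1.2B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The client library serializes this to JSON before sending it.
body = json.dumps(payload)
print(json.loads(body)["tools"][0]["function"]["name"])  # get_weather
```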

<Accordion title="Curl request example">
```bash
# …
```
@@ -7,7 +7,7 @@ description: "Transformers is a library for inference and training of pretrained
Use Transformers for simple inference without extra dependencies, research and experimentation, or integration with the Hugging Face ecosystem.
</Tip>

-Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/docs/inference/vllm).
+Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/deployment/gpu-inference/vllm).

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_Transformers.ipynb" target="_blank">
@@ -165,7 +165,7 @@ output = model.generate(input_ids, streamer=streamer, max_new_tokens=512)
Process multiple prompts in a single batch for efficiency. See the [batching documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#batch-generation) for more details:

<Note>
-Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/docs/inference/vllm).
+Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/deployment/gpu-inference/vllm).
</Note>

```python
# …
```
@@ -7,7 +7,7 @@ description: "vLLM is a high-throughput and memory-efficient inference engine fo
Use vLLM for high-throughput production deployments, batch processing, or serving models via an API.
</Tip>

-vLLM offers significantly higher throughput than [Transformers](/docs/inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead.
+vLLM offers significantly higher throughput than [Transformers](/deployment/gpu-inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_vLLM.ipynb" target="_blank">
@@ -450,4 +450,4 @@ In this pattern:

See [LeapSDK-Examples](https://github.com/Liquid4All/LeapSDK-Examples) for complete example apps using LeapSDK.

-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/android/android-quick-start-guide.mdx)
+[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/android/android-quick-start-guide.mdx)
@@ -323,10 +323,10 @@ conversation = current.modelRunner.createConversationFromHistory(

## Next steps[​](#next-steps "Direct link to Next steps")

-* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/leap/edge-sdk/ios/constrained-generation).
-* Wire up tools and external APIs with [Function Calling](/leap/edge-sdk/ios/function-calling).
-* Compare on-device and cloud behaviour in [Cloud AI Comparison](/leap/edge-sdk/ios/cloud-ai-comparison).
+* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/deployment/on-device/ios/constrained-generation).
+* Wire up tools and external APIs with [Function Calling](/deployment/on-device/ios/function-calling).
+* Compare on-device and cloud behaviour in [Cloud AI Comparison](/deployment/on-device/ios/cloud-ai-comparison).

You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use.

-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/ios/ios-quick-start-guide.mdx)
+[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx)
@@ -7,7 +7,7 @@ description: "llama.cpp is a C++ library for efficient LLM inference with minima
Use llama.cpp for CPU-only environments, local development, or edge deployment and on-device inference.
</Tip>

-For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vllm) instead.
+For GPU-accelerated inference at scale, consider using [vLLM](/deployment/gpu-inference/vllm) instead.

<div className="colab-link">
<a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_llama_cpp.ipynb" target="_blank">
@@ -100,7 +100,7 @@ For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vl

## Downloading GGUF Models

-llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/docs/models/complete-library) for all available GGUF models.
+llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/lfm/models/complete-library) for all available GGUF models.

You can download LFM models in GGUF format from Hugging Face as follows:

@@ -18,7 +18,7 @@ Download and install LM Studio directly from [lmstudio.ai](https://lmstudio.ai/d
3. Select a model and quantization level (`Q4_K_M` recommended)
4. Click **Download**

-See the [Models page](/docs/models/complete-library) for all available GGUF models.
+See the [Models page](/lfm/models/complete-library) for all available GGUF models.

## Using the Chat Interface

2 changes: 1 addition & 1 deletion docs/inference/mlx.mdx → deployment/on-device/mlx.mdx
@@ -21,7 +21,7 @@ pip install mlx-lm

The `mlx-lm` package provides a simple interface for text generation with MLX models.

-See the [Models page](/docs/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2).
+See the [Models page](/lfm/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2).

```python
from mlx_lm import load, generate
# …
```
@@ -68,7 +68,7 @@ You can run LFM2 models directly from Hugging Face:
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```

-See the [Models page](/docs/models/complete-library) for all available GGUF repositories.
+See the [Models page](/lfm/models/complete-library) for all available GGUF repositories.

To use a local GGUF file, first download a model from Hugging Face:

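Once a GGUF file has been downloaded locally, Ollama is typically pointed at it with a Modelfile. A minimal sketch (the filename below is illustrative, not taken from the docs):

```
# Modelfile: tells Ollama to build a model from a local GGUF file.
# The path is illustrative; use whatever file you downloaded.
FROM ./LFM2.5-1.2B-Instruct-Q4_K_M.gguf
```

The model can then be registered and run with `ollama create lfm-local -f Modelfile` followed by `ollama run lfm-local` (model name illustrative).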