diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index f0c8e0a..20981d7 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -2,19 +2,21 @@ * @Paulescu # Domain owners -/docs/fine-tuning/ @Liquid4All/fine-tuning-team +/customization/ @Liquid4All/fine-tuning-team -/docs/inference @Liquid4All/inference-team -/docs/inference/*-deployment.mdx @tuliren +/deployment/gpu-inference/ @Liquid4All/inference-team +/deployment/gpu-inference/baseten.mdx @tuliren +/deployment/gpu-inference/fal.mdx @tuliren +/deployment/gpu-inference/modal.mdx @tuliren +/deployment/on-device/ @Liquid4All/inference-team -/docs/key-concepts/ @mlabonne -/docs/models/audio-models.mdx @haerski -/docs/models/vision-models.mdx @ankke -/docs/models/ @mlabonne +/lfm/key-concepts/ @mlabonne +/lfm/models/audio-models.mdx @haerski +/lfm/models/vision-models.mdx @ankke +/lfm/models/ @mlabonne -/leap/ @dbhathena -/leap/edge-sdk/ @iamstuffed -/leap/leap-bundle/ @tuliren -/leap/finetuning.mdx @Liquid4All/fine-tuning-team +/deployment/on-device/ios/ @iamstuffed +/deployment/on-device/android/ @iamstuffed +/deployment/tools/model-bundling/ @tuliren /.github/workflows/ @tuliren diff --git a/docs/fine-tuning/datasets.mdx b/customization/finetuning-frameworks/datasets.mdx similarity index 100% rename from docs/fine-tuning/datasets.mdx rename to customization/finetuning-frameworks/datasets.mdx diff --git a/docs/fine-tuning/leap-finetune.mdx b/customization/finetuning-frameworks/leap-finetune.mdx similarity index 79% rename from docs/fine-tuning/leap-finetune.mdx rename to customization/finetuning-frameworks/leap-finetune.mdx index d25add6..03bcfb3 100644 --- a/docs/fine-tuning/leap-finetune.mdx +++ b/customization/finetuning-frameworks/leap-finetune.mdx @@ -20,10 +20,10 @@ LEAP Finetune will provide: While LEAP Finetune is in development, you can fine-tune models using: - + Hugging Face's training library with LoRA/QLoRA support - + Memory-efficient fine-tuning with 2x faster training @@ -33,8 +33,8 @@ While 
LEAP Finetune is in development, you can fine-tune models using: After fine-tuning with TRL or Unsloth, prepare your model for edge deployment: 1. **Fine-tune** your model using TRL or Unsloth -2. **Convert** to edge-optimized format using the [Model Bundling Service](/leap/leap-bundle/quick-start) -3. **Deploy** to mobile devices using the [LEAP SDK](/leap/edge-sdk/overview) +2. **Convert** to edge-optimized format using the [Model Bundling Service](/deployment/tools/model-bundling/quick-start) +3. **Deploy** to mobile devices using the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide) ```bash # Example: Bundle a fine-tuned model for edge deployment diff --git a/docs/fine-tuning/trl.mdx b/customization/finetuning-frameworks/trl.mdx similarity index 96% rename from docs/fine-tuning/trl.mdx rename to customization/finetuning-frameworks/trl.mdx index c63ac28..00cbea2 100644 --- a/docs/fine-tuning/trl.mdx +++ b/customization/finetuning-frameworks/trl.mdx @@ -9,7 +9,7 @@ description: "TRL (Transformer Reinforcement Learning) is a library for fine-tun LFM models work out-of-the-box with TRL without requiring any custom integration. -Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements. +Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements. ## Installation[​](#installation "Direct link to Installation") @@ -27,7 +27,7 @@ pip install trl>=0.9.0 transformers>=4.55.0 torch>=2.6 peft accelerate [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_SFT_with_TRL.ipynb) -The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. 
It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/docs/fine-tuning/datasets#instruction-datasets-sft). +The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/customization/finetuning-frameworks/datasets#instruction-datasets-sft). ### LoRA Fine-Tuning (Recommended)[​](#lora-fine-tuning-recommended "Direct link to LoRA Fine-Tuning (Recommended)") @@ -132,7 +132,7 @@ trainer.train() [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_VL_SFT_with_TRL.ipynb) -The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/docs/fine-tuning/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT: +The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/customization/finetuning-frameworks/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT: * Uses `AutoModelForImageTextToText` instead of `AutoModelForCausalLM` * Uses `AutoProcessor` instead of just a tokenizer @@ -290,7 +290,7 @@ trainer.train() [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_DPO_with_TRL.ipynb) -The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. 
DPO training requires [Preference datasets](/docs/fine-tuning/datasets#preference-datasets-dpo) with chosen and rejected response pairs. +The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/customization/finetuning-frameworks/datasets#preference-datasets-dpo) with chosen and rejected response pairs. ### DPO with LoRA (Recommended)[​](#dpo-with-lora-recommended "Direct link to DPO with LoRA (Recommended)") diff --git a/docs/fine-tuning/unsloth.mdx b/customization/finetuning-frameworks/unsloth.mdx similarity index 88% rename from docs/fine-tuning/unsloth.mdx rename to customization/finetuning-frameworks/unsloth.mdx index cdf1548..32568f8 100644 --- a/docs/fine-tuning/unsloth.mdx +++ b/customization/finetuning-frameworks/unsloth.mdx @@ -7,9 +7,9 @@ description: "Unsloth makes fine-tuning LLMs 2-5x faster with 70% less memory th Use Unsloth for faster training with optimized kernels, reduced memory usage, and built-in quantization support. -LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5). +LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5). -Different training methods require specific dataset formats. +Different training methods require specific dataset formats. 
See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements for [SFT](/customization/finetuning-frameworks/datasets#instruction-datasets-sft) and [GRPO](/customization/finetuning-frameworks/datasets#prompt-only-datasets-grpo). ## Notebooks @@ -85,5 +85,5 @@ FastLanguageModel.for_inference(model) ## Resources * [Unsloth Documentation](https://unsloth.ai/docs) -* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5) +* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5) * [Liquid AI Cookbook](https://github.com/Liquid4All/cookbook) diff --git a/customization/getting-started/connect-ai-tools.mdx b/customization/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/customization/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/customization/getting-started/welcome.mdx b/customization/getting-started/welcome.mdx new file mode 100644 index 0000000..4450634 --- /dev/null +++ b/customization/getting-started/welcome.mdx @@ -0,0 +1,23 @@ +--- +title: "Customization Options" +description: "Fine-tune and customize Liquid Foundation Models for your specific use cases." +--- + +LFM models support fine-tuning with popular frameworks and tools. Whether you need to adapt models for domain-specific tasks, improve accuracy on your data, or optimize for production workflows, these guides will help you get started. 
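The dataset formats these guides reference (instruction datasets for SFT, preference datasets for DPO with chosen and rejected response pairs, prompt-only datasets for GRPO) can be sketched as minimal records. The field names below follow common TRL conventions and are an assumption for illustration, not the canonical spec:

```python
# Illustrative records for the three dataset formats named in these guides.
# Field names follow common TRL conventions (an assumption, not the canonical spec).

sft_record = {  # instruction dataset (SFT): full conversations
    "messages": [
        {"role": "user", "content": "Summarize this paragraph."},
        {"role": "assistant", "content": "Here is a short summary."},
    ]
}

dpo_record = {  # preference dataset (DPO): chosen and rejected response pairs
    "prompt": "Explain overfitting in one sentence.",
    "chosen": "Overfitting means the model memorizes training data and generalizes poorly.",
    "rejected": "Overfitting is when training finishes too quickly.",
}

grpo_record = {  # prompt-only dataset (GRPO): completions are sampled during training
    "prompt": "Write a haiku about autumn.",
}

def dataset_kind(record: dict) -> str:
    """Classify a record by the training method its shape fits."""
    if "messages" in record:
        return "sft"
    if {"chosen", "rejected"} <= record.keys():
        return "dpo"
    if set(record.keys()) == {"prompt"}:
        return "grpo"
    raise ValueError("unrecognized record shape")

print(dataset_kind(sft_record), dataset_kind(dpo_record), dataset_kind(grpo_record))
```

A validator like `dataset_kind` is a convenient sanity check before handing a dataset to a trainer, since each training method rejects records in the other shapes.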
+ +## Get Started + + + + Evaluate and iterate on prompts with Liquid's no-code Workbench tool + + + Prepare datasets in the right format for SFT, DPO, and GRPO training + + + Fine-tune LFM models using Hugging Face's TRL library + + + Fast and memory-efficient fine-tuning with Unsloth + + diff --git a/docs/fine-tuning/workbench.mdx b/customization/tools/workbench.mdx similarity index 100% rename from docs/fine-tuning/workbench.mdx rename to customization/tools/workbench.mdx diff --git a/deployment/getting-started/connect-ai-tools.mdx b/deployment/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/deployment/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/deployment/getting-started/welcome.mdx b/deployment/getting-started/welcome.mdx new file mode 100644 index 0000000..8ea7c67 --- /dev/null +++ b/deployment/getting-started/welcome.mdx @@ -0,0 +1,60 @@ +--- +title: "Deployment Options" +description: "Deploy Liquid Foundation Models on any platform — from mobile devices to GPU clusters." +--- + +LFM models are designed for efficient deployment across a wide range of platforms. Run models on-device for privacy and low latency, or scale up with GPU inference for production workloads. 
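The On-Device and GPU Inference sections that follow split along a simple rule of thumb stated across these pages: CPU-only environments point to llama.cpp, Apple Silicon points to MLX, a CUDA GPU with high-throughput needs points to vLLM or SGLang, and vision workloads stay on Transformers. A hypothetical helper encoding that guidance, as a sketch rather than an official API:

```python
# Hypothetical helper encoding the rule-of-thumb guidance from these docs:
# CPU-only -> llama.cpp, Apple Silicon -> MLX, CUDA GPU + high throughput ->
# vLLM (or SGLang), vision on GPU -> Transformers. A sketch, not an official API.

def recommend_backend(cuda_gpu: bool, apple_silicon: bool = False,
                      high_throughput: bool = False, vision: bool = False) -> str:
    if not cuda_gpu:
        # llama.cpp is the CPU-first option; MLX is optimized for Apple Silicon.
        return "mlx" if apple_silicon else "llama.cpp"
    if vision:
        return "transformers"  # vision models are not yet supported in SGLang
    if high_throughput:
        return "vllm"          # SGLang is the alternative for ultra-low-latency serving
    return "transformers"      # simplest path, direct access to model internals

print(recommend_backend(cuda_gpu=False))  # prints llama.cpp
```

The helper is deliberately coarse; real deployments weigh memory budget, quantization, and concurrency on top of these defaults.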
+ +## On-Device + + + + Deploy models natively on iPhone and iPad + + + Deploy models natively on Android devices + + + CPU-first inference with cross-platform support + + + Optimized inference on Apple Silicon + + + Cross-platform inference with ONNX Runtime + + + Easy local deployment and model management + + + +## GPU Inference + + + + Flexible inference with Hugging Face Transformers + + + High-throughput production serving + + + Structured generation and fast serving + + + Serverless GPU deployment + + + Production model inference platform + + + Fast inference API platform + + + +## Tools + + + + Package and distribute optimized model bundles for edge deployment + + diff --git a/docs/inference/baseten-deployment.mdx b/deployment/gpu-inference/baseten.mdx similarity index 100% rename from docs/inference/baseten-deployment.mdx rename to deployment/gpu-inference/baseten.mdx diff --git a/docs/inference/fal-deployment.mdx b/deployment/gpu-inference/fal.mdx similarity index 100% rename from docs/inference/fal-deployment.mdx rename to deployment/gpu-inference/fal.mdx diff --git a/docs/inference/modal-deployment.mdx b/deployment/gpu-inference/modal.mdx similarity index 100% rename from docs/inference/modal-deployment.mdx rename to deployment/gpu-inference/modal.mdx diff --git a/docs/inference/sglang.mdx b/deployment/gpu-inference/sglang.mdx similarity index 95% rename from docs/inference/sglang.mdx rename to deployment/gpu-inference/sglang.mdx index 78e8d04..392eadf 100644 --- a/docs/inference/sglang.mdx +++ b/deployment/gpu-inference/sglang.mdx @@ -7,7 +7,7 @@ description: "SGLang is a fast serving framework for large language models. It f Use SGLang for ultra-low latency, high-throughput production serving with many concurrent requests. -SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead. +SGLang requires a CUDA-compatible GPU. 
For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead. ## Supported Models @@ -18,7 +18,7 @@ SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using | Vision models | Not yet supported | LFM2-VL | -MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/docs/inference/transformers) for vision workloads. +MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/deployment/gpu-inference/transformers) for vision workloads. ## Installation @@ -119,7 +119,7 @@ response = client.chat.completions.create( print(response.choices[0].message) ``` -For more details on tool use with LFM models, see [Tool Use](/docs/key-concepts/tool-use). +For more details on tool use with LFM models, see [Tool Use](/lfm/key-concepts/tool-use). ```bash diff --git a/docs/inference/transformers.mdx b/deployment/gpu-inference/transformers.mdx similarity index 99% rename from docs/inference/transformers.mdx rename to deployment/gpu-inference/transformers.mdx index bfffc43..7c4f6b3 100644 --- a/docs/inference/transformers.mdx +++ b/deployment/gpu-inference/transformers.mdx @@ -7,7 +7,7 @@ description: "Transformers is a library for inference and training of pretrained Use Transformers for simple inference without extra dependencies, research and experimentation, or integration with the Hugging Face ecosystem. -Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/docs/inference/vllm). 
+Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/deployment/gpu-inference/vllm).
@@ -165,7 +165,7 @@ output = model.generate(input_ids, streamer=streamer, max_new_tokens=512) Process multiple prompts in a single batch for efficiency. See the [batching documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#batch-generation) for more details: - Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/docs/inference/vllm). + Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/deployment/gpu-inference/vllm). ```python diff --git a/docs/inference/vllm.mdx b/deployment/gpu-inference/vllm.mdx similarity index 97% rename from docs/inference/vllm.mdx rename to deployment/gpu-inference/vllm.mdx index aab625d..1d5ff84 100644 --- a/docs/inference/vllm.mdx +++ b/deployment/gpu-inference/vllm.mdx @@ -7,7 +7,7 @@ description: "vLLM is a high-throughput and memory-efficient inference engine fo Use vLLM for high-throughput production deployments, batch processing, or serving models via an API. -vLLM offers significantly higher throughput than [Transformers](/docs/inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead. +vLLM offers significantly higher throughput than [Transformers](/deployment/gpu-inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.
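The GPU serving stacks covered above (vLLM and SGLang both ship OpenAI-compatible servers, as the SGLang page's `client.chat.completions.create` example shows) all accept the same chat completions request body. A sketch of that payload shape; the model name and defaults here are placeholders, not pinned recommendations:

```python
# Build an OpenAI-compatible /v1/chat/completions request body.
# No server is needed to construct it; point "model" at whatever model
# your vLLM or SGLang instance loaded (the name below is illustrative).
import json
from typing import Optional

def chat_payload(model: str, user_msg: str, system: Optional[str] = None,
                 max_tokens: int = 512, temperature: float = 0.3) -> dict:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_msg})
    return {"model": model, "messages": messages,
            "max_tokens": max_tokens, "temperature": temperature}

payload = chat_payload("LiquidAI/LFM2.5-1.2B-Instruct", "Hello!",
                       system="You are a helpful assistant.")
print(json.dumps(payload, indent=2))
```

Because the body is identical across both servers, swapping backends usually only means changing the base URL the client talks to.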
diff --git a/leap/edge-sdk/android/advanced-features.mdx b/deployment/on-device/android/advanced-features.mdx similarity index 100% rename from leap/edge-sdk/android/advanced-features.mdx rename to deployment/on-device/android/advanced-features.mdx diff --git a/leap/edge-sdk/android/ai-agent-usage-guide.mdx b/deployment/on-device/android/ai-agent-usage-guide.mdx similarity index 100% rename from leap/edge-sdk/android/ai-agent-usage-guide.mdx rename to deployment/on-device/android/ai-agent-usage-guide.mdx diff --git a/leap/edge-sdk/android/android-quick-start-guide.mdx b/deployment/on-device/android/android-quick-start-guide.mdx similarity index 99% rename from leap/edge-sdk/android/android-quick-start-guide.mdx rename to deployment/on-device/android/android-quick-start-guide.mdx index 686c218..36a9b1d 100644 --- a/leap/edge-sdk/android/android-quick-start-guide.mdx +++ b/deployment/on-device/android/android-quick-start-guide.mdx @@ -450,4 +450,4 @@ In this pattern: See [LeapSDK-Examples](https://github.com/Liquid4All/LeapSDK-Examples) for complete example apps using LeapSDK. 
-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/android/android-quick-start-guide.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/android/android-quick-start-guide.mdx) diff --git a/leap/edge-sdk/android/cloud-ai-comparison.mdx b/deployment/on-device/android/cloud-ai-comparison.mdx similarity index 100% rename from leap/edge-sdk/android/cloud-ai-comparison.mdx rename to deployment/on-device/android/cloud-ai-comparison.mdx diff --git a/leap/edge-sdk/android/constrained-generation.mdx b/deployment/on-device/android/constrained-generation.mdx similarity index 100% rename from leap/edge-sdk/android/constrained-generation.mdx rename to deployment/on-device/android/constrained-generation.mdx diff --git a/leap/edge-sdk/android/conversation-generation.mdx b/deployment/on-device/android/conversation-generation.mdx similarity index 100% rename from leap/edge-sdk/android/conversation-generation.mdx rename to deployment/on-device/android/conversation-generation.mdx diff --git a/leap/edge-sdk/android/function-calling.mdx b/deployment/on-device/android/function-calling.mdx similarity index 100% rename from leap/edge-sdk/android/function-calling.mdx rename to deployment/on-device/android/function-calling.mdx diff --git a/leap/edge-sdk/android/messages-content.mdx b/deployment/on-device/android/messages-content.mdx similarity index 100% rename from leap/edge-sdk/android/messages-content.mdx rename to deployment/on-device/android/messages-content.mdx diff --git a/leap/edge-sdk/android/model-loading.mdx b/deployment/on-device/android/model-loading.mdx similarity index 100% rename from leap/edge-sdk/android/model-loading.mdx rename to deployment/on-device/android/model-loading.mdx diff --git a/leap/edge-sdk/android/utilities.mdx b/deployment/on-device/android/utilities.mdx similarity index 100% rename from leap/edge-sdk/android/utilities.mdx rename to deployment/on-device/android/utilities.mdx diff --git 
a/leap/edge-sdk/ios/advanced-features.mdx b/deployment/on-device/ios/advanced-features.mdx similarity index 100% rename from leap/edge-sdk/ios/advanced-features.mdx rename to deployment/on-device/ios/advanced-features.mdx diff --git a/leap/edge-sdk/ios/ai-agent-usage-guide.mdx b/deployment/on-device/ios/ai-agent-usage-guide.mdx similarity index 100% rename from leap/edge-sdk/ios/ai-agent-usage-guide.mdx rename to deployment/on-device/ios/ai-agent-usage-guide.mdx diff --git a/leap/edge-sdk/ios/cloud-ai-comparison.mdx b/deployment/on-device/ios/cloud-ai-comparison.mdx similarity index 100% rename from leap/edge-sdk/ios/cloud-ai-comparison.mdx rename to deployment/on-device/ios/cloud-ai-comparison.mdx diff --git a/leap/edge-sdk/ios/constrained-generation.mdx b/deployment/on-device/ios/constrained-generation.mdx similarity index 100% rename from leap/edge-sdk/ios/constrained-generation.mdx rename to deployment/on-device/ios/constrained-generation.mdx diff --git a/leap/edge-sdk/ios/conversation-generation.mdx b/deployment/on-device/ios/conversation-generation.mdx similarity index 100% rename from leap/edge-sdk/ios/conversation-generation.mdx rename to deployment/on-device/ios/conversation-generation.mdx diff --git a/leap/edge-sdk/ios/function-calling.mdx b/deployment/on-device/ios/function-calling.mdx similarity index 100% rename from leap/edge-sdk/ios/function-calling.mdx rename to deployment/on-device/ios/function-calling.mdx diff --git a/leap/edge-sdk/ios/ios-quick-start-guide.mdx b/deployment/on-device/ios/ios-quick-start-guide.mdx similarity index 97% rename from leap/edge-sdk/ios/ios-quick-start-guide.mdx rename to deployment/on-device/ios/ios-quick-start-guide.mdx index a580628..d7d9a28 100644 --- a/leap/edge-sdk/ios/ios-quick-start-guide.mdx +++ b/deployment/on-device/ios/ios-quick-start-guide.mdx @@ -323,10 +323,10 @@ conversation = current.modelRunner.createConversationFromHistory( ## Next steps[​](#next-steps "Direct link to Next steps") -* Learn how to 
expose structured JSON outputs with the [`@Generatable` macros](/leap/edge-sdk/ios/constrained-generation). -* Wire up tools and external APIs with [Function Calling](/leap/edge-sdk/ios/function-calling). -* Compare on-device and cloud behaviour in [Cloud AI Comparison](/leap/edge-sdk/ios/cloud-ai-comparison). +* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/deployment/on-device/ios/constrained-generation). +* Wire up tools and external APIs with [Function Calling](/deployment/on-device/ios/function-calling). +* Compare on-device and cloud behaviour in [Cloud AI Comparison](/deployment/on-device/ios/cloud-ai-comparison). You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use. -[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/ios/ios-quick-start-guide.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx) diff --git a/leap/edge-sdk/ios/messages-content.mdx b/deployment/on-device/ios/messages-content.mdx similarity index 100% rename from leap/edge-sdk/ios/messages-content.mdx rename to deployment/on-device/ios/messages-content.mdx diff --git a/leap/edge-sdk/ios/model-loading.mdx b/deployment/on-device/ios/model-loading.mdx similarity index 100% rename from leap/edge-sdk/ios/model-loading.mdx rename to deployment/on-device/ios/model-loading.mdx diff --git a/leap/edge-sdk/ios/utilities.mdx b/deployment/on-device/ios/utilities.mdx similarity index 100% rename from leap/edge-sdk/ios/utilities.mdx rename to deployment/on-device/ios/utilities.mdx diff --git a/docs/inference/llama-cpp.mdx b/deployment/on-device/llama-cpp.mdx similarity index 99% rename from docs/inference/llama-cpp.mdx rename to deployment/on-device/llama-cpp.mdx index 4c240eb..6612222 100644 --- a/docs/inference/llama-cpp.mdx +++ b/deployment/on-device/llama-cpp.mdx @@ -7,7 +7,7 @@ description: 
"llama.cpp is a C++ library for efficient LLM inference with minima Use llama.cpp for CPU-only environments, local development, or edge deployment and on-device inference. -For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vllm) instead. +For GPU-accelerated inference at scale, consider using [vLLM](/deployment/gpu-inference/vllm) instead.
@@ -100,7 +100,7 @@ For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vl ## Downloading GGUF Models -llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/docs/models/complete-library) for all available GGUF models. +llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/lfm/models/complete-library) for all available GGUF models. You can download LFM models in GGUF format from Hugging Face as follows: diff --git a/docs/inference/lm-studio.mdx b/deployment/on-device/lm-studio.mdx similarity index 98% rename from docs/inference/lm-studio.mdx rename to deployment/on-device/lm-studio.mdx index 19ea48a..e961050 100644 --- a/docs/inference/lm-studio.mdx +++ b/deployment/on-device/lm-studio.mdx @@ -18,7 +18,7 @@ Download and install LM Studio directly from [lmstudio.ai](https://lmstudio.ai/d 3. Select a model and quantization level (`Q4_K_M` recommended) 4. Click **Download** -See the [Models page](/docs/models/complete-library) for all available GGUF models. +See the [Models page](/lfm/models/complete-library) for all available GGUF models. ## Using the Chat Interface diff --git a/docs/inference/mlx.mdx b/deployment/on-device/mlx.mdx similarity index 95% rename from docs/inference/mlx.mdx rename to deployment/on-device/mlx.mdx index 199f001..35b0bfa 100644 --- a/docs/inference/mlx.mdx +++ b/deployment/on-device/mlx.mdx @@ -21,7 +21,7 @@ pip install mlx-lm The `mlx-lm` package provides a simple interface for text generation with MLX models. -See the [Models page](/docs/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2). 
+See the [Models page](/lfm/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2). ```python from mlx_lm import load, generate diff --git a/docs/inference/ollama.mdx b/deployment/on-device/ollama.mdx similarity index 98% rename from docs/inference/ollama.mdx rename to deployment/on-device/ollama.mdx index a149954..65939f7 100644 --- a/docs/inference/ollama.mdx +++ b/deployment/on-device/ollama.mdx @@ -68,7 +68,7 @@ You can run LFM2 models directly from Hugging Face: ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF ``` -See the [Models page](/docs/models/complete-library) for all available GGUF repositories. +See the [Models page](/lfm/models/complete-library) for all available GGUF repositories. To use a local GGUF file, first download a model from Hugging Face: diff --git a/docs/inference/onnx.mdx b/deployment/on-device/onnx.mdx similarity index 98% rename from docs/inference/onnx.mdx rename to deployment/on-device/onnx.mdx index 7d363a9..9cf5cf7 100644 --- a/docs/inference/onnx.mdx +++ b/deployment/on-device/onnx.mdx @@ -72,7 +72,7 @@ For complete documentation and advanced options, see the [LiquidONNX GitHub repo ## Pre-exported Models -Many LFM models are available as pre-exported ONNX packages from [LiquidAI](https://huggingface.co/LiquidAI/models?search=onnx) and the [onnx-community](https://huggingface.co/onnx-community). Check the [Model Library](/docs/models/complete-library) for a complete list of available formats. +Many LFM models are available as pre-exported ONNX packages from [LiquidAI](https://huggingface.co/LiquidAI/models?search=onnx) and the [onnx-community](https://huggingface.co/onnx-community). Check the [Model Library](/lfm/models/complete-library) for a complete list of available formats. 
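Quantized artifacts in the model library conventionally encode the quantization level in repo or file names (for example `LFM2.5-1.2B-Instruct-GGUF` repos containing `Q4_K_M` files, the level the LM Studio guide recommends). A hypothetical helper that extracts that trailing tag; the naming pattern is an assumption based on typical Hugging Face community repos, not a formal spec:

```python
# Hypothetical helper: extract the quantization tag from an artifact name
# such as "LFM2.5-1.2B-Instruct-Q4_K_M.gguf". The trailing-segment naming
# pattern is an assumption from typical Hugging Face repos, not a spec.

def quant_tag(filename: str) -> str:
    stem = filename.rsplit(".", 1)[0]    # drop the file extension
    candidate = stem.rpartition("-")[2]  # last dash-separated segment
    # Quantization tags conventionally start with a width marker like Q4, Q8, F16.
    if candidate[:1] in {"Q", "F", "B"} and any(c.isdigit() for c in candidate):
        return candidate
    raise ValueError(f"no quantization tag found in {filename!r}")

print(quant_tag("LFM2.5-1.2B-Instruct-Q4_K_M.gguf"))  # prints Q4_K_M
```

A parser like this is handy when scripting downloads of a whole repo and keeping only one quantization level per model.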
### Quantization Options diff --git a/leap/leap-bundle/authentication.mdx b/deployment/tools/model-bundling/authentication.mdx similarity index 100% rename from leap/leap-bundle/authentication.mdx rename to deployment/tools/model-bundling/authentication.mdx diff --git a/leap/leap-bundle/bundle-creation.mdx b/deployment/tools/model-bundling/bundle-creation.mdx similarity index 100% rename from leap/leap-bundle/bundle-creation.mdx rename to deployment/tools/model-bundling/bundle-creation.mdx diff --git a/leap/leap-bundle/bundle-management.mdx b/deployment/tools/model-bundling/bundle-management.mdx similarity index 100% rename from leap/leap-bundle/bundle-management.mdx rename to deployment/tools/model-bundling/bundle-management.mdx diff --git a/leap/leap-bundle/changelog.mdx b/deployment/tools/model-bundling/changelog.mdx similarity index 100% rename from leap/leap-bundle/changelog.mdx rename to deployment/tools/model-bundling/changelog.mdx diff --git a/leap/leap-bundle/configuration.mdx b/deployment/tools/model-bundling/configuration.mdx similarity index 100% rename from leap/leap-bundle/configuration.mdx rename to deployment/tools/model-bundling/configuration.mdx diff --git a/leap/leap-bundle/data-privacy.mdx b/deployment/tools/model-bundling/data-privacy.mdx similarity index 100% rename from leap/leap-bundle/data-privacy.mdx rename to deployment/tools/model-bundling/data-privacy.mdx diff --git a/leap/leap-bundle/download.mdx b/deployment/tools/model-bundling/download.mdx similarity index 100% rename from leap/leap-bundle/download.mdx rename to deployment/tools/model-bundling/download.mdx diff --git a/leap/leap-bundle/quick-start.mdx b/deployment/tools/model-bundling/quick-start.mdx similarity index 96% rename from leap/leap-bundle/quick-start.mdx rename to deployment/tools/model-bundling/quick-start.mdx index f9a71a4..4710156 100644 --- a/leap/leap-bundle/quick-start.mdx +++ b/deployment/tools/model-bundling/quick-start.mdx @@ -69,7 +69,7 @@ If model uploads fail 
with connectivity errors, verify that your network allows 3. Select the [`API keys` tab](https://leap.liquid.ai/profile#/api-keys) and create a new API key - ![api-key-screenshot](/images/leap/leap-bundle/assets/images/api-keys-51242efd637d71dd5e7f4eb01555cd78.png) + ![api-key-screenshot](/images/deployment/tools/model-bundling/assets/images/api-keys-51242efd637d71dd5e7f4eb01555cd78.png) 4. Authenticate the Model Bundling Service with your API token: @@ -228,4 +228,4 @@ If model uploads fail with connectivity errors, verify that your network allows ## Next Steps * Visit the [LEAP Model Library](https://leap.liquid.ai/models) to explore available models. -* Check the [Bundle Creation](/leap/leap-bundle/bundle-creation) page for detailed command reference. +* Check the [Bundle Creation](/deployment/tools/model-bundling/bundle-creation) page for detailed command reference. diff --git a/leap/leap-bundle/reference.mdx b/deployment/tools/model-bundling/reference.mdx similarity index 100% rename from leap/leap-bundle/reference.mdx rename to deployment/tools/model-bundling/reference.mdx diff --git a/docs.json b/docs.json index 67a08f3..71c6ad4 100644 --- a/docs.json +++ b/docs.json @@ -1,7 +1,7 @@ { "$schema": "https://mintlify.com/docs.json", "banner": { - "content": "🚀 New: LFM2.5-1.2B-Instruct and LFM2.5-1.2B-Thinking are now available! [Learn more →](/docs/models/text-models)", + "content": "🚀 New: LFM2.5-1.2B-Instruct and LFM2.5-1.2B-Thinking are now available! 
[Learn more →](/lfm/models/text-models)", "dismissible": true }, "theme": "mint", @@ -36,7 +36,7 @@ "logo": { "light": "/logo/light.svg", "dark": "/logo/dark.svg", - "href": "/docs/getting-started/welcome" + "href": "/lfm/getting-started/welcome" }, "navbar": { "links": [ @@ -54,136 +54,161 @@ "navigation": { "tabs": [ { - "tab": "Documentation", + "tab": "LFM", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ - "docs/getting-started/welcome", - "docs/getting-started/connect-ai-tools" + "lfm/getting-started/welcome", + "lfm/getting-started/connect-ai-tools" ] }, { "group": "Models", "icon": "brain", "pages": [ - "docs/models/complete-library", - "docs/models/text-models", - "docs/models/vision-models", - "docs/models/audio-models", - "docs/models/liquid-nanos" + "lfm/models/complete-library", + "lfm/models/text-models", + "lfm/models/vision-models", + "lfm/models/audio-models", + "lfm/models/liquid-nanos" ] }, { "group": "Key Concepts", "icon": "lightbulb", "pages": [ - "docs/key-concepts/chat-template", - "docs/key-concepts/text-generation-and-prompting", - "docs/key-concepts/tool-use" + "lfm/key-concepts/chat-template", + "lfm/key-concepts/text-generation-and-prompting", + "lfm/key-concepts/tool-use" ] }, { - "group": "Inference", - "icon": "play", + "group": "Help", + "icon": "book", "pages": [ - "docs/inference/transformers", - "docs/inference/llama-cpp", - "docs/inference/vllm", - "docs/inference/sglang", - "docs/inference/mlx", - "docs/inference/ollama", - "docs/inference/onnx", - { - "group": "Other Frameworks", - "icon": "server", - "pages": [ - "docs/inference/lm-studio", - "docs/inference/modal-deployment", - "docs/inference/baseten-deployment", - "docs/inference/fal-deployment" - ] - } + "lfm/help/faqs", + "lfm/help/troubleshooting", + "lfm/help/contributing" + ] + } + ] + }, + { + "tab": "Customization", + "groups": [ + { + "group": "Getting Started", + "icon": "rocket", + "pages": [ 
"customization/getting-started/welcome", + "customization/getting-started/connect-ai-tools" ] }, { - "group": "Fine-tuning", - "icon": "sliders", + "group": "Tools", + "icon": "wrench", "pages": [ - "docs/fine-tuning/workbench", - "docs/fine-tuning/datasets", - "docs/fine-tuning/trl", - "docs/fine-tuning/unsloth" + "customization/tools/workbench" ] }, { - "group": "Help", - "icon": "book", + "group": "Finetuning Frameworks", + "icon": "sliders", "pages": [ - "docs/help/faqs", - "docs/help/troubleshooting", - "docs/help/contributing" + "customization/finetuning-frameworks/datasets", + "customization/finetuning-frameworks/trl", + "customization/finetuning-frameworks/unsloth" ] } ] }, { - "tab": "SDK Reference", + "tab": "Deployment", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ - "leap/edge-sdk/overview", - "docs/getting-started/connect-ai-tools" + "deployment/getting-started/welcome", + "deployment/getting-started/connect-ai-tools" ] }, { - "group": "iOS", - "icon": "apple", + "group": "On-Device", + "icon": "mobile", "pages": [ - "leap/edge-sdk/ios/ios-quick-start-guide", - "leap/edge-sdk/ios/ai-agent-usage-guide", - "leap/edge-sdk/ios/model-loading", - "leap/edge-sdk/ios/conversation-generation", - "leap/edge-sdk/ios/messages-content", - "leap/edge-sdk/ios/advanced-features", - "leap/edge-sdk/ios/utilities", - "leap/edge-sdk/ios/cloud-ai-comparison", - "leap/edge-sdk/ios/constrained-generation", - "leap/edge-sdk/ios/function-calling" + { + "group": "iOS SDK", + "icon": "apple", + "pages": [ + "deployment/on-device/ios/ios-quick-start-guide", + "deployment/on-device/ios/ai-agent-usage-guide", + "deployment/on-device/ios/model-loading", + "deployment/on-device/ios/conversation-generation", + "deployment/on-device/ios/messages-content", + "deployment/on-device/ios/advanced-features", + "deployment/on-device/ios/utilities", + "deployment/on-device/ios/cloud-ai-comparison", + 
"deployment/on-device/ios/constrained-generation", + "deployment/on-device/ios/function-calling" + ] + }, + { + "group": "Android SDK", + "icon": "robot", + "pages": [ + "deployment/on-device/android/android-quick-start-guide", + "deployment/on-device/android/ai-agent-usage-guide", + "deployment/on-device/android/model-loading", + "deployment/on-device/android/conversation-generation", + "deployment/on-device/android/messages-content", + "deployment/on-device/android/advanced-features", + "deployment/on-device/android/utilities", + "deployment/on-device/android/cloud-ai-comparison", + "deployment/on-device/android/constrained-generation", + "deployment/on-device/android/function-calling" + ] + }, + "deployment/on-device/llama-cpp", + "deployment/on-device/lm-studio", + "deployment/on-device/mlx", + "deployment/on-device/onnx", + "deployment/on-device/ollama" ] }, { - "group": "Android", - "icon": "robot", + "group": "GPU Inference", + "icon": "microchip", "pages": [ - "leap/edge-sdk/android/android-quick-start-guide", - "leap/edge-sdk/android/ai-agent-usage-guide", - "leap/edge-sdk/android/model-loading", - "leap/edge-sdk/android/conversation-generation", - "leap/edge-sdk/android/messages-content", - "leap/edge-sdk/android/advanced-features", - "leap/edge-sdk/android/utilities", - "leap/edge-sdk/android/cloud-ai-comparison", - "leap/edge-sdk/android/constrained-generation", - "leap/edge-sdk/android/function-calling" + "deployment/gpu-inference/transformers", + "deployment/gpu-inference/vllm", + "deployment/gpu-inference/sglang", + "deployment/gpu-inference/modal", + "deployment/gpu-inference/baseten", + "deployment/gpu-inference/fal" ] }, { - "group": "Model Bundling Service", - "icon": "box", + "group": "Tools", + "icon": "toolbox", "pages": [ - "leap/leap-bundle/quick-start", - "leap/leap-bundle/authentication", - "leap/leap-bundle/configuration", - "leap/leap-bundle/bundle-creation", - "leap/leap-bundle/bundle-management", - "leap/leap-bundle/download", - 
"leap/leap-bundle/reference", - "leap/leap-bundle/data-privacy", - "leap/leap-bundle/changelog" + { + "group": "Model Bundling Services", + "icon": "box", + "pages": [ + "deployment/tools/model-bundling/quick-start", + "deployment/tools/model-bundling/authentication", + "deployment/tools/model-bundling/configuration", + "deployment/tools/model-bundling/bundle-creation", + "deployment/tools/model-bundling/bundle-management", + "deployment/tools/model-bundling/download", + "deployment/tools/model-bundling/reference", + "deployment/tools/model-bundling/data-privacy", + "deployment/tools/model-bundling/changelog" + ] + } ] } ] @@ -192,11 +217,11 @@ "tab": "Examples", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ "examples/index", - "docs/getting-started/connect-ai-tools" + "examples/connect-ai-tools" ] }, { @@ -244,8 +269,100 @@ }, "redirects": [ { - "source": "/lfm/:slug*", - "destination": "/docs/:slug*" + "source": "/docs/getting-started/welcome", + "destination": "/lfm/getting-started/welcome" + }, + { + "source": "/docs/getting-started/connect-ai-tools", + "destination": "/lfm/getting-started/connect-ai-tools" + }, + { + "source": "/docs/models/:slug*", + "destination": "/lfm/models/:slug*" + }, + { + "source": "/docs/key-concepts/:slug*", + "destination": "/lfm/key-concepts/:slug*" + }, + { + "source": "/docs/help/:slug*", + "destination": "/lfm/help/:slug*" + }, + { + "source": "/docs/fine-tuning/workbench", + "destination": "/customization/tools/workbench" + }, + { + "source": "/docs/fine-tuning/datasets", + "destination": "/customization/finetuning-frameworks/datasets" + }, + { + "source": "/docs/fine-tuning/trl", + "destination": "/customization/finetuning-frameworks/trl" + }, + { + "source": "/docs/fine-tuning/unsloth", + "destination": "/customization/finetuning-frameworks/unsloth" + }, + { + "source": "/docs/inference/llama-cpp", + "destination": "/deployment/on-device/llama-cpp" + }, + { + "source": 
"/docs/inference/mlx", + "destination": "/deployment/on-device/mlx" + }, + { + "source": "/docs/inference/onnx", + "destination": "/deployment/on-device/onnx" + }, + { + "source": "/docs/inference/ollama", + "destination": "/deployment/on-device/ollama" + }, + { + "source": "/docs/inference/lm-studio", + "destination": "/deployment/on-device/lm-studio" + }, + { + "source": "/docs/inference/transformers", + "destination": "/deployment/gpu-inference/transformers" + }, + { + "source": "/docs/inference/vllm", + "destination": "/deployment/gpu-inference/vllm" + }, + { + "source": "/docs/inference/sglang", + "destination": "/deployment/gpu-inference/sglang" + }, + { + "source": "/docs/inference/modal-deployment", + "destination": "/deployment/gpu-inference/modal" + }, + { + "source": "/docs/inference/baseten-deployment", + "destination": "/deployment/gpu-inference/baseten" + }, + { + "source": "/docs/inference/fal-deployment", + "destination": "/deployment/gpu-inference/fal" + }, + { + "source": "/leap/edge-sdk/overview", + "destination": "/deployment/on-device/ios/ios-quick-start-guide" + }, + { + "source": "/leap/edge-sdk/ios/:slug*", + "destination": "/deployment/on-device/ios/:slug*" + }, + { + "source": "/leap/edge-sdk/android/:slug*", + "destination": "/deployment/on-device/android/:slug*" + }, + { + "source": "/leap/leap-bundle/:slug*", + "destination": "/deployment/tools/model-bundling/:slug*" } ], "ai": { diff --git a/docs/models/complete-library.mdx b/docs/models/complete-library.mdx deleted file mode 100644 index a78bd7e..0000000 --- a/docs/models/complete-library.mdx +++ /dev/null @@ -1,94 +0,0 @@ ---- -title: "Model Library" -description: "Liquid Foundation Models (LFMs) are a new class of multimodal architectures built for fast inference and on-device deployment. Browse all available models and formats here." ---- - -
- -All of our models share the following capabilities: - -- 32k token context length for extended conversations and document processing -- Designed for fast inference with [Transformers](/docs/inference/transformers), [llama.cpp](/docs/inference/llama-cpp), [vLLM](/docs/inference/vllm), [MLX](/docs/inference/mlx), [Ollama](/docs/inference/ollama), and [LEAP](/docs/frameworks/leap) -- Trainable via SFT, DPO, and GRPO with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth) - -
- -## Model Families - -Choose a model based on your desired functionalities. Each individual model card has specific details on deployment and customization. - - - - - Chat, tool calling, structured output, and classification. - - - - Image understanding with LFM backbones and custom encoders. - - - - Interleaved audio/text models for TTS, ASR, and voice chat. - - - - Task-specific models for extraction, summarization, RAG, and translation. - - - - -## Model Formats - -All LFM2 models are available in multiple formats for flexible deployment: - -- **GGUF** β€” Best for local CPU/GPU inference on any platform. Use with [llama.cpp](/docs/inference/llama-cpp), [LM Studio](/docs/inference/lm-studio), or [Ollama](/docs/inference/ollama). Append `-GGUF` to any model name. -- **MLX** β€” Best for Mac users with Apple Silicon. Leverages unified memory for fast inference via [MLX](/docs/inference/mlx). Browse at [mlx-community](https://huggingface.co/mlx-community/collections?search=LFM). -- **ONNX** β€” Best for production deployments and edge devices. Cross-platform with ONNX Runtime across CPUs, GPUs, and accelerators. Append `-ONNX` to any model name. - -### Quantization - -Quantization reduces model size and speeds up inference with minimal quality loss. Available options by format: - -- **GGUF** β€” Supports `Q4_0`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`, `BF16`, and `F16`. `Q4_K_M` offers the best balance of size and quality. -- **MLX** β€” Available in `3bit`, `4bit`, `5bit`, `6bit`, `8bit`, and `BF16`. `8bit` is recommended. -- **ONNX** β€” Supports `FP32`, `FP16`, `Q4`, and `Q8` (MoE models also support `Q4F16`). `Q4` is recommended for most deployments. - -## Model Chart - -| Model | HF | GGUF | MLX | ONNX | Trainable? 
| -| ----- | -- | ---- | --- | ---- | ---------- | -| **Text-to-text Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-Thinking](/docs/models/lfm25-1.2b-thinking) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-Base](/docs/models/lfm25-1.2b-base) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-GGUF) | βœ— | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-JP](/docs/models/lfm25-1.2b-jp) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-8B-A1B](/docs/models/lfm2-8b-a1b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-8B-A1B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-8B-A1B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-8B-A1B-ONNX) | Yes (TRL) | -| [LFM2-2.6B](/docs/models/lfm2-2.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-2.6B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-2.6B-ONNX) | Yes (TRL) | -| 
[LFM2-2.6B-Exp](/docs/models/lfm2-2.6b-exp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp-GGUF) | βœ— | βœ— | Yes (TRL) | -| [LFM2-1.2B](/docs/models/lfm2-1.2b) Deprecated | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-1.2B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-ONNX) | Yes (TRL) | -| [LFM2-700M](/docs/models/lfm2-700m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-700M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-700M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-700M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-700M-ONNX) | Yes (TRL) | -| [LFM2-350M](/docs/models/lfm2-350m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-350M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-ONNX) | Yes (TRL) | -| **Vision Language Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2.5-VL-1.6B-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-VL-3B](/docs/models/lfm2-vl-3b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-3B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-3B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-3B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-VL-3B-ONNX) | Yes (TRL) | -| [LFM2-VL-1.6B](/docs/models/lfm2-vl-1.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-1.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-1.6B-8bit) | 
[βœ“](https://huggingface.co/onnx-community/LFM2-VL-1.6B-ONNX) | Yes (TRL) | -| [LFM2-VL-450M](/docs/models/lfm2-vl-450m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-450M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-450M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-450M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-VL-450M-ONNX) | Yes (TRL) | -| **Audio Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-Audio-1.5B](/docs/models/lfm25-audio-1.5b) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF) | βœ— | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-Audio-1.5B](/docs/models/lfm2-audio-1.5b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B-GGUF) | βœ— | βœ— | No | -| **Liquid Nanos** | | | | | | -| [LFM2-1.2B-Extract](/docs/models/lfm2-1.2b-extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-Extract-ONNX) | Yes (TRL) | -| [LFM2-350M-Extract](/docs/models/lfm2-350m-extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-Extract-ONNX) | Yes (TRL) | -| [LFM2-350M-ENJP-MT](/docs/models/lfm2-350m-enjp-mt) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-350M-ENJP-MT-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-ENJP-MT-ONNX) | Yes (TRL) | -| [LFM2-1.2B-RAG](/docs/models/lfm2-1.2b-rag) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG-GGUF) | βœ— 
| [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-RAG-ONNX) | Yes (TRL) | -| [LFM2-1.2B-Tool](/docs/models/lfm2-1.2b-tool) Deprecated | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-Tool-ONNX) | Yes (TRL) | -| [LFM2-350M-Math](/docs/models/lfm2-350m-math) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Math) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Math-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-Math-ONNX) | Yes (TRL) | -| [LFM2-350M-PII-Extract-JP](/docs/models/lfm2-350m-pii-extract-jp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF) | βœ— | βœ— | Yes (TRL) | -| [LFM2-ColBERT-350M](/docs/models/lfm2-colbert-350m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) | βœ— | βœ— | βœ— | Yes (PyLate) | -| [LFM2-2.6B-Transcript](/docs/models/lfm2-2.6b-transcript) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-2.6B-Transcript-ONNX) | Yes (TRL) | diff --git a/examples/connect-ai-tools.mdx b/examples/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/examples/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/examples/laptop-examples/audio-to-text-in-real-time.mdx b/examples/laptop-examples/audio-to-text-in-real-time.mdx index 40d5ece..3fc5000 100644 --- a/examples/laptop-examples/audio-to-text-in-real-time.mdx +++ b/examples/laptop-examples/audio-to-text-in-real-time.mdx @@ -6,7 +6,7 @@ title: "Audio transcription in 
real-time" Browse the complete example on GitHub -This example demonstrates how to use the [LFM2-Audio-1.5B](https://docs.liquid.ai/docs/models/lfm2-audio-1.5b) model with llama.cpp to transcribe audio files locally in real-time. +This example demonstrates how to use the [LFM2-Audio-1.5B](https://docs.liquid.ai/lfm/models/lfm2-audio-1.5b) model with llama.cpp to transcribe audio files locally in real-time. Intelligent audio assistants on the edge are possible, and this repository is just one step towards that. @@ -120,7 +120,7 @@ For example, we can use ### What is LFM2-350M? -[LFM2-350M](https://docs.liquid.ai/docs/models/lfm2-350m) is a small text-to-text model that can be used for tasks like text cleaning. To achieve optimal performance for your particular use case, you need to optimize your system and user prompts. +[LFM2-350M](https://docs.liquid.ai/lfm/models/lfm2-350m) is a small text-to-text model that can be used for tasks like text cleaning. To achieve optimal performance for your particular use case, you need to optimize your system and user prompts. Use our no-code tool to optimize your system and user prompts, and get your model ready for deployment. diff --git a/examples/laptop-examples/flight-search-assistant.mdx b/examples/laptop-examples/flight-search-assistant.mdx index a7d6ba4..bef9128 100644 --- a/examples/laptop-examples/flight-search-assistant.mdx +++ b/examples/laptop-examples/flight-search-assistant.mdx @@ -6,7 +6,7 @@ title: "Flight search assistant with tool calling" Browse the complete example on GitHub -This project demonstrates a Python CLI that leverages the [LFM2.5-1.2B-Thinking](/docs/models/lfm25-1.2b-thinking) model to help users find and book plane tickets through multi-step reasoning and tool calling. +This project demonstrates a Python CLI that leverages the [LFM2.5-1.2B-Thinking](/lfm/models/lfm25-1.2b-thinking) model to help users find and book plane tickets through multi-step reasoning and tool calling. 
![Flight Search Assistant Demo](https://raw.githubusercontent.com/Liquid4All/cookbook/main/examples/flight-search-assistant/media/demo.gif) diff --git a/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx b/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx index 1a3efc1..b8c345c 100644 --- a/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx +++ b/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx @@ -22,7 +22,7 @@ In this example, you will learn how to: * **Set up local AI inference** using llama.cpp to run Liquid models entirely on your machine without requiring cloud services or API keys * **Build a file monitoring system** that automatically processes new files dropped into a directory -* **Extract structured output from images** using [LFM2.5-VL-1.6B](https://docs.liquid.ai/docs/models/lfm25-vl-1.6b), a small vision-language model. +* **Extract structured output from images** using [LFM2.5-VL-1.6B](https://docs.liquid.ai/lfm/models/lfm25-vl-1.6b), a small vision-language model. ## Environment setup diff --git a/examples/web/vl-webgpu-demo.mdx b/examples/web/vl-webgpu-demo.mdx index a80dfed..a9c2671 100644 --- a/examples/web/vl-webgpu-demo.mdx +++ b/examples/web/vl-webgpu-demo.mdx @@ -6,7 +6,7 @@ title: "Real-time video captioning with LFM2.5-VL-1.6B and WebGPU" Browse the complete example on GitHub -This example demonstrates how to run a vision-language model directly in your web browser using WebGPU acceleration. The demo showcases real-time video captioning with the [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) model, eliminating the need for cloud-based inference services. +This example demonstrates how to run a vision-language model directly in your web browser using WebGPU acceleration. The demo showcases real-time video captioning with the [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) model, eliminating the need for cloud-based inference services. 
## Key Features @@ -43,7 +43,7 @@ This example demonstrates how to run a vision-language model directly in your we ## Understanding the Architecture -This demo uses the **[LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b)** model, a 1.6 billion parameter vision-language model that has been quantized for efficient browser-based inference. The model runs entirely client-side using ONNX Runtime Web with WebGPU acceleration. +This demo uses the **[LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b)** model, a 1.6 billion parameter vision-language model that has been quantized for efficient browser-based inference. The model runs entirely client-side using ONNX Runtime Web with WebGPU acceleration. ### Remote vs. Local Inference @@ -57,7 +57,7 @@ With WebGPU and local inference, everything runs directly in your browser: ### Technical Stack -- **Model**: [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) (quantized ONNX format) +- **Model**: [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) (quantized ONNX format) - **Inference Engine**: ONNX Runtime Web with WebGPU backend - **Build Tool**: Vite for fast development and optimized production builds - **Browser Requirements**: WebGPU-compatible browser (Chrome, Edge) diff --git a/leap/edge-sdk/overview.mdx b/leap/edge-sdk/overview.mdx index 9e4d1ca..fa91a45 100644 --- a/leap/edge-sdk/overview.mdx +++ b/leap/edge-sdk/overview.mdx @@ -36,4 +36,4 @@ The current list of main features includes: We are consistently adding to this list - see our [changelog](/leap/changelog) for detailed updates. 
-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/overview.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx) diff --git a/lfm/getting-started/connect-ai-tools.mdx b/lfm/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/lfm/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/docs/getting-started/welcome.mdx b/lfm/getting-started/welcome.mdx similarity index 84% rename from docs/getting-started/welcome.mdx rename to lfm/getting-started/welcome.mdx index 09d08ac..2e40ad5 100644 --- a/docs/getting-started/welcome.mdx +++ b/lfm/getting-started/welcome.mdx @@ -42,16 +42,19 @@ Built on a new hybrid architecture, LFM2 sets a new standard in quality, speed, ## Get Started - - + + Browse our collection of language models and their capabilities - + Learn how to run models for different use cases and platforms - + Customize models for your specific requirements + + End-to-end examples for mobile, laptop, and web + diff --git a/docs/help/contributing.mdx b/lfm/help/contributing.mdx similarity index 92% rename from docs/help/contributing.mdx rename to lfm/help/contributing.mdx index 0f84180..d545dc5 100644 --- a/docs/help/contributing.mdx +++ b/lfm/help/contributing.mdx @@ -102,8 +102,8 @@ Use Mintlify components appropriately: ### Links -- Use relative links for internal pages: `/docs/inference/transformers` -- Use descriptive link text: "See the [inference guide](/docs/inference/transformers)" not "Click [here](/docs/inference/transformers)" +- Use relative links for internal pages: `/deployment/gpu-inference/transformers` +- Use descriptive link text: "See the [inference 
guide](/deployment/gpu-inference/transformers)" not "Click [here](/deployment/gpu-inference/transformers)" ## What to Contribute diff --git a/docs/help/faqs.mdx b/lfm/help/faqs.mdx similarity index 72% rename from docs/help/faqs.mdx rename to lfm/help/faqs.mdx index fa81927..d455f22 100644 --- a/docs/help/faqs.mdx +++ b/lfm/help/faqs.mdx @@ -15,12 +15,12 @@ All LFM models support a 32k token text context length for extended conversation LFM models are compatible with: -- [Transformers](/docs/inference/transformers) - For research and development -- [llama.cpp](/docs/inference/llama-cpp) - For efficient CPU inference -- [vLLM](/docs/inference/vllm) - For high-throughput production serving -- [MLX](/docs/inference/mlx) - For Apple Silicon optimization -- [Ollama](/docs/inference/ollama) - For easy local deployment -- [LEAP](/leap/edge-sdk/overview) - For edge and mobile deployment +- [Transformers](/deployment/gpu-inference/transformers) - For research and development +- [llama.cpp](/deployment/on-device/llama-cpp) - For efficient CPU inference +- [vLLM](/deployment/gpu-inference/vllm) - For high-throughput production serving +- [MLX](/deployment/on-device/mlx) - For Apple Silicon optimization +- [Ollama](/deployment/on-device/ollama) - For easy local deployment +- [LEAP](/deployment/on-device/ios/ios-quick-start-guide) - For edge and mobile deployment ## Model Selection @@ -39,7 +39,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe -[Liquid Nanos](/docs/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like: +[Liquid Nanos](/lfm/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like: - Information extraction (LFM2-Extract) - Translation (LFM2-350M-ENJP-MT) - RAG question answering (LFM2-1.2B-RAG) @@ -49,7 +49,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe ## Deployment -Yes! 
Use the [LEAP SDK](/leap/edge-sdk/overview) to deploy models on iOS and Android devices. LEAP provides optimized inference for edge deployment with support for quantized models. +Yes! Use the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide) to deploy models on iOS and Android devices. LEAP provides optimized inference for edge deployment with support for quantized models. @@ -69,7 +69,7 @@ For most use cases, Q4_K_M or Q5_K_M provide good quality with significant size ## Fine-tuning -Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth). Check the [Model Library](/docs/models/complete-library) for trainability information. +Yes! Most LFM models support fine-tuning with [TRL](/customization/finetuning-frameworks/trl) and [Unsloth](/customization/finetuning-frameworks/unsloth). Check the [Model Library](/lfm/models/complete-library) for trainability information. @@ -82,4 +82,4 @@ Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [ - Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help - Check the [Cookbook](https://github.com/Liquid4All/cookbook) for examples -- See [Troubleshooting](/docs/help/troubleshooting) for common issues +- See [Troubleshooting](/lfm/help/troubleshooting) for common issues diff --git a/docs/help/troubleshooting.mdx b/lfm/help/troubleshooting.mdx similarity index 100% rename from docs/help/troubleshooting.mdx rename to lfm/help/troubleshooting.mdx diff --git a/docs/key-concepts/chat-template.mdx b/lfm/key-concepts/chat-template.mdx similarity index 98% rename from docs/key-concepts/chat-template.mdx rename to lfm/key-concepts/chat-template.mdx index db15e2f..2f8ea9c 100644 --- a/docs/key-concepts/chat-template.mdx +++ b/lfm/key-concepts/chat-template.mdx @@ -23,7 +23,7 @@ LFM2 supports four conversation roles: * **`system`** β€” (Optional) Defines who the assistant is and how it should respond. 
* **`user`** β€” Messages from the user containing questions and instructions. * **`assistant`** β€” Responses from the model. -* **`tool`** β€” Results from tool/function execution. Used for [tool use](/docs/key-concepts/tool-use) workflows. +* **`tool`** β€” Results from tool/function execution. Used for [tool use](/lfm/key-concepts/tool-use) workflows. The complete chat template definition can be found in the `chat_template.jinja` file in each model's Hugging Face repository. diff --git a/docs/key-concepts/text-generation-and-prompting.mdx b/lfm/key-concepts/text-generation-and-prompting.mdx similarity index 95% rename from docs/key-concepts/text-generation-and-prompting.mdx rename to lfm/key-concepts/text-generation-and-prompting.mdx index 35b5852..7fc71f8 100644 --- a/docs/key-concepts/text-generation-and-prompting.mdx +++ b/lfm/key-concepts/text-generation-and-prompting.mdx @@ -73,7 +73,7 @@ Control text generation behavior, balancing creativity, determinism, and quality * **`repetition_penalty`** (1.0+) - Reduces repetition. 1.0 = no penalty; >1.0 = prevents repetition. * **`max_tokens`** / **`max_new_tokens`** - Maximum tokens to generate. -Parameter names and syntax vary by platform. See [Transformers](/docs/inference/transformers), [vLLM](/docs/inference/vllm), or [llama.cpp](/docs/inference/llama-cpp) for details. +Parameter names and syntax vary by platform. See [Transformers](/deployment/gpu-inference/transformers), [vLLM](/deployment/gpu-inference/vllm), or [llama.cpp](/deployment/on-device/llama-cpp) for details. ### Recommended Settings Text @@ -132,5 +132,5 @@ min_image_tokens = 32 * `do_image_splitting=True` -**Liquid Nanos** (task-specific models like LFM2-Extract, LFM2-RAG, LFM2-Tool, etc.) may have special prompting requirements and different generation parameters. For the best usage guidelines, refer to the individual model cards on the [Liquid Nanos](/docs/models/liquid-nanos) page. 
+**Liquid Nanos** (task-specific models like LFM2-Extract, LFM2-RAG, LFM2-Tool, etc.) may have special prompting requirements and different generation parameters. For the best usage guidelines, refer to the individual model cards on the [Liquid Nanos](/lfm/models/liquid-nanos) page. diff --git a/docs/key-concepts/tool-use.mdx b/lfm/key-concepts/tool-use.mdx similarity index 100% rename from docs/key-concepts/tool-use.mdx rename to lfm/key-concepts/tool-use.mdx diff --git a/docs/models/audio-models.mdx b/lfm/models/audio-models.mdx similarity index 94% rename from docs/models/audio-models.mdx rename to lfm/models/audio-models.mdx index a81c50e..1434969 100644 --- a/docs/models/audio-models.mdx +++ b/lfm/models/audio-models.mdx @@ -32,7 +32,7 @@ icon: "headphones" - + 1.5B Β· Recommended Best audio model for most use cases. Fast, accurate, and CPU-friendly. @@ -44,7 +44,7 @@ icon: "headphones" - + 1.5B Β· Deprecated Use the new LFM2.5-Audio-1.5B checkpoint instead. diff --git a/lfm/models/complete-library.mdx b/lfm/models/complete-library.mdx new file mode 100644 index 0000000..1d7a6f8 --- /dev/null +++ b/lfm/models/complete-library.mdx @@ -0,0 +1,94 @@ +--- +title: "Model Library" +description: "Liquid Foundation Models (LFMs) are a new class of multimodal architectures built for fast inference and on-device deployment. Browse all available models and formats here." +--- + +
+ +All of our models share the following capabilities: + +- 32k token context length for extended conversations and document processing +- Designed for fast inference with [Transformers](/deployment/gpu-inference/transformers), [llama.cpp](/deployment/on-device/llama-cpp), [vLLM](/deployment/gpu-inference/vllm), [MLX](/deployment/on-device/mlx), [Ollama](/deployment/on-device/ollama), and [LEAP](/deployment/on-device/ios/ios-quick-start-guide) +- Trainable via SFT, DPO, and GRPO with [TRL](/customization/finetuning-frameworks/trl) and [Unsloth](/customization/finetuning-frameworks/unsloth) + +
+ +## Model Families + +Choose a model based on your desired functionalities. Each individual model card has specific details on deployment and customization. + + + + + Chat, tool calling, structured output, and classification. + + + + Image understanding with LFM backbones and custom encoders. + + + + Interleaved audio/text models for TTS, ASR, and voice chat. + + + + Task-specific models for extraction, summarization, RAG, and translation. + + + + +## Model Formats + +All LFM2 models are available in multiple formats for flexible deployment: + +- **GGUF** — Best for local CPU/GPU inference on any platform. Use with [llama.cpp](/deployment/on-device/llama-cpp), [LM Studio](/deployment/on-device/lm-studio), or [Ollama](/deployment/on-device/ollama). Append `-GGUF` to any model name. +- **MLX** — Best for Mac users with Apple Silicon. Leverages unified memory for fast inference via [MLX](/deployment/on-device/mlx). Browse at [mlx-community](https://huggingface.co/mlx-community/collections?search=LFM). +- **ONNX** — Best for production deployments and edge devices. Cross-platform with ONNX Runtime across CPUs, GPUs, and accelerators. Append `-ONNX` to any model name. + +### Quantization + +Quantization reduces model size and speeds up inference with minimal quality loss. Available options by format: + +- **GGUF** — Supports `Q4_0`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`, `BF16`, and `F16`. `Q4_K_M` offers the best balance of size and quality. +- **MLX** — Available in `3bit`, `4bit`, `5bit`, `6bit`, `8bit`, and `BF16`. `8bit` is recommended. +- **ONNX** — Supports `FP32`, `FP16`, `Q4`, and `Q8` (MoE models also support `Q4F16`). `Q4` is recommended for most deployments. + +## Model Chart + +| Model | HF | GGUF | MLX | ONNX | Trainable?
|
+| ----- | -- | ---- | --- | ---- | ---------- |
+| **Text-to-text Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-Thinking](/lfm/models/lfm25-1.2b-thinking) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-Base](/lfm/models/lfm25-1.2b-base) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-GGUF) | ✗ | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-JP](/lfm/models/lfm25-1.2b-jp) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-8B-A1B](/lfm/models/lfm2-8b-a1b) | [✓](https://huggingface.co/LiquidAI/LFM2-8B-A1B) | [✓](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-8B-A1B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-8B-A1B-ONNX) | Yes (TRL) |
+| [LFM2-2.6B](/lfm/models/lfm2-2.6b) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-2.6B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-2.6B-ONNX) | Yes (TRL) |
+| [LFM2-2.6B-Exp](/lfm/models/lfm2-2.6b-exp) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp-GGUF) | ✗ | ✗ | Yes (TRL) |
+| [LFM2-1.2B](/lfm/models/lfm2-1.2b) Deprecated | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-1.2B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-ONNX) | Yes (TRL) |
+| [LFM2-700M](/lfm/models/lfm2-700m) | [✓](https://huggingface.co/LiquidAI/LFM2-700M) | [✓](https://huggingface.co/LiquidAI/LFM2-700M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-700M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-700M-ONNX) | Yes (TRL) |
+| [LFM2-350M](/lfm/models/lfm2-350m) | [✓](https://huggingface.co/LiquidAI/LFM2-350M) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-350M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-350M-ONNX) | Yes (TRL) |
+| **Vision Language Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2.5-VL-1.6B-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-VL-3B](/lfm/models/lfm2-vl-3b) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-3B) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-3B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-3B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-3B-ONNX) | Yes (TRL) |
+| [LFM2-VL-1.6B](/lfm/models/lfm2-vl-1.6b) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-1.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-1.6B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-1.6B-ONNX) | Yes (TRL) |
+| [LFM2-VL-450M](/lfm/models/lfm2-vl-450m) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-450M) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-450M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-450M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-450M-ONNX) | Yes (TRL) |
+| **Audio Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-Audio-1.5B](/lfm/models/lfm25-audio-1.5b) | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF) | ✗ | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-Audio-1.5B](/lfm/models/lfm2-audio-1.5b) | [✓](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B) | [✓](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B-GGUF) | ✗ | ✗ | No |
+| **Liquid Nanos** | | | | | |
+| [LFM2-1.2B-Extract](/lfm/models/lfm2-1.2b-extract) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-Extract-ONNX) | Yes (TRL) |
+| [LFM2-350M-Extract](/lfm/models/lfm2-350m-extract) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Extract) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-350M-Extract-ONNX) | Yes (TRL) |
+| [LFM2-350M-ENJP-MT](/lfm/models/lfm2-350m-enjp-mt) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-350M-ENJP-MT-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-350M-ENJP-MT-ONNX) | Yes (TRL) |
+| [LFM2-1.2B-RAG](/lfm/models/lfm2-1.2b-rag) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-RAG-ONNX) | Yes (TRL) |
+| [LFM2-1.2B-Tool](/lfm/models/lfm2-1.2b-tool) Deprecated | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-Tool-ONNX) | Yes (TRL) |
+| [LFM2-350M-Math](/lfm/models/lfm2-350m-math) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Math) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Math-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-350M-Math-ONNX) | Yes (TRL) |
+| [LFM2-350M-PII-Extract-JP](/lfm/models/lfm2-350m-pii-extract-jp) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF) | ✗ | ✗ | Yes (TRL) |
+| [LFM2-ColBERT-350M](/lfm/models/lfm2-colbert-350m) | [✓](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) | ✗ | ✗ | ✗ | Yes (PyLate) |
+| [LFM2-2.6B-Transcript](/lfm/models/lfm2-2.6b-transcript) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-2.6B-Transcript-ONNX) | Yes (TRL) |
diff --git a/docs/models/lfm2-1.2b-extract.mdx b/lfm/models/lfm2-1.2b-extract.mdx similarity index 97% rename from docs/models/lfm2-1.2b-extract.mdx rename to lfm/models/lfm2-1.2b-extract.mdx index 2995713..eccaa68 100644 --- a/docs/models/lfm2-1.2b-extract.mdx +++ b/lfm/models/lfm2-1.2b-extract.mdx @@ -3,7 +3,7 @@ title: "LFM2-1.2B-Extract" description: "1.2B parameter model for structured information extraction from documents" --- -
← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-1.2B-Extract is optimized for extracting structured data (JSON, XML, YAML) from unstructured documents. It handles complex nested schemas and multi-field extraction with high accuracy. diff --git a/docs/models/lfm2-1.2b-rag.mdx b/lfm/models/lfm2-1.2b-rag.mdx similarity index 97% rename from docs/models/lfm2-1.2b-rag.mdx rename to lfm/models/lfm2-1.2b-rag.mdx index a43d963..bf219d5 100644 --- a/docs/models/lfm2-1.2b-rag.mdx +++ b/lfm/models/lfm2-1.2b-rag.mdx @@ -3,7 +3,7 @@ title: "LFM2-1.2B-RAG" description: "1.2B parameter model optimized for Retrieval-Augmented Generation" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-1.2B-RAG is optimized for answering questions grounded in provided context documents. It excels at extracting relevant information from retrieved documents while avoiding hallucination. diff --git a/docs/models/lfm2-1.2b-tool.mdx b/lfm/models/lfm2-1.2b-tool.mdx similarity index 82% rename from docs/models/lfm2-1.2b-tool.mdx rename to lfm/models/lfm2-1.2b-tool.mdx index 91e7a63..a62e330 100644 --- a/docs/models/lfm2-1.2b-tool.mdx +++ b/lfm/models/lfm2-1.2b-tool.mdx @@ -3,10 +3,10 @@ title: "LFM2-1.2B-Tool" description: "1.2B parameter model for tool calling (deprecated)" --- -← Back to Liquid Nanos +← Back to Liquid Nanos -This model is deprecated. Use [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) for tool calling instead, which offers improved accuracy and follows the standard tool use format. +This model is deprecated. Use [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) for tool calling instead, which offers improved accuracy and follows the standard tool use format. LFM2-1.2B-Tool was optimized for efficient and precise tool calling. It has been superseded by LFM2.5-1.2B-Instruct which provides better tool calling performance alongside general chat capabilities.
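The page above points tool-calling users at LFM2.5-1.2B-Instruct. As a rough sketch of what such a call looks like with Transformers, where the `get_weather` function and the user question are illustrative placeholders and only the model id comes from the surrounding docs (exact chat-template behavior can vary by `transformers` version, so treat this as an illustration rather than the official quickstart):

```python
def get_weather(city: str) -> str:
    """Get the current weather for a city (placeholder implementation)."""
    return f"Sunny in {city}"

def build_messages(question: str) -> list[dict]:
    """Single-turn chat history in the role/content format the chat template expects."""
    return [{"role": "user", "content": question}]

def ask_with_tools(question: str, model_id: str = "LiquidAI/LFM2.5-1.2B-Instruct") -> str:
    """Render the tool schema into the prompt and let the model emit a tool call."""
    # Heavyweight import kept local so the helpers above stay importable everywhere.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    # apply_chat_template renders the tool signature/docstring into the prompt
    # using the model's chat_template.jinja.
    input_ids = tokenizer.apply_chat_template(
        build_messages(question),
        tools=[get_weather],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Calling `ask_with_tools("What is the weather in Boston?")` downloads the checkpoint and returns the model's structured tool-call text, which your application then parses and executes.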
@@ -40,4 +40,4 @@ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto") # See the Tool Use guide for complete examples ``` -See the [Tool Use](/docs/key-concepts/tool-use) guide for detailed tool calling documentation. +See the [Tool Use](/lfm/key-concepts/tool-use) guide for detailed tool calling documentation. diff --git a/docs/models/lfm2-1.2b.mdx b/lfm/models/lfm2-1.2b.mdx similarity index 91% rename from docs/models/lfm2-1.2b.mdx rename to lfm/models/lfm2-1.2b.mdx index 72a379d..7af719b 100644 --- a/docs/models/lfm2-1.2b.mdx +++ b/lfm/models/lfm2-1.2b.mdx @@ -7,10 +7,10 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models -This model is deprecated. Use [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) for improved performance. +This model is deprecated. Use [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) for improved performance. LFM2-1.2B was the original 1.2B parameter model in the LFM2 series. It has been superseded by LFM2.5-1.2B-Instruct, which offers better chat, instruction-following, and tool-calling performance. diff --git a/docs/models/lfm2-2.6b-exp.mdx b/lfm/models/lfm2-2.6b-exp.mdx similarity index 95% rename from docs/models/lfm2-2.6b-exp.mdx rename to lfm/models/lfm2-2.6b-exp.mdx index 967c712..552a05d 100644 --- a/docs/models/lfm2-2.6b-exp.mdx +++ b/lfm/models/lfm2-2.6b-exp.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-2.6B-Exp is an experimental checkpoint of LFM2-2.6B with RL-only post-training, delivering improved performance on math and reasoning benchmarks. Use this model when you need stronger analytical capabilities. diff --git a/docs/models/lfm2-2.6b-transcript.mdx b/lfm/models/lfm2-2.6b-transcript.mdx similarity index 98% rename from docs/models/lfm2-2.6b-transcript.mdx rename to lfm/models/lfm2-2.6b-transcript.mdx index 6a71d13..c1afe91 100644 --- a/docs/models/lfm2-2.6b-transcript.mdx +++ b/lfm/models/lfm2-2.6b-transcript.mdx @@ -3,7 +3,7 @@ title: "LFM2-2.6B-Transcript" description: "2.6B parameter model for private, on-device meeting summarization" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-2.6B-Transcript is designed for private, on-device meeting summarization from transcripts. It generates executive summaries, detailed summaries, action items, key decisions, and participant lists. diff --git a/docs/models/lfm2-2.6b.mdx b/lfm/models/lfm2-2.6b.mdx similarity index 96% rename from docs/models/lfm2-2.6b.mdx rename to lfm/models/lfm2-2.6b.mdx index 4c3f768..4dd6635 100644 --- a/docs/models/lfm2-2.6b.mdx +++ b/lfm/models/lfm2-2.6b.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-2.6B is a versatile mid-sized model delivering strong performance across chat, reasoning, and tool-calling tasks. Optimized for deployment on consumer devices including phones and laptops.
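The text-model pages above all import the shared Transformers quickstart snippet. A minimal sketch of that flow for LFM2-2.6B, where the prompt and sampling values are illustrative rather than the officially recommended settings:

```python
def chat_messages(user_message: str) -> list[dict]:
    """Single-turn chat history in the format consumed by apply_chat_template."""
    return [{"role": "user", "content": user_message}]

def generate_reply(user_message: str, model_id: str = "LiquidAI/LFM2-2.6B") -> str:
    """Load the checkpoint, apply the chat template, and sample a reply."""
    # Heavyweight import kept local so chat_messages stays importable without torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        chat_messages(user_message), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Illustrative sampling values; tune them per the Recommended Settings section.
    output = model.generate(input_ids, max_new_tokens=64, do_sample=True, temperature=0.7)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

The same two functions work for any of the text-to-text checkpoints in the chart; only `model_id` changes.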
diff --git a/docs/models/lfm2-350m-enjp-mt.mdx b/lfm/models/lfm2-350m-enjp-mt.mdx similarity index 97% rename from docs/models/lfm2-350m-enjp-mt.mdx rename to lfm/models/lfm2-350m-enjp-mt.mdx index 1b153fb..3d5efd6 100644 --- a/docs/models/lfm2-350m-enjp-mt.mdx +++ b/lfm/models/lfm2-350m-enjp-mt.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-ENJP-MT" description: "350M parameter model for bidirectional English-Japanese translation" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-ENJP-MT is a specialized translation model for near real-time bidirectional Japanese/English translation. Optimized for short-to-medium text with low latency. diff --git a/docs/models/lfm2-350m-extract.mdx b/lfm/models/lfm2-350m-extract.mdx similarity index 97% rename from docs/models/lfm2-350m-extract.mdx rename to lfm/models/lfm2-350m-extract.mdx index fff4455..9162c46 100644 --- a/docs/models/lfm2-350m-extract.mdx +++ b/lfm/models/lfm2-350m-extract.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-Extract" description: "350M parameter extraction model for edge deployment" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-Extract is the fastest extraction model, optimized for edge deployment with strict memory and compute constraints. It delivers structured data extraction with minimal latency. diff --git a/docs/models/lfm2-350m-math.mdx b/lfm/models/lfm2-350m-math.mdx similarity index 96% rename from docs/models/lfm2-350m-math.mdx rename to lfm/models/lfm2-350m-math.mdx index a053910..62db0e1 100644 --- a/docs/models/lfm2-350m-math.mdx +++ b/lfm/models/lfm2-350m-math.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-Math" description: "350M parameter model for math problem solving" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-Math is a tiny reasoning model optimized for mathematical problem solving. It provides step-by-step solutions while maintaining a small footprint for edge deployment. diff --git a/docs/models/lfm2-350m-pii-extract-jp.mdx b/lfm/models/lfm2-350m-pii-extract-jp.mdx similarity index 97% rename from docs/models/lfm2-350m-pii-extract-jp.mdx rename to lfm/models/lfm2-350m-pii-extract-jp.mdx index cf70d24..e24748c 100644 --- a/docs/models/lfm2-350m-pii-extract-jp.mdx +++ b/lfm/models/lfm2-350m-pii-extract-jp.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-PII-Extract-JP" description: "350M parameter model for Japanese PII detection and extraction" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-PII-Extract-JP extracts personally identifiable information (PII) from Japanese text as structured JSON. Output can be used to mask sensitive information on-device for privacy-preserving applications. diff --git a/docs/models/lfm2-350m.mdx b/lfm/models/lfm2-350m.mdx similarity index 96% rename from docs/models/lfm2-350m.mdx rename to lfm/models/lfm2-350m.mdx index 5176bd2..00dae65 100644 --- a/docs/models/lfm2-350m.mdx +++ b/lfm/models/lfm2-350m.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-350M is Liquid AI's smallest text model, designed for edge devices with strict memory and compute constraints. Delivers surprisingly strong performance for its size, making it ideal for low-latency applications.
diff --git a/docs/models/lfm2-700m.mdx b/lfm/models/lfm2-700m.mdx similarity index 96% rename from docs/models/lfm2-700m.mdx rename to lfm/models/lfm2-700m.mdx index 854ed38..8cdb0c2 100644 --- a/docs/models/lfm2-700m.mdx +++ b/lfm/models/lfm2-700m.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-700M is a compact model balancing capability and efficiency. Suitable for deployment on a wide range of devices including phones, tablets, and laptops with limited resources. diff --git a/docs/models/lfm2-8b-a1b.mdx b/lfm/models/lfm2-8b-a1b.mdx similarity index 96% rename from docs/models/lfm2-8b-a1b.mdx rename to lfm/models/lfm2-8b-a1b.mdx index 8a40718..9dc8964 100644 --- a/docs/models/lfm2-8b-a1b.mdx +++ b/lfm/models/lfm2-8b-a1b.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-8B-A1B is Liquid AI's Mixture-of-Experts model, combining 8B total parameters with only 1.5B active parameters per forward pass. This delivers the quality of larger models with the speed and efficiency of smaller ones—ideal for on-device deployment. diff --git a/docs/models/lfm2-audio-1.5b.mdx b/lfm/models/lfm2-audio-1.5b.mdx similarity index 94% rename from docs/models/lfm2-audio-1.5b.mdx rename to lfm/models/lfm2-audio-1.5b.mdx index 6cc4296..0b5a6f2 100644 --- a/docs/models/lfm2-audio-1.5b.mdx +++ b/lfm/models/lfm2-audio-1.5b.mdx @@ -3,10 +3,10 @@ title: "LFM2-Audio-1.5B" description: "1.5B audio model (deprecated - use LFM2.5-Audio-1.5B instead)" --- -← Back to Audio Models +← Back to Audio Models -This model is deprecated. Use [LFM2.5-Audio-1.5B](/docs/models/lfm25-audio-1.5b) for improved ASR, TTS, and CPU-friendly inference. +This model is deprecated. Use [LFM2.5-Audio-1.5B](/lfm/models/lfm25-audio-1.5b) for improved ASR, TTS, and CPU-friendly inference. LFM2-Audio-1.5B was the original fully interleaved audio/text model. It has been superseded by LFM2.5-Audio-1.5B, which features a custom LFM-based audio detokenizer and improved performance. diff --git a/docs/models/lfm2-colbert-350m.mdx b/lfm/models/lfm2-colbert-350m.mdx similarity index 97% rename from docs/models/lfm2-colbert-350m.mdx rename to lfm/models/lfm2-colbert-350m.mdx index 7150405..b691250 100644 --- a/docs/models/lfm2-colbert-350m.mdx +++ b/lfm/models/lfm2-colbert-350m.mdx @@ -3,7 +3,7 @@ title: "LFM2-ColBERT-350M" description: "350M parameter ColBERT model for multi-language document retrieval and reranking" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-ColBERT-350M generates dense embeddings for document retrieval and reranking using the ColBERT late-interaction architecture. It supports 8 languages and excels at semantic search tasks. diff --git a/docs/models/lfm2-vl-1.6b.mdx b/lfm/models/lfm2-vl-1.6b.mdx similarity index 91% rename from docs/models/lfm2-vl-1.6b.mdx rename to lfm/models/lfm2-vl-1.6b.mdx index c617000..dc4ff52 100644 --- a/docs/models/lfm2-vl-1.6b.mdx +++ b/lfm/models/lfm2-vl-1.6b.mdx @@ -7,10 +7,10 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models -This model is deprecated. Use [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) for improved performance. +This model is deprecated. Use [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) for improved performance. LFM2-VL-1.6B was the original 1.6B vision-language model. It has been superseded by LFM2.5-VL-1.6B, which offers better visual understanding and reasoning through extended reinforcement learning. diff --git a/docs/models/lfm2-vl-3b.mdx b/lfm/models/lfm2-vl-3b.mdx similarity index 96% rename from docs/models/lfm2-vl-3b.mdx rename to lfm/models/lfm2-vl-3b.mdx index 03ed3f2..a53db13 100644 --- a/docs/models/lfm2-vl-3b.mdx +++ b/lfm/models/lfm2-vl-3b.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2-VL-3B is Liquid AI's highest-capacity multimodal model, delivering enhanced visual reasoning and detailed image understanding. Ideal for complex vision tasks requiring deeper comprehension. diff --git a/docs/models/lfm2-vl-450m.mdx b/lfm/models/lfm2-vl-450m.mdx similarity index 96% rename from docs/models/lfm2-vl-450m.mdx rename to lfm/models/lfm2-vl-450m.mdx index 3009611..6c82036 100644 --- a/docs/models/lfm2-vl-450m.mdx +++ b/lfm/models/lfm2-vl-450m.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2-VL-450M is Liquid AI's smallest vision-language model, designed for edge deployment with strict memory and compute constraints. Delivers fast multimodal inference on resource-limited devices.
diff --git a/docs/models/lfm25-1.2b-base.mdx b/lfm/models/lfm25-1.2b-base.mdx similarity index 97% rename from docs/models/lfm25-1.2b-base.mdx rename to lfm/models/lfm25-1.2b-base.mdx index 52bdaef..3dc3bfb 100644 --- a/docs/models/lfm25-1.2b-base.mdx +++ b/lfm/models/lfm25-1.2b-base.mdx @@ -3,7 +3,7 @@ title: "LFM2.5-1.2B-Base" description: "Pre-trained 1.2B parameter base model for fine-tuning and custom applications" --- -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Base is the pre-trained foundation model for the LFM2.5 series. Ideal for fine-tuning on custom datasets or building specialized checkpoints. Not instruction-tuned—use LFM2.5-1.2B-Instruct for chat applications. diff --git a/docs/models/lfm25-1.2b-instruct.mdx b/lfm/models/lfm25-1.2b-instruct.mdx similarity index 96% rename from docs/models/lfm25-1.2b-instruct.mdx rename to lfm/models/lfm25-1.2b-instruct.mdx index 155a8f8..d32ee03 100644 --- a/docs/models/lfm25-1.2b-instruct.mdx +++ b/lfm/models/lfm25-1.2b-instruct.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Instruct is Liquid AI's flagship instruction-tuned model, delivering exceptional performance for chat, instruction-following, and tool-calling tasks. Built on the LFM2.5 architecture with extended pre-training and reinforcement learning. diff --git a/docs/models/lfm25-1.2b-jp.mdx b/lfm/models/lfm25-1.2b-jp.mdx similarity index 96% rename from docs/models/lfm25-1.2b-jp.mdx rename to lfm/models/lfm25-1.2b-jp.mdx index 4bd5fdd..734e985 100644 --- a/docs/models/lfm25-1.2b-jp.mdx +++ b/lfm/models/lfm25-1.2b-jp.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-JP is fine-tuned for Japanese language tasks, delivering high-quality Japanese text generation, translation, and conversation. Built on LFM2.5 with specialized Japanese training data. diff --git a/docs/models/lfm25-1.2b-thinking.mdx b/lfm/models/lfm25-1.2b-thinking.mdx similarity index 96% rename from docs/models/lfm25-1.2b-thinking.mdx rename to lfm/models/lfm25-1.2b-thinking.mdx index fa5476b..7627d90 100644 --- a/docs/models/lfm25-1.2b-thinking.mdx +++ b/lfm/models/lfm25-1.2b-thinking.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Thinking is optimized for reasoning tasks, delivering strong performance on math, logic, and multi-step problem-solving. Built on the LFM2.5 architecture with specialized training for chain-of-thought reasoning. diff --git a/docs/models/lfm25-audio-1.5b.mdx b/lfm/models/lfm25-audio-1.5b.mdx similarity index 98% rename from docs/models/lfm25-audio-1.5b.mdx rename to lfm/models/lfm25-audio-1.5b.mdx index 25ad96d..3b47d04 100644 --- a/docs/models/lfm25-audio-1.5b.mdx +++ b/lfm/models/lfm25-audio-1.5b.mdx @@ -3,7 +3,7 @@ title: "LFM2.5-Audio-1.5B" description: "1.5B fully interleaved audio/text model for TTS, ASR, and voice chat" --- -← Back to Audio Models +← Back to Audio Models LFM2.5-Audio-1.5B is Liquid AI's flagship audio model, featuring a custom LFM-based audio detokenizer. It delivers natural speech synthesis, multilingual speech recognition, and fully interleaved voice chat with reasoning capabilities in a single compact model. diff --git a/docs/models/lfm25-vl-1.6b.mdx b/lfm/models/lfm25-vl-1.6b.mdx similarity index 96% rename from docs/models/lfm25-vl-1.6b.mdx rename to lfm/models/lfm25-vl-1.6b.mdx index aeb81dd..527151c 100644 --- a/docs/models/lfm25-vl-1.6b.mdx +++ b/lfm/models/lfm25-vl-1.6b.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2.5-VL-1.6B is Liquid AI's flagship vision-language model, delivering exceptional performance on image understanding, visual reasoning, and multimodal tasks. Built on LFM2.5 with a dynamic SigLIP2 image encoder. diff --git a/docs/models/liquid-nanos.mdx b/lfm/models/liquid-nanos.mdx similarity index 79% rename from docs/models/liquid-nanos.mdx rename to lfm/models/liquid-nanos.mdx index eff2033..b5dcba7 100644 --- a/docs/models/liquid-nanos.mdx +++ b/lfm/models/liquid-nanos.mdx @@ -32,55 +32,55 @@ icon: "sparkles" - + 1.2B · Extraction Extract structured JSON from unstructured documents. - + 350M · Extraction Fastest extraction model for edge deployment.
- + 350M · Extraction Japanese PII detection into structured JSON. - + 2.6B · Summarization Private, on-device meeting summarization from transcripts. - + 1.2B · RAG Answer questions grounded in provided context documents. - + 350M · Retrieval Multi-language document embeddings for retrieval and reranking. - + 350M · Translation Near real-time bidirectional Japanese/English translation. - + 350M · Reasoning Tiny reasoning model for math problem solving. - + 1.2B · Deprecated Use LFM2.5-1.2B-Instruct for tool calling instead. diff --git a/docs/models/text-models.mdx b/lfm/models/text-models.mdx similarity index 86% rename from docs/models/text-models.mdx rename to lfm/models/text-models.mdx index 36c00f2..f65d210 100644 --- a/docs/models/text-models.mdx +++ b/lfm/models/text-models.mdx @@ -32,25 +32,25 @@ icon: "comment" - + 1.2B · Recommended Instruction-tuned for chat. Best for most use cases. - + 1.2B · Reasoning Optimized for math and logical problem-solving. - + 1.2B · Pre-trained Base model for finetuning or custom checkpoints. - + 1.2B · Japanese Fine-tuned model for high-quality Japanese text generation. @@ -62,37 +62,37 @@ icon: "comment" - + 8B · 1.5B active · MoE Mixture-of-experts model for on-device speed and quality. - + 2.6B Highly capable model for deployment on most phones and laptops. - + 2.6B RL-only post-trained checkpoint for improved math and reasoning. - + 1.2B · Deprecated Use the new LFM2.5-1.2B-Instruct checkpoint instead. - + 700M Mid-sized model for deploying on most devices. - + 350M · Fastest Our smallest model for edge devices and low latency deployments. diff --git a/docs/models/vision-models.mdx b/lfm/models/vision-models.mdx similarity index 93% rename from docs/models/vision-models.mdx rename to lfm/models/vision-models.mdx index f9bec43..1e92172 100644 --- a/docs/models/vision-models.mdx +++ b/lfm/models/vision-models.mdx @@ -32,7 +32,7 @@ icon: "eye" - + 1.6B · Recommended Best vision model for most use cases. Fast and accurate. @@ -44,19 +44,19 @@ icon: "eye" - + 3B Highest-capacity multimodal model with enhanced visual reasoning. - + 1.6B · Deprecated Use the new LFM2.5-VL-1.6B checkpoint instead. - + 450M · Fastest Compact multimodal model for edge deployment and fast inference. diff --git a/docs/getting-started/connect-ai-tools.mdx b/snippets/connect-ai-tools.mdx similarity index 89% rename from docs/getting-started/connect-ai-tools.mdx rename to snippets/connect-ai-tools.mdx index 44a6fe0..7f80b29 100644 --- a/docs/getting-started/connect-ai-tools.mdx +++ b/snippets/connect-ai-tools.mdx @@ -1,8 +1,3 @@ ---- -title: "Connect AI Tools" -description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" ---- - ## What is MCP? The Model Context Protocol (MCP) is an open standard that gives AI applications a standardized way to connect to external data sources and tools. By connecting your AI coding tool to Liquid docs via MCP, you're giving it live, queryable access to the complete documentation: not a snapshot, not a cached file, but a real-time search against our official documentation. @@ -111,10 +106,10 @@ You're all set! Cursor now has real-time access to Liquid AI documentation. ## Next Steps - + Browse our collection of language models - + Get started with the LEAP SDK