diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index f0c8e0a..20981d7 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -2,19 +2,21 @@ * @Paulescu # Domain owners -/docs/fine-tuning/ @Liquid4All/fine-tuning-team +/customization/ @Liquid4All/fine-tuning-team -/docs/inference @Liquid4All/inference-team -/docs/inference/*-deployment.mdx @tuliren +/deployment/gpu-inference/ @Liquid4All/inference-team +/deployment/gpu-inference/baseten.mdx @tuliren +/deployment/gpu-inference/fal.mdx @tuliren +/deployment/gpu-inference/modal.mdx @tuliren +/deployment/on-device/ @Liquid4All/inference-team -/docs/key-concepts/ @mlabonne -/docs/models/audio-models.mdx @haerski -/docs/models/vision-models.mdx @ankke -/docs/models/ @mlabonne +/lfm/key-concepts/ @mlabonne +/lfm/models/audio-models.mdx @haerski +/lfm/models/vision-models.mdx @ankke +/lfm/models/ @mlabonne -/leap/ @dbhathena -/leap/edge-sdk/ @iamstuffed -/leap/leap-bundle/ @tuliren -/leap/finetuning.mdx @Liquid4All/fine-tuning-team +/deployment/on-device/ios/ @iamstuffed +/deployment/on-device/android/ @iamstuffed +/deployment/tools/model-bundling/ @tuliren /.github/workflows/ @tuliren diff --git a/docs/fine-tuning/datasets.mdx b/customization/finetuning-frameworks/datasets.mdx similarity index 100% rename from docs/fine-tuning/datasets.mdx rename to customization/finetuning-frameworks/datasets.mdx diff --git a/docs/fine-tuning/leap-finetune.mdx b/customization/finetuning-frameworks/leap-finetune.mdx similarity index 79% rename from docs/fine-tuning/leap-finetune.mdx rename to customization/finetuning-frameworks/leap-finetune.mdx index d25add6..03bcfb3 100644 --- a/docs/fine-tuning/leap-finetune.mdx +++ b/customization/finetuning-frameworks/leap-finetune.mdx @@ -20,10 +20,10 @@ LEAP Finetune will provide: While LEAP Finetune is in development, you can fine-tune models using: - + Hugging Face's training library with LoRA/QLoRA support - + Memory-efficient fine-tuning with 2x faster training @@ -33,8 +33,8 @@ While 
LEAP Finetune is in development, you can fine-tune models using: After fine-tuning with TRL or Unsloth, prepare your model for edge deployment: 1. **Fine-tune** your model using TRL or Unsloth -2. **Convert** to edge-optimized format using the [Model Bundling Service](/leap/leap-bundle/quick-start) -3. **Deploy** to mobile devices using the [LEAP SDK](/leap/edge-sdk/overview) +2. **Convert** to edge-optimized format using the [Model Bundling Service](/deployment/tools/model-bundling/quick-start) +3. **Deploy** to mobile devices using the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide) ```bash # Example: Bundle a fine-tuned model for edge deployment diff --git a/docs/fine-tuning/trl.mdx b/customization/finetuning-frameworks/trl.mdx similarity index 96% rename from docs/fine-tuning/trl.mdx rename to customization/finetuning-frameworks/trl.mdx index c63ac28..00cbea2 100644 --- a/docs/fine-tuning/trl.mdx +++ b/customization/finetuning-frameworks/trl.mdx @@ -9,7 +9,7 @@ description: "TRL (Transformer Reinforcement Learning) is a library for fine-tun LFM models work out-of-the-box with TRL without requiring any custom integration. -Different training methods require specific dataset formats. See [Finetuning Datasets](/docs/fine-tuning/datasets) for format requirements. +Different training methods require specific dataset formats. See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements. ## Installation[​](#installation "Direct link to Installation") @@ -27,7 +27,7 @@ pip install trl>=0.9.0 transformers>=4.55.0 torch>=2.6 peft accelerate [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_SFT_with_TRL.ipynb) -The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. 
It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/docs/fine-tuning/datasets#instruction-datasets-sft). +The `SFTTrainer` makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires [Instruction datasets](/customization/finetuning-frameworks/datasets#instruction-datasets-sft). ### LoRA Fine-Tuning (Recommended)[​](#lora-fine-tuning-recommended "Direct link to LoRA Fine-Tuning (Recommended)") @@ -132,7 +132,7 @@ trainer.train() [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_5_VL_SFT_with_TRL.ipynb) -The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/docs/fine-tuning/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT: +The `SFTTrainer` also supports fine-tuning Vision Language Models like `LFM2.5-VL-1.6B` on image-text datasets. VLM fine-tuning requires [Vision datasets](/customization/finetuning-frameworks/datasets#vision-datasets-vlm-sft) and a few key differences from text-only SFT: * Uses `AutoModelForImageTextToText` instead of `AutoModelForCausalLM` * Uses `AutoProcessor` instead of just a tokenizer @@ -290,7 +290,7 @@ trainer.train() [![Colab link](/images/lfm/fine-tuning/production/uploads/61b8e2ba285851687028d395/vlOyMEjwHa_b_LXysEu2E.png)](https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/💧_LFM2_DPO_with_TRL.ipynb) -The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. 
DPO training requires [Preference datasets](/docs/fine-tuning/datasets#preference-datasets-dpo) with chosen and rejected response pairs. +The `DPOTrainer` implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires [Preference datasets](/customization/finetuning-frameworks/datasets#preference-datasets-dpo) with chosen and rejected response pairs. ### DPO with LoRA (Recommended)[​](#dpo-with-lora-recommended "Direct link to DPO with LoRA (Recommended)") diff --git a/docs/fine-tuning/unsloth.mdx b/customization/finetuning-frameworks/unsloth.mdx similarity index 88% rename from docs/fine-tuning/unsloth.mdx rename to customization/finetuning-frameworks/unsloth.mdx index cdf1548..32568f8 100644 --- a/docs/fine-tuning/unsloth.mdx +++ b/customization/finetuning-frameworks/unsloth.mdx @@ -7,9 +7,9 @@ description: "Unsloth makes fine-tuning LLMs 2-5x faster with 70% less memory th Use Unsloth for faster training with optimized kernels, reduced memory usage, and built-in quantization support. -LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5). +LFM2.5 models are fully supported by Unsloth. For comprehensive guides and tutorials, see the [official Unsloth LFM2.5 documentation](https://unsloth.ai/docs/models/tutorials/lfm2.5). -Different training methods require specific dataset formats. +Different training methods require specific dataset formats. 
See [Finetuning Datasets](/customization/finetuning-frameworks/datasets) for format requirements for [SFT](/customization/finetuning-frameworks/datasets#instruction-datasets-sft) and [GRPO](/customization/finetuning-frameworks/datasets#prompt-only-datasets-grpo). ## Notebooks @@ -85,5 +85,5 @@ FastLanguageModel.for_inference(model) ## Resources * [Unsloth Documentation](https://unsloth.ai/docs) -* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5) +* [Unsloth LFM2.5 Tutorial](https://unsloth.ai/docs/models/tutorials/lfm2.5) * [Liquid AI Cookbook](https://github.com/Liquid4All/cookbook) diff --git a/customization/getting-started/connect-ai-tools.mdx b/customization/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/customization/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/customization/getting-started/welcome.mdx b/customization/getting-started/welcome.mdx new file mode 100644 index 0000000..4450634 --- /dev/null +++ b/customization/getting-started/welcome.mdx @@ -0,0 +1,23 @@ +--- +title: "Customization Options" +description: "Fine-tune and customize Liquid Foundation Models for your specific use cases." +--- + +LFM models support fine-tuning with popular frameworks and tools. Whether you need to adapt models for domain-specific tasks, improve accuracy on your data, or optimize for production workflows, these guides will help you get started. 
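The dataset formats these guides reference (instruction datasets for SFT, preference datasets for DPO with chosen and rejected response pairs, prompt-only datasets for GRPO) can be sketched as minimal records. The field names below follow common TRL conventions and are an assumption for illustration, not the canonical spec:

```python
# Illustrative records for the three dataset formats named in these guides.
# Field names follow common TRL conventions (an assumption, not the canonical spec).

sft_record = {  # instruction dataset (SFT): full conversations
    "messages": [
        {"role": "user", "content": "Summarize this paragraph."},
        {"role": "assistant", "content": "Here is a short summary."},
    ]
}

dpo_record = {  # preference dataset (DPO): chosen and rejected response pairs
    "prompt": "Explain overfitting in one sentence.",
    "chosen": "Overfitting means the model memorizes training data and generalizes poorly.",
    "rejected": "Overfitting is when training finishes too quickly.",
}

grpo_record = {  # prompt-only dataset (GRPO): completions are sampled during training
    "prompt": "Write a haiku about autumn.",
}

def dataset_kind(record: dict) -> str:
    """Classify a record by the training method its shape fits."""
    if "messages" in record:
        return "sft"
    if {"chosen", "rejected"} <= record.keys():
        return "dpo"
    if set(record.keys()) == {"prompt"}:
        return "grpo"
    raise ValueError("unrecognized record shape")

print(dataset_kind(sft_record), dataset_kind(dpo_record), dataset_kind(grpo_record))
```

A validator like `dataset_kind` is a convenient sanity check before handing a dataset to a trainer, since each training method rejects records in the other shapes.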
+ +## Get Started + + + + Evaluate and iterate on prompts with Liquid's no-code Workbench tool + + + Prepare datasets in the right format for SFT, DPO, and GRPO training + + + Fine-tune LFM models using Hugging Face's TRL library + + + Fast and memory-efficient fine-tuning with Unsloth + + diff --git a/docs/fine-tuning/workbench.mdx b/customization/tools/workbench.mdx similarity index 100% rename from docs/fine-tuning/workbench.mdx rename to customization/tools/workbench.mdx diff --git a/deployment/getting-started/connect-ai-tools.mdx b/deployment/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/deployment/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/deployment/getting-started/welcome.mdx b/deployment/getting-started/welcome.mdx new file mode 100644 index 0000000..8ea7c67 --- /dev/null +++ b/deployment/getting-started/welcome.mdx @@ -0,0 +1,60 @@ +--- +title: "Deployment Options" +description: "Deploy Liquid Foundation Models on any platform — from mobile devices to GPU clusters." +--- + +LFM models are designed for efficient deployment across a wide range of platforms. Run models on-device for privacy and low latency, or scale up with GPU inference for production workloads. 
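The On-Device and GPU Inference sections that follow split along a simple rule of thumb stated across these pages: CPU-only environments point to llama.cpp, Apple Silicon points to MLX, a CUDA GPU with high-throughput needs points to vLLM or SGLang, and vision workloads stay on Transformers. A hypothetical helper encoding that guidance, as a sketch rather than an official API:

```python
# Hypothetical helper encoding the rule-of-thumb guidance from these docs:
# CPU-only -> llama.cpp, Apple Silicon -> MLX, CUDA GPU + high throughput ->
# vLLM (or SGLang), vision on GPU -> Transformers. A sketch, not an official API.

def recommend_backend(cuda_gpu: bool, apple_silicon: bool = False,
                      high_throughput: bool = False, vision: bool = False) -> str:
    if not cuda_gpu:
        # llama.cpp is the CPU-first option; MLX is optimized for Apple Silicon.
        return "mlx" if apple_silicon else "llama.cpp"
    if vision:
        return "transformers"  # vision models are not yet supported in SGLang
    if high_throughput:
        return "vllm"          # SGLang is the alternative for ultra-low-latency serving
    return "transformers"      # simplest path, direct access to model internals

print(recommend_backend(cuda_gpu=False))  # prints llama.cpp
```

The helper is deliberately coarse; real deployments weigh memory budget, quantization, and concurrency on top of these defaults.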
+ +## On-Device + + + + Deploy models natively on iPhone and iPad + + + Deploy models natively on Android devices + + + CPU-first inference with cross-platform support + + + Optimized inference on Apple Silicon + + + Cross-platform inference with ONNX Runtime + + + Easy local deployment and model management + + + +## GPU Inference + + + + Flexible inference with Hugging Face Transformers + + + High-throughput production serving + + + Structured generation and fast serving + + + Serverless GPU deployment + + + Production model inference platform + + + Fast inference API platform + + + +## Tools + + + + Package and distribute optimized model bundles for edge deployment + + diff --git a/docs/inference/baseten-deployment.mdx b/deployment/gpu-inference/baseten.mdx similarity index 100% rename from docs/inference/baseten-deployment.mdx rename to deployment/gpu-inference/baseten.mdx diff --git a/docs/inference/fal-deployment.mdx b/deployment/gpu-inference/fal.mdx similarity index 100% rename from docs/inference/fal-deployment.mdx rename to deployment/gpu-inference/fal.mdx diff --git a/docs/inference/modal-deployment.mdx b/deployment/gpu-inference/modal.mdx similarity index 100% rename from docs/inference/modal-deployment.mdx rename to deployment/gpu-inference/modal.mdx diff --git a/docs/inference/sglang.mdx b/deployment/gpu-inference/sglang.mdx similarity index 95% rename from docs/inference/sglang.mdx rename to deployment/gpu-inference/sglang.mdx index 78e8d04..392eadf 100644 --- a/docs/inference/sglang.mdx +++ b/deployment/gpu-inference/sglang.mdx @@ -7,7 +7,7 @@ description: "SGLang is a fast serving framework for large language models. It f Use SGLang for ultra-low latency, high-throughput production serving with many concurrent requests. -SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead. +SGLang requires a CUDA-compatible GPU. 
For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead. ## Supported Models @@ -18,7 +18,7 @@ SGLang requires a CUDA-compatible GPU. For CPU-only environments, consider using | Vision models | Not yet supported | LFM2-VL | -MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/docs/inference/transformers) for vision workloads. +MoE model support has been merged into SGLang but is not yet included in a stable release — [install from main](#install-from-main-moe-support) to use MoE models now. Vision models are not yet supported in SGLang — use [Transformers](/deployment/gpu-inference/transformers) for vision workloads. ## Installation @@ -119,7 +119,7 @@ response = client.chat.completions.create( print(response.choices[0].message) ``` -For more details on tool use with LFM models, see [Tool Use](/docs/key-concepts/tool-use). +For more details on tool use with LFM models, see [Tool Use](/lfm/key-concepts/tool-use). ```bash diff --git a/docs/inference/transformers.mdx b/deployment/gpu-inference/transformers.mdx similarity index 99% rename from docs/inference/transformers.mdx rename to deployment/gpu-inference/transformers.mdx index bfffc43..7c4f6b3 100644 --- a/docs/inference/transformers.mdx +++ b/deployment/gpu-inference/transformers.mdx @@ -7,7 +7,7 @@ description: "Transformers is a library for inference and training of pretrained Use Transformers for simple inference without extra dependencies, research and experimentation, or integration with the Hugging Face ecosystem. -Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/docs/inference/vllm). 
+Transformers provides the most flexibility for model development and is ideal for users who want direct access to model internals. For production deployments with high throughput, consider using [vLLM](/deployment/gpu-inference/vllm).
@@ -165,7 +165,7 @@ output = model.generate(input_ids, streamer=streamer, max_new_tokens=512) Process multiple prompts in a single batch for efficiency. See the [batching documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation#batch-generation) for more details: - Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/docs/inference/vllm). + Batching is not automatically a win for performance. For high-performance batching with optimized throughput, consider using [vLLM](/deployment/gpu-inference/vllm). ```python diff --git a/docs/inference/vllm.mdx b/deployment/gpu-inference/vllm.mdx similarity index 97% rename from docs/inference/vllm.mdx rename to deployment/gpu-inference/vllm.mdx index aab625d..1d5ff84 100644 --- a/docs/inference/vllm.mdx +++ b/deployment/gpu-inference/vllm.mdx @@ -7,7 +7,7 @@ description: "vLLM is a high-throughput and memory-efficient inference engine fo Use vLLM for high-throughput production deployments, batch processing, or serving models via an API. -vLLM offers significantly higher throughput than [Transformers](/docs/inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/docs/inference/llama-cpp) instead. +vLLM offers significantly higher throughput than [Transformers](/deployment/gpu-inference/transformers), making it ideal for serving many concurrent requests. However, it requires a CUDA-compatible GPU. For CPU-only environments, consider using [llama.cpp](/deployment/on-device/llama-cpp) instead.
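The GPU serving stacks covered above (vLLM and SGLang both ship OpenAI-compatible servers, as the SGLang page's `client.chat.completions.create` example shows) all accept the same chat completions request body. A sketch of that payload shape; the model name and defaults here are placeholders, not pinned recommendations:

```python
# Build an OpenAI-compatible /v1/chat/completions request body.
# No server is needed to construct it; point "model" at whatever model
# your vLLM or SGLang instance loaded (the name below is illustrative).
import json
from typing import Optional

def chat_payload(model: str, user_msg: str, system: Optional[str] = None,
                 max_tokens: int = 512, temperature: float = 0.3) -> dict:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_msg})
    return {"model": model, "messages": messages,
            "max_tokens": max_tokens, "temperature": temperature}

payload = chat_payload("LiquidAI/LFM2.5-1.2B-Instruct", "Hello!",
                       system="You are a helpful assistant.")
print(json.dumps(payload, indent=2))
```

Because the body is identical across both servers, swapping backends usually only means changing the base URL the client talks to.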
diff --git a/leap/edge-sdk/android/advanced-features.mdx b/deployment/on-device/android/advanced-features.mdx similarity index 100% rename from leap/edge-sdk/android/advanced-features.mdx rename to deployment/on-device/android/advanced-features.mdx diff --git a/leap/edge-sdk/android/ai-agent-usage-guide.mdx b/deployment/on-device/android/ai-agent-usage-guide.mdx similarity index 100% rename from leap/edge-sdk/android/ai-agent-usage-guide.mdx rename to deployment/on-device/android/ai-agent-usage-guide.mdx diff --git a/leap/edge-sdk/android/android-quick-start-guide.mdx b/deployment/on-device/android/android-quick-start-guide.mdx similarity index 99% rename from leap/edge-sdk/android/android-quick-start-guide.mdx rename to deployment/on-device/android/android-quick-start-guide.mdx index 686c218..36a9b1d 100644 --- a/leap/edge-sdk/android/android-quick-start-guide.mdx +++ b/deployment/on-device/android/android-quick-start-guide.mdx @@ -450,4 +450,4 @@ In this pattern: See [LeapSDK-Examples](https://github.com/Liquid4All/LeapSDK-Examples) for complete example apps using LeapSDK. 
-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/android/android-quick-start-guide.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/android/android-quick-start-guide.mdx) diff --git a/leap/edge-sdk/android/cloud-ai-comparison.mdx b/deployment/on-device/android/cloud-ai-comparison.mdx similarity index 100% rename from leap/edge-sdk/android/cloud-ai-comparison.mdx rename to deployment/on-device/android/cloud-ai-comparison.mdx diff --git a/leap/edge-sdk/android/constrained-generation.mdx b/deployment/on-device/android/constrained-generation.mdx similarity index 100% rename from leap/edge-sdk/android/constrained-generation.mdx rename to deployment/on-device/android/constrained-generation.mdx diff --git a/leap/edge-sdk/android/conversation-generation.mdx b/deployment/on-device/android/conversation-generation.mdx similarity index 100% rename from leap/edge-sdk/android/conversation-generation.mdx rename to deployment/on-device/android/conversation-generation.mdx diff --git a/leap/edge-sdk/android/function-calling.mdx b/deployment/on-device/android/function-calling.mdx similarity index 100% rename from leap/edge-sdk/android/function-calling.mdx rename to deployment/on-device/android/function-calling.mdx diff --git a/leap/edge-sdk/android/messages-content.mdx b/deployment/on-device/android/messages-content.mdx similarity index 100% rename from leap/edge-sdk/android/messages-content.mdx rename to deployment/on-device/android/messages-content.mdx diff --git a/leap/edge-sdk/android/model-loading.mdx b/deployment/on-device/android/model-loading.mdx similarity index 100% rename from leap/edge-sdk/android/model-loading.mdx rename to deployment/on-device/android/model-loading.mdx diff --git a/leap/edge-sdk/android/utilities.mdx b/deployment/on-device/android/utilities.mdx similarity index 100% rename from leap/edge-sdk/android/utilities.mdx rename to deployment/on-device/android/utilities.mdx diff --git 
a/leap/edge-sdk/ios/advanced-features.mdx b/deployment/on-device/ios/advanced-features.mdx similarity index 100% rename from leap/edge-sdk/ios/advanced-features.mdx rename to deployment/on-device/ios/advanced-features.mdx diff --git a/leap/edge-sdk/ios/ai-agent-usage-guide.mdx b/deployment/on-device/ios/ai-agent-usage-guide.mdx similarity index 100% rename from leap/edge-sdk/ios/ai-agent-usage-guide.mdx rename to deployment/on-device/ios/ai-agent-usage-guide.mdx diff --git a/leap/edge-sdk/ios/cloud-ai-comparison.mdx b/deployment/on-device/ios/cloud-ai-comparison.mdx similarity index 100% rename from leap/edge-sdk/ios/cloud-ai-comparison.mdx rename to deployment/on-device/ios/cloud-ai-comparison.mdx diff --git a/leap/edge-sdk/ios/constrained-generation.mdx b/deployment/on-device/ios/constrained-generation.mdx similarity index 100% rename from leap/edge-sdk/ios/constrained-generation.mdx rename to deployment/on-device/ios/constrained-generation.mdx diff --git a/leap/edge-sdk/ios/conversation-generation.mdx b/deployment/on-device/ios/conversation-generation.mdx similarity index 100% rename from leap/edge-sdk/ios/conversation-generation.mdx rename to deployment/on-device/ios/conversation-generation.mdx diff --git a/leap/edge-sdk/ios/function-calling.mdx b/deployment/on-device/ios/function-calling.mdx similarity index 100% rename from leap/edge-sdk/ios/function-calling.mdx rename to deployment/on-device/ios/function-calling.mdx diff --git a/leap/edge-sdk/ios/ios-quick-start-guide.mdx b/deployment/on-device/ios/ios-quick-start-guide.mdx similarity index 97% rename from leap/edge-sdk/ios/ios-quick-start-guide.mdx rename to deployment/on-device/ios/ios-quick-start-guide.mdx index a580628..d7d9a28 100644 --- a/leap/edge-sdk/ios/ios-quick-start-guide.mdx +++ b/deployment/on-device/ios/ios-quick-start-guide.mdx @@ -323,10 +323,10 @@ conversation = current.modelRunner.createConversationFromHistory( ## Next steps[​](#next-steps "Direct link to Next steps") -* Learn how to 
expose structured JSON outputs with the [`@Generatable` macros](/leap/edge-sdk/ios/constrained-generation). -* Wire up tools and external APIs with [Function Calling](/leap/edge-sdk/ios/function-calling). -* Compare on-device and cloud behaviour in [Cloud AI Comparison](/leap/edge-sdk/ios/cloud-ai-comparison). +* Learn how to expose structured JSON outputs with the [`@Generatable` macros](/deployment/on-device/ios/constrained-generation). +* Wire up tools and external APIs with [Function Calling](/deployment/on-device/ios/function-calling). +* Compare on-device and cloud behaviour in [Cloud AI Comparison](/deployment/on-device/ios/cloud-ai-comparison). You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use. -[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/ios/ios-quick-start-guide.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx) diff --git a/leap/edge-sdk/ios/messages-content.mdx b/deployment/on-device/ios/messages-content.mdx similarity index 100% rename from leap/edge-sdk/ios/messages-content.mdx rename to deployment/on-device/ios/messages-content.mdx diff --git a/leap/edge-sdk/ios/model-loading.mdx b/deployment/on-device/ios/model-loading.mdx similarity index 100% rename from leap/edge-sdk/ios/model-loading.mdx rename to deployment/on-device/ios/model-loading.mdx diff --git a/leap/edge-sdk/ios/utilities.mdx b/deployment/on-device/ios/utilities.mdx similarity index 100% rename from leap/edge-sdk/ios/utilities.mdx rename to deployment/on-device/ios/utilities.mdx diff --git a/docs/inference/llama-cpp.mdx b/deployment/on-device/llama-cpp.mdx similarity index 99% rename from docs/inference/llama-cpp.mdx rename to deployment/on-device/llama-cpp.mdx index 4c240eb..6612222 100644 --- a/docs/inference/llama-cpp.mdx +++ b/deployment/on-device/llama-cpp.mdx @@ -7,7 +7,7 @@ description: 
"llama.cpp is a C++ library for efficient LLM inference with minima Use llama.cpp for CPU-only environments, local development, or edge deployment and on-device inference. -For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vllm) instead. +For GPU-accelerated inference at scale, consider using [vLLM](/deployment/gpu-inference/vllm) instead.
@@ -100,7 +100,7 @@ For GPU-accelerated inference at scale, consider using [vLLM](/docs/inference/vl ## Downloading GGUF Models -llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/docs/models/complete-library) for all available GGUF models. +llama.cpp uses the GGUF format, which stores quantized model weights for efficient inference. All LFM models are available in GGUF format on Hugging Face. See the [Models page](/lfm/models/complete-library) for all available GGUF models. You can download LFM models in GGUF format from Hugging Face as follows: diff --git a/docs/inference/lm-studio.mdx b/deployment/on-device/lm-studio.mdx similarity index 98% rename from docs/inference/lm-studio.mdx rename to deployment/on-device/lm-studio.mdx index 19ea48a..e961050 100644 --- a/docs/inference/lm-studio.mdx +++ b/deployment/on-device/lm-studio.mdx @@ -18,7 +18,7 @@ Download and install LM Studio directly from [lmstudio.ai](https://lmstudio.ai/d 3. Select a model and quantization level (`Q4_K_M` recommended) 4. Click **Download** -See the [Models page](/docs/models/complete-library) for all available GGUF models. +See the [Models page](/lfm/models/complete-library) for all available GGUF models. ## Using the Chat Interface diff --git a/docs/inference/mlx.mdx b/deployment/on-device/mlx.mdx similarity index 95% rename from docs/inference/mlx.mdx rename to deployment/on-device/mlx.mdx index 199f001..35b0bfa 100644 --- a/docs/inference/mlx.mdx +++ b/deployment/on-device/mlx.mdx @@ -21,7 +21,7 @@ pip install mlx-lm The `mlx-lm` package provides a simple interface for text generation with MLX models. -See the [Models page](/docs/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2). 
+See the [Models page](/lfm/models/complete-library) for all available MLX models, or browse MLX community models at [mlx-community LFM2 models](https://huggingface.co/models?sort=created&search=mlx-communityLFM2). ```python from mlx_lm import load, generate diff --git a/docs/inference/ollama.mdx b/deployment/on-device/ollama.mdx similarity index 98% rename from docs/inference/ollama.mdx rename to deployment/on-device/ollama.mdx index a149954..65939f7 100644 --- a/docs/inference/ollama.mdx +++ b/deployment/on-device/ollama.mdx @@ -68,7 +68,7 @@ You can run LFM2 models directly from Hugging Face: ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF ``` -See the [Models page](/docs/models/complete-library) for all available GGUF repositories. +See the [Models page](/lfm/models/complete-library) for all available GGUF repositories. To use a local GGUF file, first download a model from Hugging Face: diff --git a/docs/inference/onnx.mdx b/deployment/on-device/onnx.mdx similarity index 98% rename from docs/inference/onnx.mdx rename to deployment/on-device/onnx.mdx index 7d363a9..9cf5cf7 100644 --- a/docs/inference/onnx.mdx +++ b/deployment/on-device/onnx.mdx @@ -72,7 +72,7 @@ For complete documentation and advanced options, see the [LiquidONNX GitHub repo ## Pre-exported Models -Many LFM models are available as pre-exported ONNX packages from [LiquidAI](https://huggingface.co/LiquidAI/models?search=onnx) and the [onnx-community](https://huggingface.co/onnx-community). Check the [Model Library](/docs/models/complete-library) for a complete list of available formats. +Many LFM models are available as pre-exported ONNX packages from [LiquidAI](https://huggingface.co/LiquidAI/models?search=onnx) and the [onnx-community](https://huggingface.co/onnx-community). Check the [Model Library](/lfm/models/complete-library) for a complete list of available formats. 
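Quantized artifacts in the model library conventionally encode the quantization level in repo or file names (for example `LFM2.5-1.2B-Instruct-GGUF` repos containing `Q4_K_M` files, the level the LM Studio guide recommends). A hypothetical helper that extracts that trailing tag; the naming pattern is an assumption based on typical Hugging Face community repos, not a formal spec:

```python
# Hypothetical helper: extract the quantization tag from an artifact name
# such as "LFM2.5-1.2B-Instruct-Q4_K_M.gguf". The trailing-segment naming
# pattern is an assumption from typical Hugging Face repos, not a spec.

def quant_tag(filename: str) -> str:
    stem = filename.rsplit(".", 1)[0]    # drop the file extension
    candidate = stem.rpartition("-")[2]  # last dash-separated segment
    # Quantization tags conventionally start with a width marker like Q4, Q8, F16.
    if candidate[:1] in {"Q", "F", "B"} and any(c.isdigit() for c in candidate):
        return candidate
    raise ValueError(f"no quantization tag found in {filename!r}")

print(quant_tag("LFM2.5-1.2B-Instruct-Q4_K_M.gguf"))  # prints Q4_K_M
```

A parser like this is handy when scripting downloads of a whole repo and keeping only one quantization level per model.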
### Quantization Options diff --git a/leap/leap-bundle/authentication.mdx b/deployment/tools/model-bundling/authentication.mdx similarity index 100% rename from leap/leap-bundle/authentication.mdx rename to deployment/tools/model-bundling/authentication.mdx diff --git a/leap/leap-bundle/bundle-creation.mdx b/deployment/tools/model-bundling/bundle-creation.mdx similarity index 100% rename from leap/leap-bundle/bundle-creation.mdx rename to deployment/tools/model-bundling/bundle-creation.mdx diff --git a/leap/leap-bundle/bundle-management.mdx b/deployment/tools/model-bundling/bundle-management.mdx similarity index 100% rename from leap/leap-bundle/bundle-management.mdx rename to deployment/tools/model-bundling/bundle-management.mdx diff --git a/leap/leap-bundle/changelog.mdx b/deployment/tools/model-bundling/changelog.mdx similarity index 100% rename from leap/leap-bundle/changelog.mdx rename to deployment/tools/model-bundling/changelog.mdx diff --git a/leap/leap-bundle/configuration.mdx b/deployment/tools/model-bundling/configuration.mdx similarity index 100% rename from leap/leap-bundle/configuration.mdx rename to deployment/tools/model-bundling/configuration.mdx diff --git a/leap/leap-bundle/data-privacy.mdx b/deployment/tools/model-bundling/data-privacy.mdx similarity index 100% rename from leap/leap-bundle/data-privacy.mdx rename to deployment/tools/model-bundling/data-privacy.mdx diff --git a/leap/leap-bundle/download.mdx b/deployment/tools/model-bundling/download.mdx similarity index 100% rename from leap/leap-bundle/download.mdx rename to deployment/tools/model-bundling/download.mdx diff --git a/leap/leap-bundle/quick-start.mdx b/deployment/tools/model-bundling/quick-start.mdx similarity index 96% rename from leap/leap-bundle/quick-start.mdx rename to deployment/tools/model-bundling/quick-start.mdx index f9a71a4..4710156 100644 --- a/leap/leap-bundle/quick-start.mdx +++ b/deployment/tools/model-bundling/quick-start.mdx @@ -69,7 +69,7 @@ If model uploads fail 
with connectivity errors, verify that your network allows 3. Select the [`API keys` tab](https://leap.liquid.ai/profile#/api-keys) and create a new API key - ![api-key-screenshot](/images/leap/leap-bundle/assets/images/api-keys-51242efd637d71dd5e7f4eb01555cd78.png) + ![api-key-screenshot](/images/deployment/tools/model-bundling/assets/images/api-keys-51242efd637d71dd5e7f4eb01555cd78.png) 4. Authenticate the Model Bundling Service with your API token: @@ -228,4 +228,4 @@ If model uploads fail with connectivity errors, verify that your network allows ## Next Steps * Visit the [LEAP Model Library](https://leap.liquid.ai/models) to explore available models. -* Check the [Bundle Creation](/leap/leap-bundle/bundle-creation) page for detailed command reference. +* Check the [Bundle Creation](/deployment/tools/model-bundling/bundle-creation) page for detailed command reference. diff --git a/leap/leap-bundle/reference.mdx b/deployment/tools/model-bundling/reference.mdx similarity index 100% rename from leap/leap-bundle/reference.mdx rename to deployment/tools/model-bundling/reference.mdx diff --git a/docs.json b/docs.json index 67a08f3..71c6ad4 100644 --- a/docs.json +++ b/docs.json @@ -1,7 +1,7 @@ { "$schema": "https://mintlify.com/docs.json", "banner": { - "content": "🚀 New: LFM2.5-1.2B-Instruct and LFM2.5-1.2B-Thinking are now available! [Learn more →](/docs/models/text-models)", + "content": "🚀 New: LFM2.5-1.2B-Instruct and LFM2.5-1.2B-Thinking are now available! 
[Learn more →](/lfm/models/text-models)", "dismissible": true }, "theme": "mint", @@ -36,7 +36,7 @@ "logo": { "light": "/logo/light.svg", "dark": "/logo/dark.svg", - "href": "/docs/getting-started/welcome" + "href": "/lfm/getting-started/welcome" }, "navbar": { "links": [ @@ -54,136 +54,161 @@ "navigation": { "tabs": [ { - "tab": "Documentation", + "tab": "LFM", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ - "docs/getting-started/welcome", - "docs/getting-started/connect-ai-tools" + "lfm/getting-started/welcome", + "lfm/getting-started/connect-ai-tools" ] }, { "group": "Models", "icon": "brain", "pages": [ - "docs/models/complete-library", - "docs/models/text-models", - "docs/models/vision-models", - "docs/models/audio-models", - "docs/models/liquid-nanos" + "lfm/models/complete-library", + "lfm/models/text-models", + "lfm/models/vision-models", + "lfm/models/audio-models", + "lfm/models/liquid-nanos" ] }, { "group": "Key Concepts", "icon": "lightbulb", "pages": [ - "docs/key-concepts/chat-template", - "docs/key-concepts/text-generation-and-prompting", - "docs/key-concepts/tool-use" + "lfm/key-concepts/chat-template", + "lfm/key-concepts/text-generation-and-prompting", + "lfm/key-concepts/tool-use" ] }, { - "group": "Inference", - "icon": "play", + "group": "Help", + "icon": "book", "pages": [ - "docs/inference/transformers", - "docs/inference/llama-cpp", - "docs/inference/vllm", - "docs/inference/sglang", - "docs/inference/mlx", - "docs/inference/ollama", - "docs/inference/onnx", - { - "group": "Other Frameworks", - "icon": "server", - "pages": [ - "docs/inference/lm-studio", - "docs/inference/modal-deployment", - "docs/inference/baseten-deployment", - "docs/inference/fal-deployment" - ] - } + "lfm/help/faqs", + "lfm/help/troubleshooting", + "lfm/help/contributing" + ] + } + ] + }, + { + "tab": "Customization", + "groups": [ + { + "group": "Getting Started", + "icon": "rocket", + "pages": [ 
"customization/getting-started/welcome", + "customization/getting-started/connect-ai-tools" ] }, { - "group": "Fine-tuning", - "icon": "sliders", + "group": "Tools", + "icon": "wrench", "pages": [ - "docs/fine-tuning/workbench", - "docs/fine-tuning/datasets", - "docs/fine-tuning/trl", - "docs/fine-tuning/unsloth" + "customization/tools/workbench" ] }, { - "group": "Help", - "icon": "book", + "group": "Finetuning Frameworks", + "icon": "sliders", "pages": [ - "docs/help/faqs", - "docs/help/troubleshooting", - "docs/help/contributing" + "customization/finetuning-frameworks/datasets", + "customization/finetuning-frameworks/trl", + "customization/finetuning-frameworks/unsloth" ] } ] }, { - "tab": "SDK Reference", + "tab": "Deployment", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ - "leap/edge-sdk/overview", - "docs/getting-started/connect-ai-tools" + "deployment/getting-started/welcome", + "deployment/getting-started/connect-ai-tools" ] }, { - "group": "iOS", - "icon": "apple", + "group": "On-Device", + "icon": "mobile", "pages": [ - "leap/edge-sdk/ios/ios-quick-start-guide", - "leap/edge-sdk/ios/ai-agent-usage-guide", - "leap/edge-sdk/ios/model-loading", - "leap/edge-sdk/ios/conversation-generation", - "leap/edge-sdk/ios/messages-content", - "leap/edge-sdk/ios/advanced-features", - "leap/edge-sdk/ios/utilities", - "leap/edge-sdk/ios/cloud-ai-comparison", - "leap/edge-sdk/ios/constrained-generation", - "leap/edge-sdk/ios/function-calling" + { + "group": "iOS SDK", + "icon": "apple", + "pages": [ + "deployment/on-device/ios/ios-quick-start-guide", + "deployment/on-device/ios/ai-agent-usage-guide", + "deployment/on-device/ios/model-loading", + "deployment/on-device/ios/conversation-generation", + "deployment/on-device/ios/messages-content", + "deployment/on-device/ios/advanced-features", + "deployment/on-device/ios/utilities", + "deployment/on-device/ios/cloud-ai-comparison", + 
"deployment/on-device/ios/constrained-generation", + "deployment/on-device/ios/function-calling" + ] + }, + { + "group": "Android SDK", + "icon": "robot", + "pages": [ + "deployment/on-device/android/android-quick-start-guide", + "deployment/on-device/android/ai-agent-usage-guide", + "deployment/on-device/android/model-loading", + "deployment/on-device/android/conversation-generation", + "deployment/on-device/android/messages-content", + "deployment/on-device/android/advanced-features", + "deployment/on-device/android/utilities", + "deployment/on-device/android/cloud-ai-comparison", + "deployment/on-device/android/constrained-generation", + "deployment/on-device/android/function-calling" + ] + }, + "deployment/on-device/llama-cpp", + "deployment/on-device/lm-studio", + "deployment/on-device/mlx", + "deployment/on-device/onnx", + "deployment/on-device/ollama" ] }, { - "group": "Android", - "icon": "robot", + "group": "GPU Inference", + "icon": "microchip", "pages": [ - "leap/edge-sdk/android/android-quick-start-guide", - "leap/edge-sdk/android/ai-agent-usage-guide", - "leap/edge-sdk/android/model-loading", - "leap/edge-sdk/android/conversation-generation", - "leap/edge-sdk/android/messages-content", - "leap/edge-sdk/android/advanced-features", - "leap/edge-sdk/android/utilities", - "leap/edge-sdk/android/cloud-ai-comparison", - "leap/edge-sdk/android/constrained-generation", - "leap/edge-sdk/android/function-calling" + "deployment/gpu-inference/transformers", + "deployment/gpu-inference/vllm", + "deployment/gpu-inference/sglang", + "deployment/gpu-inference/modal", + "deployment/gpu-inference/baseten", + "deployment/gpu-inference/fal" ] }, { - "group": "Model Bundling Service", - "icon": "box", + "group": "Tools", + "icon": "toolbox", "pages": [ - "leap/leap-bundle/quick-start", - "leap/leap-bundle/authentication", - "leap/leap-bundle/configuration", - "leap/leap-bundle/bundle-creation", - "leap/leap-bundle/bundle-management", - "leap/leap-bundle/download", - 
"leap/leap-bundle/reference", - "leap/leap-bundle/data-privacy", - "leap/leap-bundle/changelog" + { + "group": "Model Bundling Services", + "icon": "box", + "pages": [ + "deployment/tools/model-bundling/quick-start", + "deployment/tools/model-bundling/authentication", + "deployment/tools/model-bundling/configuration", + "deployment/tools/model-bundling/bundle-creation", + "deployment/tools/model-bundling/bundle-management", + "deployment/tools/model-bundling/download", + "deployment/tools/model-bundling/reference", + "deployment/tools/model-bundling/data-privacy", + "deployment/tools/model-bundling/changelog" + ] + } ] } ] @@ -192,11 +217,11 @@ "tab": "Examples", "groups": [ { - "group": "Get Started", + "group": "Getting Started", "icon": "rocket", "pages": [ "examples/index", - "docs/getting-started/connect-ai-tools" + "examples/connect-ai-tools" ] }, { @@ -244,8 +269,100 @@ }, "redirects": [ { - "source": "/lfm/:slug*", - "destination": "/docs/:slug*" + "source": "/docs/getting-started/welcome", + "destination": "/lfm/getting-started/welcome" + }, + { + "source": "/docs/getting-started/connect-ai-tools", + "destination": "/lfm/getting-started/connect-ai-tools" + }, + { + "source": "/docs/models/:slug*", + "destination": "/lfm/models/:slug*" + }, + { + "source": "/docs/key-concepts/:slug*", + "destination": "/lfm/key-concepts/:slug*" + }, + { + "source": "/docs/help/:slug*", + "destination": "/lfm/help/:slug*" + }, + { + "source": "/docs/fine-tuning/workbench", + "destination": "/customization/tools/workbench" + }, + { + "source": "/docs/fine-tuning/datasets", + "destination": "/customization/finetuning-frameworks/datasets" + }, + { + "source": "/docs/fine-tuning/trl", + "destination": "/customization/finetuning-frameworks/trl" + }, + { + "source": "/docs/fine-tuning/unsloth", + "destination": "/customization/finetuning-frameworks/unsloth" + }, + { + "source": "/docs/inference/llama-cpp", + "destination": "/deployment/on-device/llama-cpp" + }, + { + "source": 
"/docs/inference/mlx", + "destination": "/deployment/on-device/mlx" + }, + { + "source": "/docs/inference/onnx", + "destination": "/deployment/on-device/onnx" + }, + { + "source": "/docs/inference/ollama", + "destination": "/deployment/on-device/ollama" + }, + { + "source": "/docs/inference/lm-studio", + "destination": "/deployment/on-device/lm-studio" + }, + { + "source": "/docs/inference/transformers", + "destination": "/deployment/gpu-inference/transformers" + }, + { + "source": "/docs/inference/vllm", + "destination": "/deployment/gpu-inference/vllm" + }, + { + "source": "/docs/inference/sglang", + "destination": "/deployment/gpu-inference/sglang" + }, + { + "source": "/docs/inference/modal-deployment", + "destination": "/deployment/gpu-inference/modal" + }, + { + "source": "/docs/inference/baseten-deployment", + "destination": "/deployment/gpu-inference/baseten" + }, + { + "source": "/docs/inference/fal-deployment", + "destination": "/deployment/gpu-inference/fal" + }, + { + "source": "/leap/edge-sdk/overview", + "destination": "/deployment/on-device/ios/ios-quick-start-guide" + }, + { + "source": "/leap/edge-sdk/ios/:slug*", + "destination": "/deployment/on-device/ios/:slug*" + }, + { + "source": "/leap/edge-sdk/android/:slug*", + "destination": "/deployment/on-device/android/:slug*" + }, + { + "source": "/leap/leap-bundle/:slug*", + "destination": "/deployment/tools/model-bundling/:slug*" } ], "ai": { diff --git a/docs/models/complete-library.mdx b/docs/models/complete-library.mdx deleted file mode 100644 index a78bd7e..0000000 --- a/docs/models/complete-library.mdx +++ /dev/null @@ -1,94 +0,0 @@ ---- -title: "Model Library" -description: "Liquid Foundation Models (LFMs) are a new class of multimodal architectures built for fast inference and on-device deployment. Browse all available models and formats here." ---- - -
- -All of our models share the following capabilities: - -- 32k token context length for extended conversations and document processing -- Designed for fast inference with [Transformers](/docs/inference/transformers), [llama.cpp](/docs/inference/llama-cpp), [vLLM](/docs/inference/vllm), [MLX](/docs/inference/mlx), [Ollama](/docs/inference/ollama), and [LEAP](/docs/frameworks/leap) -- Trainable via SFT, DPO, and GRPO with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth) - -
- -## Model Families - -Choose a model based on your desired functionalities. Each individual model card has specific details on deployment and customization. - - - - - Chat, tool calling, structured output, and classification. - - - - Image understanding with LFM backbones and custom encoders. - - - - Interleaved audio/text models for TTS, ASR, and voice chat. - - - - Task-specific models for extraction, summarization, RAG, and translation. - - - - -## Model Formats - -All LFM2 models are available in multiple formats for flexible deployment: - -- **GGUF** β€” Best for local CPU/GPU inference on any platform. Use with [llama.cpp](/docs/inference/llama-cpp), [LM Studio](/docs/inference/lm-studio), or [Ollama](/docs/inference/ollama). Append `-GGUF` to any model name. -- **MLX** β€” Best for Mac users with Apple Silicon. Leverages unified memory for fast inference via [MLX](/docs/inference/mlx). Browse at [mlx-community](https://huggingface.co/mlx-community/collections?search=LFM). -- **ONNX** β€” Best for production deployments and edge devices. Cross-platform with ONNX Runtime across CPUs, GPUs, and accelerators. Append `-ONNX` to any model name. - -### Quantization - -Quantization reduces model size and speeds up inference with minimal quality loss. Available options by format: - -- **GGUF** β€” Supports `Q4_0`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`, `BF16`, and `F16`. `Q4_K_M` offers the best balance of size and quality. -- **MLX** β€” Available in `3bit`, `4bit`, `5bit`, `6bit`, `8bit`, and `BF16`. `8bit` is recommended. -- **ONNX** β€” Supports `FP32`, `FP16`, `Q4`, and `Q8` (MoE models also support `Q4F16`). `Q4` is recommended for most deployments. - -## Model Chart - -| Model | HF | GGUF | MLX | ONNX | Trainable? 
| -| ----- | -- | ---- | --- | ---- | ---------- | -| **Text-to-text Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-Thinking](/docs/models/lfm25-1.2b-thinking) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-Base](/docs/models/lfm25-1.2b-base) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-GGUF) | βœ— | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-ONNX) | Yes (TRL) | -| [LFM2.5-1.2B-JP](/docs/models/lfm25-1.2b-jp) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-GGUF) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-MLX-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-8B-A1B](/docs/models/lfm2-8b-a1b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-8B-A1B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-8B-A1B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-8B-A1B-ONNX) | Yes (TRL) | -| [LFM2-2.6B](/docs/models/lfm2-2.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-2.6B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-2.6B-ONNX) | Yes (TRL) | -| 
[LFM2-2.6B-Exp](/docs/models/lfm2-2.6b-exp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp-GGUF) | βœ— | βœ— | Yes (TRL) | -| [LFM2-1.2B](/docs/models/lfm2-1.2b) Deprecated | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-1.2B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-ONNX) | Yes (TRL) | -| [LFM2-700M](/docs/models/lfm2-700m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-700M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-700M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-700M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-700M-ONNX) | Yes (TRL) | -| [LFM2-350M](/docs/models/lfm2-350m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-350M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-ONNX) | Yes (TRL) | -| **Vision Language Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2.5-VL-1.6B-8bit) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-VL-3B](/docs/models/lfm2-vl-3b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-3B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-3B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-3B-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-VL-3B-ONNX) | Yes (TRL) | -| [LFM2-VL-1.6B](/docs/models/lfm2-vl-1.6b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-1.6B-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-1.6B-8bit) | 
[βœ“](https://huggingface.co/onnx-community/LFM2-VL-1.6B-ONNX) | Yes (TRL) | -| [LFM2-VL-450M](/docs/models/lfm2-vl-450m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-450M) | [βœ“](https://huggingface.co/LiquidAI/LFM2-VL-450M-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-VL-450M-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-VL-450M-ONNX) | Yes (TRL) | -| **Audio Models** | | | | | | -| LFM2.5 Models (Latest Release) | | | | | | -| [LFM2.5-Audio-1.5B](/docs/models/lfm25-audio-1.5b) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF) | βœ— | [βœ“](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX) | Yes (TRL) | -| LFM2 Models | | | | | | -| [LFM2-Audio-1.5B](/docs/models/lfm2-audio-1.5b) | [βœ“](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B) | [βœ“](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B-GGUF) | βœ— | βœ— | No | -| **Liquid Nanos** | | | | | | -| [LFM2-1.2B-Extract](/docs/models/lfm2-1.2b-extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-Extract-ONNX) | Yes (TRL) | -| [LFM2-350M-Extract](/docs/models/lfm2-350m-extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Extract) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-Extract-ONNX) | Yes (TRL) | -| [LFM2-350M-ENJP-MT](/docs/models/lfm2-350m-enjp-mt) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF) | [βœ“](https://huggingface.co/mlx-community/LFM2-350M-ENJP-MT-8bit) | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-ENJP-MT-ONNX) | Yes (TRL) | -| [LFM2-1.2B-RAG](/docs/models/lfm2-1.2b-rag) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG-GGUF) | βœ— 
| [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-RAG-ONNX) | Yes (TRL) | -| [LFM2-1.2B-Tool](/docs/models/lfm2-1.2b-tool) Deprecated | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool) | [βœ“](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-1.2B-Tool-ONNX) | Yes (TRL) | -| [LFM2-350M-Math](/docs/models/lfm2-350m-math) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Math) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-Math-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-350M-Math-ONNX) | Yes (TRL) | -| [LFM2-350M-PII-Extract-JP](/docs/models/lfm2-350m-pii-extract-jp) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP) | [βœ“](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF) | βœ— | βœ— | Yes (TRL) | -| [LFM2-ColBERT-350M](/docs/models/lfm2-colbert-350m) | [βœ“](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) | βœ— | βœ— | βœ— | Yes (PyLate) | -| [LFM2-2.6B-Transcript](/docs/models/lfm2-2.6b-transcript) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript) | [βœ“](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript-GGUF) | βœ— | [βœ“](https://huggingface.co/onnx-community/LFM2-2.6B-Transcript-ONNX) | Yes (TRL) | diff --git a/examples/connect-ai-tools.mdx b/examples/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/examples/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/examples/laptop-examples/audio-to-text-in-real-time.mdx b/examples/laptop-examples/audio-to-text-in-real-time.mdx index 40d5ece..3fc5000 100644 --- a/examples/laptop-examples/audio-to-text-in-real-time.mdx +++ b/examples/laptop-examples/audio-to-text-in-real-time.mdx @@ -6,7 +6,7 @@ title: "Audio transcription in 
real-time" Browse the complete example on GitHub -This example demonstrates how to use the [LFM2-Audio-1.5B](https://docs.liquid.ai/docs/models/lfm2-audio-1.5b) model with llama.cpp to transcribe audio files locally in real-time. +This example demonstrates how to use the [LFM2-Audio-1.5B](https://docs.liquid.ai/lfm/models/lfm2-audio-1.5b) model with llama.cpp to transcribe audio files locally in real-time. Intelligent audio assistants on the edge are possible, and this repository is just one step towards that. @@ -120,7 +120,7 @@ For example, we can use ### What is LFM2-350M? -[LFM2-350M](https://docs.liquid.ai/docs/models/lfm2-350m) is a small text-to-text model that can be used for tasks like text cleaning. To achieve optimal performance for your particular use case, you need to optimize your system and user prompts. +[LFM2-350M](https://docs.liquid.ai/lfm/models/lfm2-350m) is a small text-to-text model that can be used for tasks like text cleaning. To achieve optimal performance for your particular use case, you need to optimize your system and user prompts. Use our no-code tool to optimize your system and user prompts, and get your model ready for deployment. diff --git a/examples/laptop-examples/flight-search-assistant.mdx b/examples/laptop-examples/flight-search-assistant.mdx index a7d6ba4..bef9128 100644 --- a/examples/laptop-examples/flight-search-assistant.mdx +++ b/examples/laptop-examples/flight-search-assistant.mdx @@ -6,7 +6,7 @@ title: "Flight search assistant with tool calling" Browse the complete example on GitHub -This project demonstrates a Python CLI that leverages the [LFM2.5-1.2B-Thinking](/docs/models/lfm25-1.2b-thinking) model to help users find and book plane tickets through multi-step reasoning and tool calling. +This project demonstrates a Python CLI that leverages the [LFM2.5-1.2B-Thinking](/lfm/models/lfm25-1.2b-thinking) model to help users find and book plane tickets through multi-step reasoning and tool calling. 
![Flight Search Assistant Demo](https://raw.githubusercontent.com/Liquid4All/cookbook/main/examples/flight-search-assistant/media/demo.gif) diff --git a/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx b/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx index 1a3efc1..b8c345c 100644 --- a/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx +++ b/examples/laptop-examples/invoice-extractor-tool-with-liquid-nanos.mdx @@ -22,7 +22,7 @@ In this example, you will learn how to: * **Set up local AI inference** using llama.cpp to run Liquid models entirely on your machine without requiring cloud services or API keys * **Build a file monitoring system** that automatically processes new files dropped into a directory -* **Extract structured output from images** using [LFM2.5-VL-1.6B](https://docs.liquid.ai/docs/models/lfm25-vl-1.6b), a small vision-language model. +* **Extract structured output from images** using [LFM2.5-VL-1.6B](https://docs.liquid.ai/lfm/models/lfm25-vl-1.6b), a small vision-language model. ## Environment setup diff --git a/examples/web/vl-webgpu-demo.mdx b/examples/web/vl-webgpu-demo.mdx index a80dfed..a9c2671 100644 --- a/examples/web/vl-webgpu-demo.mdx +++ b/examples/web/vl-webgpu-demo.mdx @@ -6,7 +6,7 @@ title: "Real-time video captioning with LFM2.5-VL-1.6B and WebGPU" Browse the complete example on GitHub -This example demonstrates how to run a vision-language model directly in your web browser using WebGPU acceleration. The demo showcases real-time video captioning with the [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) model, eliminating the need for cloud-based inference services. +This example demonstrates how to run a vision-language model directly in your web browser using WebGPU acceleration. The demo showcases real-time video captioning with the [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) model, eliminating the need for cloud-based inference services. 
## Key Features @@ -43,7 +43,7 @@ This example demonstrates how to run a vision-language model directly in your we ## Understanding the Architecture -This demo uses the **[LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b)** model, a 1.6 billion parameter vision-language model that has been quantized for efficient browser-based inference. The model runs entirely client-side using ONNX Runtime Web with WebGPU acceleration. +This demo uses the **[LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b)** model, a 1.6 billion parameter vision-language model that has been quantized for efficient browser-based inference. The model runs entirely client-side using ONNX Runtime Web with WebGPU acceleration. ### Remote vs. Local Inference @@ -57,7 +57,7 @@ With WebGPU and local inference, everything runs directly in your browser: ### Technical Stack -- **Model**: [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) (quantized ONNX format) +- **Model**: [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) (quantized ONNX format) - **Inference Engine**: ONNX Runtime Web with WebGPU backend - **Build Tool**: Vite for fast development and optimized production builds - **Browser Requirements**: WebGPU-compatible browser (Chrome, Edge) diff --git a/leap/edge-sdk/overview.mdx b/leap/edge-sdk/overview.mdx index 9e4d1ca..fa91a45 100644 --- a/leap/edge-sdk/overview.mdx +++ b/leap/edge-sdk/overview.mdx @@ -36,4 +36,4 @@ The current list of main features includes: We are consistently adding to this list - see our [changelog](/leap/changelog) for detailed updates. 
-[Edit this page](https://github.com/Liquid4All/docs/tree/main/leap/edge-sdk/overview.mdx) +[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx) diff --git a/lfm/getting-started/connect-ai-tools.mdx b/lfm/getting-started/connect-ai-tools.mdx new file mode 100644 index 0000000..69ee17b --- /dev/null +++ b/lfm/getting-started/connect-ai-tools.mdx @@ -0,0 +1,8 @@ +--- +title: "Connect AI Tools" +description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" +--- + +import ConnectAiTools from "/snippets/connect-ai-tools.mdx"; + + diff --git a/docs/getting-started/welcome.mdx b/lfm/getting-started/welcome.mdx similarity index 84% rename from docs/getting-started/welcome.mdx rename to lfm/getting-started/welcome.mdx index 09d08ac..2e40ad5 100644 --- a/docs/getting-started/welcome.mdx +++ b/lfm/getting-started/welcome.mdx @@ -42,16 +42,19 @@ Built on a new hybrid architecture, LFM2 sets a new standard in quality, speed, ## Get Started - - + + Browse our collection of language models and their capabilities - + Learn how to run models for different use cases and platforms - + Customize models for your specific requirements + + End-to-end examples for mobile, laptop, and web + diff --git a/docs/help/contributing.mdx b/lfm/help/contributing.mdx similarity index 92% rename from docs/help/contributing.mdx rename to lfm/help/contributing.mdx index 0f84180..d545dc5 100644 --- a/docs/help/contributing.mdx +++ b/lfm/help/contributing.mdx @@ -102,8 +102,8 @@ Use Mintlify components appropriately: ### Links -- Use relative links for internal pages: `/docs/inference/transformers` -- Use descriptive link text: "See the [inference guide](/docs/inference/transformers)" not "Click [here](/docs/inference/transformers)" +- Use relative links for internal pages: `/deployment/gpu-inference/transformers` +- Use descriptive link text: "See the [inference 
guide](/deployment/gpu-inference/transformers)" not "Click [here](/deployment/gpu-inference/transformers)" ## What to Contribute diff --git a/docs/help/faqs.mdx b/lfm/help/faqs.mdx similarity index 72% rename from docs/help/faqs.mdx rename to lfm/help/faqs.mdx index fa81927..d455f22 100644 --- a/docs/help/faqs.mdx +++ b/lfm/help/faqs.mdx @@ -15,12 +15,12 @@ All LFM models support a 32k token text context length for extended conversation LFM models are compatible with: -- [Transformers](/docs/inference/transformers) - For research and development -- [llama.cpp](/docs/inference/llama-cpp) - For efficient CPU inference -- [vLLM](/docs/inference/vllm) - For high-throughput production serving -- [MLX](/docs/inference/mlx) - For Apple Silicon optimization -- [Ollama](/docs/inference/ollama) - For easy local deployment -- [LEAP](/leap/edge-sdk/overview) - For edge and mobile deployment +- [Transformers](/deployment/gpu-inference/transformers) - For research and development +- [llama.cpp](/deployment/on-device/llama-cpp) - For efficient CPU inference +- [vLLM](/deployment/gpu-inference/vllm) - For high-throughput production serving +- [MLX](/deployment/on-device/mlx) - For Apple Silicon optimization +- [Ollama](/deployment/on-device/ollama) - For easy local deployment +- [LEAP](/deployment/on-device/ios/ios-quick-start-guide) - For edge and mobile deployment ## Model Selection @@ -39,7 +39,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe -[Liquid Nanos](/docs/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like: +[Liquid Nanos](/lfm/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like: - Information extraction (LFM2-Extract) - Translation (LFM2-350M-ENJP-MT) - RAG question answering (LFM2-1.2B-RAG) @@ -49,7 +49,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe ## Deployment -Yes! 
Use the [LEAP SDK](/leap/edge-sdk/overview) to deploy models on iOS and Android devices. LEAP provides optimized inference for edge deployment with support for quantized models. +Yes! Use the [LEAP SDK](/deployment/on-device/ios/ios-quick-start-guide) to deploy models on iOS and Android devices. LEAP provides optimized inference for edge deployment with support for quantized models. @@ -69,7 +69,7 @@ For most use cases, Q4_K_M or Q5_K_M provide good quality with significant size ## Fine-tuning -Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth). Check the [Model Library](/docs/models/complete-library) for trainability information. +Yes! Most LFM models support fine-tuning with [TRL](/customization/finetuning-frameworks/trl) and [Unsloth](/customization/finetuning-frameworks/unsloth). Check the [Model Library](/lfm/models/complete-library) for trainability information. @@ -82,4 +82,4 @@ Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [ - Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help - Check the [Cookbook](https://github.com/Liquid4All/cookbook) for examples -- See [Troubleshooting](/docs/help/troubleshooting) for common issues +- See [Troubleshooting](/lfm/help/troubleshooting) for common issues diff --git a/docs/help/troubleshooting.mdx b/lfm/help/troubleshooting.mdx similarity index 100% rename from docs/help/troubleshooting.mdx rename to lfm/help/troubleshooting.mdx diff --git a/docs/key-concepts/chat-template.mdx b/lfm/key-concepts/chat-template.mdx similarity index 98% rename from docs/key-concepts/chat-template.mdx rename to lfm/key-concepts/chat-template.mdx index db15e2f..2f8ea9c 100644 --- a/docs/key-concepts/chat-template.mdx +++ b/lfm/key-concepts/chat-template.mdx @@ -23,7 +23,7 @@ LFM2 supports four conversation roles: * **`system`** β€” (Optional) Defines who the assistant is and how it should respond. 
* **`user`** β€” Messages from the user containing questions and instructions. * **`assistant`** β€” Responses from the model. -* **`tool`** β€” Results from tool/function execution. Used for [tool use](/docs/key-concepts/tool-use) workflows. +* **`tool`** β€” Results from tool/function execution. Used for [tool use](/lfm/key-concepts/tool-use) workflows. The complete chat template definition can be found in the `chat_template.jinja` file in each model's Hugging Face repository. diff --git a/docs/key-concepts/text-generation-and-prompting.mdx b/lfm/key-concepts/text-generation-and-prompting.mdx similarity index 95% rename from docs/key-concepts/text-generation-and-prompting.mdx rename to lfm/key-concepts/text-generation-and-prompting.mdx index 35b5852..7fc71f8 100644 --- a/docs/key-concepts/text-generation-and-prompting.mdx +++ b/lfm/key-concepts/text-generation-and-prompting.mdx @@ -73,7 +73,7 @@ Control text generation behavior, balancing creativity, determinism, and quality * **`repetition_penalty`** (1.0+) - Reduces repetition. 1.0 = no penalty; >1.0 = prevents repetition. * **`max_tokens`** / **`max_new_tokens`** - Maximum tokens to generate. -Parameter names and syntax vary by platform. See [Transformers](/docs/inference/transformers), [vLLM](/docs/inference/vllm), or [llama.cpp](/docs/inference/llama-cpp) for details. +Parameter names and syntax vary by platform. See [Transformers](/deployment/gpu-inference/transformers), [vLLM](/deployment/gpu-inference/vllm), or [llama.cpp](/deployment/on-device/llama-cpp) for details. ### Recommended Settings Text @@ -132,5 +132,5 @@ min_image_tokens = 32 * `do_image_splitting=True` -**Liquid Nanos** (task-specific models like LFM2-Extract, LFM2-RAG, LFM2-Tool, etc.) may have special prompting requirements and different generation parameters. For the best usage guidelines, refer to the individual model cards on the [Liquid Nanos](/docs/models/liquid-nanos) page. 
+**Liquid Nanos** (task-specific models like LFM2-Extract, LFM2-RAG, LFM2-Tool, etc.) may have special prompting requirements and different generation parameters. For the best usage guidelines, refer to the individual model cards on the [Liquid Nanos](/lfm/models/liquid-nanos) page. diff --git a/docs/key-concepts/tool-use.mdx b/lfm/key-concepts/tool-use.mdx similarity index 100% rename from docs/key-concepts/tool-use.mdx rename to lfm/key-concepts/tool-use.mdx diff --git a/docs/models/audio-models.mdx b/lfm/models/audio-models.mdx similarity index 94% rename from docs/models/audio-models.mdx rename to lfm/models/audio-models.mdx index a81c50e..1434969 100644 --- a/docs/models/audio-models.mdx +++ b/lfm/models/audio-models.mdx @@ -32,7 +32,7 @@ icon: "headphones" - + 1.5B Β· Recommended Best audio model for most use cases. Fast, accurate, and CPU-friendly. @@ -44,7 +44,7 @@ icon: "headphones" - + 1.5B Β· Deprecated Use the new LFM2.5-Audio-1.5B checkpoint instead. diff --git a/lfm/models/complete-library.mdx b/lfm/models/complete-library.mdx new file mode 100644 index 0000000..1d7a6f8 --- /dev/null +++ b/lfm/models/complete-library.mdx @@ -0,0 +1,94 @@ +--- +title: "Model Library" +description: "Liquid Foundation Models (LFMs) are a new class of multimodal architectures built for fast inference and on-device deployment. Browse all available models and formats here." +--- + +
+ +All of our models share the following capabilities: + +- 32k token context length for extended conversations and document processing +- Designed for fast inference with [Transformers](/deployment/gpu-inference/transformers), [llama.cpp](/deployment/on-device/llama-cpp), [vLLM](/deployment/gpu-inference/vllm), [MLX](/deployment/on-device/mlx), [Ollama](/deployment/on-device/ollama), and [LEAP](/deployment/on-device/ios/ios-quick-start-guide) +- Trainable via SFT, DPO, and GRPO with [TRL](/customization/finetuning-frameworks/trl) and [Unsloth](/customization/finetuning-frameworks/unsloth) + +
+ +## Model Families + +Choose a model based on your desired functionalities. Each individual model card has specific details on deployment and customization. + + + + + Chat, tool calling, structured output, and classification. + + + + Image understanding with LFM backbones and custom encoders. + + + + Interleaved audio/text models for TTS, ASR, and voice chat. + + + + Task-specific models for extraction, summarization, RAG, and translation. + + + + +## Model Formats + +All LFM2 models are available in multiple formats for flexible deployment: + +- **GGUF** — Best for local CPU/GPU inference on any platform. Use with [llama.cpp](/deployment/on-device/llama-cpp), [LM Studio](/deployment/on-device/lm-studio), or [Ollama](/deployment/on-device/ollama). Append `-GGUF` to any model name. +- **MLX** — Best for Mac users with Apple Silicon. Leverages unified memory for fast inference via [MLX](/deployment/on-device/mlx). Browse at [mlx-community](https://huggingface.co/mlx-community/collections?search=LFM). +- **ONNX** — Best for production deployments and edge devices. Cross-platform with ONNX Runtime across CPUs, GPUs, and accelerators. Append `-ONNX` to any model name. + +### Quantization + +Quantization reduces model size and speeds up inference with minimal quality loss. Available options by format: + +- **GGUF** — Supports `Q4_0`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`, `BF16`, and `F16`. `Q4_K_M` offers the best balance of size and quality. +- **MLX** — Available in `3bit`, `4bit`, `5bit`, `6bit`, `8bit`, and `BF16`. `8bit` is recommended. +- **ONNX** — Supports `FP32`, `FP16`, `Q4`, and `Q8` (MoE models also support `Q4F16`). `Q4` is recommended for most deployments. + +## Model Chart + +| Model | HF | GGUF | MLX | ONNX | Trainable?
|
+| ----- | -- | ---- | --- | ---- | ---------- |
+| **Text-to-text Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-Thinking](/lfm/models/lfm25-1.2b-thinking) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-Base](/lfm/models/lfm25-1.2b-base) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-GGUF) | ✗ | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Base-ONNX) | Yes (TRL) |
+| [LFM2.5-1.2B-JP](/lfm/models/lfm25-1.2b-jp) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-GGUF) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-MLX-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-8B-A1B](/lfm/models/lfm2-8b-a1b) | [✓](https://huggingface.co/LiquidAI/LFM2-8B-A1B) | [✓](https://huggingface.co/LiquidAI/LFM2-8B-A1B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-8B-A1B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-8B-A1B-ONNX) | Yes (TRL) |
+| [LFM2-2.6B](/lfm/models/lfm2-2.6b) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-2.6B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-2.6B-ONNX) | Yes (TRL) |
+| [LFM2-2.6B-Exp](/lfm/models/lfm2-2.6b-exp) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Exp-GGUF) | ✗ | ✗ | Yes (TRL) |
+| [LFM2-1.2B](/lfm/models/lfm2-1.2b) Deprecated | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-1.2B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-ONNX) | Yes (TRL) |
+| [LFM2-700M](/lfm/models/lfm2-700m) | [✓](https://huggingface.co/LiquidAI/LFM2-700M) | [✓](https://huggingface.co/LiquidAI/LFM2-700M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-700M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-700M-ONNX) | Yes (TRL) |
+| [LFM2-350M](/lfm/models/lfm2-350m) | [✓](https://huggingface.co/LiquidAI/LFM2-350M) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-350M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-350M-ONNX) | Yes (TRL) |
+| **Vision Language Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2.5-VL-1.6B-8bit) | [✓](https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-VL-3B](/lfm/models/lfm2-vl-3b) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-3B) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-3B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-3B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-3B-ONNX) | Yes (TRL) |
+| [LFM2-VL-1.6B](/lfm/models/lfm2-vl-1.6b) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-1.6B-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-1.6B-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-1.6B-ONNX) | Yes (TRL) |
+| [LFM2-VL-450M](/lfm/models/lfm2-vl-450m) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-450M) | [✓](https://huggingface.co/LiquidAI/LFM2-VL-450M-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-VL-450M-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-VL-450M-ONNX) | Yes (TRL) |
+| **Audio Models** | | | | | |
+| LFM2.5 Models (Latest Release) | | | | | |
+| [LFM2.5-Audio-1.5B](/lfm/models/lfm25-audio-1.5b) | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF) | ✗ | [✓](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX) | Yes (TRL) |
+| LFM2 Models | | | | | |
+| [LFM2-Audio-1.5B](/lfm/models/lfm2-audio-1.5b) | [✓](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B) | [✓](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B-GGUF) | ✗ | ✗ | No |
+| **Liquid Nanos** | | | | | |
+| [LFM2-1.2B-Extract](/lfm/models/lfm2-1.2b-extract) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Extract-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-Extract-ONNX) | Yes (TRL) |
+| [LFM2-350M-Extract](/lfm/models/lfm2-350m-extract) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Extract) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-350M-Extract-ONNX) | Yes (TRL) |
+| [LFM2-350M-ENJP-MT](/lfm/models/lfm2-350m-enjp-mt) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT-GGUF) | [✓](https://huggingface.co/mlx-community/LFM2-350M-ENJP-MT-8bit) | [✓](https://huggingface.co/onnx-community/LFM2-350M-ENJP-MT-ONNX) | Yes (TRL) |
+| [LFM2-1.2B-RAG](/lfm/models/lfm2-1.2b-rag) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-RAG-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-RAG-ONNX) | Yes (TRL) |
+| [LFM2-1.2B-Tool](/lfm/models/lfm2-1.2b-tool) Deprecated | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool) | [✓](https://huggingface.co/LiquidAI/LFM2-1.2B-Tool-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-1.2B-Tool-ONNX) | Yes (TRL) |
+| [LFM2-350M-Math](/lfm/models/lfm2-350m-math) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Math) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-Math-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-350M-Math-ONNX) | Yes (TRL) |
+| [LFM2-350M-PII-Extract-JP](/lfm/models/lfm2-350m-pii-extract-jp) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP) | [✓](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF) | ✗ | ✗ | Yes (TRL) |
+| [LFM2-ColBERT-350M](/lfm/models/lfm2-colbert-350m) | [✓](https://huggingface.co/LiquidAI/LFM2-ColBERT-350M) | ✗ | ✗ | ✗ | Yes (PyLate) |
+| [LFM2-2.6B-Transcript](/lfm/models/lfm2-2.6b-transcript) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript) | [✓](https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript-GGUF) | ✗ | [✓](https://huggingface.co/onnx-community/LFM2-2.6B-Transcript-ONNX) | Yes (TRL) |
diff --git a/docs/models/lfm2-1.2b-extract.mdx b/lfm/models/lfm2-1.2b-extract.mdx similarity index 97% rename from docs/models/lfm2-1.2b-extract.mdx rename to lfm/models/lfm2-1.2b-extract.mdx index 2995713..eccaa68 100644 --- a/docs/models/lfm2-1.2b-extract.mdx +++ b/lfm/models/lfm2-1.2b-extract.mdx @@ -3,7 +3,7 @@ title: "LFM2-1.2B-Extract" description: "1.2B parameter model for structured information extraction from documents" --- -
← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-1.2B-Extract is optimized for extracting structured data (JSON, XML, YAML) from unstructured documents. It handles complex nested schemas and multi-field extraction with high accuracy. diff --git a/docs/models/lfm2-1.2b-rag.mdx b/lfm/models/lfm2-1.2b-rag.mdx similarity index 97% rename from docs/models/lfm2-1.2b-rag.mdx rename to lfm/models/lfm2-1.2b-rag.mdx index a43d963..bf219d5 100644 --- a/docs/models/lfm2-1.2b-rag.mdx +++ b/lfm/models/lfm2-1.2b-rag.mdx @@ -3,7 +3,7 @@ title: "LFM2-1.2B-RAG" description: "1.2B parameter model optimized for Retrieval-Augmented Generation" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-1.2B-RAG is optimized for answering questions grounded in provided context documents. It excels at extracting relevant information from retrieved documents while avoiding hallucination. diff --git a/docs/models/lfm2-1.2b-tool.mdx b/lfm/models/lfm2-1.2b-tool.mdx similarity index 82% rename from docs/models/lfm2-1.2b-tool.mdx rename to lfm/models/lfm2-1.2b-tool.mdx index 91e7a63..a62e330 100644 --- a/docs/models/lfm2-1.2b-tool.mdx +++ b/lfm/models/lfm2-1.2b-tool.mdx @@ -3,10 +3,10 @@ title: "LFM2-1.2B-Tool" description: "1.2B parameter model for tool calling (deprecated)" --- -← Back to Liquid Nanos +← Back to Liquid Nanos -This model is deprecated. Use [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) for tool calling instead, which offers improved accuracy and follows the standard tool use format. +This model is deprecated. Use [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) for tool calling instead, which offers improved accuracy and follows the standard tool use format. LFM2-1.2B-Tool was optimized for efficient and precise tool calling. It has been superseded by LFM2.5-1.2B-Instruct which provides better tool calling performance alongside general chat capabilities.
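The page above points tool-calling users at LFM2.5-1.2B-Instruct. As a rough sketch of what such a call looks like with Transformers, where the `get_weather` function and the user question are illustrative placeholders and only the model id comes from the surrounding docs (exact chat-template behavior can vary by `transformers` version, so treat this as an illustration rather than the official quickstart):

```python
def get_weather(city: str) -> str:
    """Get the current weather for a city (placeholder implementation)."""
    return f"Sunny in {city}"

def build_messages(question: str) -> list[dict]:
    """Single-turn chat history in the role/content format the chat template expects."""
    return [{"role": "user", "content": question}]

def ask_with_tools(question: str, model_id: str = "LiquidAI/LFM2.5-1.2B-Instruct") -> str:
    """Render the tool schema into the prompt and let the model emit a tool call."""
    # Heavyweight import kept local so the helpers above stay importable everywhere.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    # apply_chat_template renders the tool signature/docstring into the prompt
    # using the model's chat_template.jinja.
    input_ids = tokenizer.apply_chat_template(
        build_messages(question),
        tools=[get_weather],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

Calling `ask_with_tools("What is the weather in Boston?")` downloads the checkpoint and returns the model's structured tool-call text, which your application then parses and executes.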
@@ -40,4 +40,4 @@ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto") # See the Tool Use guide for complete examples ``` -See the [Tool Use](/docs/key-concepts/tool-use) guide for detailed tool calling documentation. +See the [Tool Use](/lfm/key-concepts/tool-use) guide for detailed tool calling documentation. diff --git a/docs/models/lfm2-1.2b.mdx b/lfm/models/lfm2-1.2b.mdx similarity index 91% rename from docs/models/lfm2-1.2b.mdx rename to lfm/models/lfm2-1.2b.mdx index 72a379d..7af719b 100644 --- a/docs/models/lfm2-1.2b.mdx +++ b/lfm/models/lfm2-1.2b.mdx @@ -7,10 +7,10 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models -This model is deprecated. Use [LFM2.5-1.2B-Instruct](/docs/models/lfm25-1.2b-instruct) for improved performance. +This model is deprecated. Use [LFM2.5-1.2B-Instruct](/lfm/models/lfm25-1.2b-instruct) for improved performance. LFM2-1.2B was the original 1.2B parameter model in the LFM2 series. It has been superseded by LFM2.5-1.2B-Instruct, which offers better chat, instruction-following, and tool-calling performance. diff --git a/docs/models/lfm2-2.6b-exp.mdx b/lfm/models/lfm2-2.6b-exp.mdx similarity index 95% rename from docs/models/lfm2-2.6b-exp.mdx rename to lfm/models/lfm2-2.6b-exp.mdx index 967c712..552a05d 100644 --- a/docs/models/lfm2-2.6b-exp.mdx +++ b/lfm/models/lfm2-2.6b-exp.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-2.6B-Exp is an experimental checkpoint of LFM2-2.6B with RL-only post-training, delivering improved performance on math and reasoning benchmarks. Use this model when you need stronger analytical capabilities. diff --git a/docs/models/lfm2-2.6b-transcript.mdx b/lfm/models/lfm2-2.6b-transcript.mdx similarity index 98% rename from docs/models/lfm2-2.6b-transcript.mdx rename to lfm/models/lfm2-2.6b-transcript.mdx index 6a71d13..c1afe91 100644 --- a/docs/models/lfm2-2.6b-transcript.mdx +++ b/lfm/models/lfm2-2.6b-transcript.mdx @@ -3,7 +3,7 @@ title: "LFM2-2.6B-Transcript" description: "2.6B parameter model for private, on-device meeting summarization" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-2.6B-Transcript is designed for private, on-device meeting summarization from transcripts. It generates executive summaries, detailed summaries, action items, key decisions, and participant lists. diff --git a/docs/models/lfm2-2.6b.mdx b/lfm/models/lfm2-2.6b.mdx similarity index 96% rename from docs/models/lfm2-2.6b.mdx rename to lfm/models/lfm2-2.6b.mdx index 4c3f768..4dd6635 100644 --- a/docs/models/lfm2-2.6b.mdx +++ b/lfm/models/lfm2-2.6b.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-2.6B is a versatile mid-sized model delivering strong performance across chat, reasoning, and tool-calling tasks. Optimized for deployment on consumer devices including phones and laptops.
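The text-model pages above all import the shared Transformers quickstart snippet. A minimal sketch of that flow for LFM2-2.6B, where the prompt and sampling values are illustrative rather than the officially recommended settings:

```python
def chat_messages(user_message: str) -> list[dict]:
    """Single-turn chat history in the format consumed by apply_chat_template."""
    return [{"role": "user", "content": user_message}]

def generate_reply(user_message: str, model_id: str = "LiquidAI/LFM2-2.6B") -> str:
    """Load the checkpoint, apply the chat template, and sample a reply."""
    # Heavyweight import kept local so chat_messages stays importable without torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        chat_messages(user_message), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Illustrative sampling values; tune them per the Recommended Settings section.
    output = model.generate(input_ids, max_new_tokens=64, do_sample=True, temperature=0.7)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

The same two functions work for any of the text-to-text checkpoints in the chart; only `model_id` changes.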
diff --git a/docs/models/lfm2-350m-enjp-mt.mdx b/lfm/models/lfm2-350m-enjp-mt.mdx similarity index 97% rename from docs/models/lfm2-350m-enjp-mt.mdx rename to lfm/models/lfm2-350m-enjp-mt.mdx index 1b153fb..3d5efd6 100644 --- a/docs/models/lfm2-350m-enjp-mt.mdx +++ b/lfm/models/lfm2-350m-enjp-mt.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-ENJP-MT" description: "350M parameter model for bidirectional English-Japanese translation" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-ENJP-MT is a specialized translation model for near real-time bidirectional Japanese/English translation. Optimized for short-to-medium text with low latency. diff --git a/docs/models/lfm2-350m-extract.mdx b/lfm/models/lfm2-350m-extract.mdx similarity index 97% rename from docs/models/lfm2-350m-extract.mdx rename to lfm/models/lfm2-350m-extract.mdx index fff4455..9162c46 100644 --- a/docs/models/lfm2-350m-extract.mdx +++ b/lfm/models/lfm2-350m-extract.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-Extract" description: "350M parameter extraction model for edge deployment" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-Extract is the fastest extraction model, optimized for edge deployment with strict memory and compute constraints. It delivers structured data extraction with minimal latency. diff --git a/docs/models/lfm2-350m-math.mdx b/lfm/models/lfm2-350m-math.mdx similarity index 96% rename from docs/models/lfm2-350m-math.mdx rename to lfm/models/lfm2-350m-math.mdx index a053910..62db0e1 100644 --- a/docs/models/lfm2-350m-math.mdx +++ b/lfm/models/lfm2-350m-math.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-Math" description: "350M parameter model for math problem solving" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-Math is a tiny reasoning model optimized for mathematical problem solving. It provides step-by-step solutions while maintaining a small footprint for edge deployment. diff --git a/docs/models/lfm2-350m-pii-extract-jp.mdx b/lfm/models/lfm2-350m-pii-extract-jp.mdx similarity index 97% rename from docs/models/lfm2-350m-pii-extract-jp.mdx rename to lfm/models/lfm2-350m-pii-extract-jp.mdx index cf70d24..e24748c 100644 --- a/docs/models/lfm2-350m-pii-extract-jp.mdx +++ b/lfm/models/lfm2-350m-pii-extract-jp.mdx @@ -3,7 +3,7 @@ title: "LFM2-350M-PII-Extract-JP" description: "350M parameter model for Japanese PII detection and extraction" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-350M-PII-Extract-JP extracts personally identifiable information (PII) from Japanese text as structured JSON. Output can be used to mask sensitive information on-device for privacy-preserving applications. diff --git a/docs/models/lfm2-350m.mdx b/lfm/models/lfm2-350m.mdx similarity index 96% rename from docs/models/lfm2-350m.mdx rename to lfm/models/lfm2-350m.mdx index 5176bd2..00dae65 100644 --- a/docs/models/lfm2-350m.mdx +++ b/lfm/models/lfm2-350m.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-350M is Liquid AI's smallest text model, designed for edge devices with strict memory and compute constraints. Delivers surprisingly strong performance for its size, making it ideal for low-latency applications.
diff --git a/docs/models/lfm2-700m.mdx b/lfm/models/lfm2-700m.mdx similarity index 96% rename from docs/models/lfm2-700m.mdx rename to lfm/models/lfm2-700m.mdx index 854ed38..8cdb0c2 100644 --- a/docs/models/lfm2-700m.mdx +++ b/lfm/models/lfm2-700m.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-700M is a compact model balancing capability and efficiency. Suitable for deployment on a wide range of devices including phones, tablets, and laptops with limited resources. diff --git a/docs/models/lfm2-8b-a1b.mdx b/lfm/models/lfm2-8b-a1b.mdx similarity index 96% rename from docs/models/lfm2-8b-a1b.mdx rename to lfm/models/lfm2-8b-a1b.mdx index 8a40718..9dc8964 100644 --- a/docs/models/lfm2-8b-a1b.mdx +++ b/lfm/models/lfm2-8b-a1b.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2-8B-A1B is Liquid AI's Mixture-of-Experts model, combining 8B total parameters with only 1.5B active parameters per forward pass. This delivers the quality of larger models with the speed and efficiency of smaller ones—ideal for on-device deployment. diff --git a/docs/models/lfm2-audio-1.5b.mdx b/lfm/models/lfm2-audio-1.5b.mdx similarity index 94% rename from docs/models/lfm2-audio-1.5b.mdx rename to lfm/models/lfm2-audio-1.5b.mdx index 6cc4296..0b5a6f2 100644 --- a/docs/models/lfm2-audio-1.5b.mdx +++ b/lfm/models/lfm2-audio-1.5b.mdx @@ -3,10 +3,10 @@ title: "LFM2-Audio-1.5B" description: "1.5B audio model (deprecated - use LFM2.5-Audio-1.5B instead)" --- -← Back to Audio Models +← Back to Audio Models -This model is deprecated. Use [LFM2.5-Audio-1.5B](/docs/models/lfm25-audio-1.5b) for improved ASR, TTS, and CPU-friendly inference. +This model is deprecated. Use [LFM2.5-Audio-1.5B](/lfm/models/lfm25-audio-1.5b) for improved ASR, TTS, and CPU-friendly inference. LFM2-Audio-1.5B was the original fully interleaved audio/text model. It has been superseded by LFM2.5-Audio-1.5B, which features a custom LFM-based audio detokenizer and improved performance. diff --git a/docs/models/lfm2-colbert-350m.mdx b/lfm/models/lfm2-colbert-350m.mdx similarity index 97% rename from docs/models/lfm2-colbert-350m.mdx rename to lfm/models/lfm2-colbert-350m.mdx index 7150405..b691250 100644 --- a/docs/models/lfm2-colbert-350m.mdx +++ b/lfm/models/lfm2-colbert-350m.mdx @@ -3,7 +3,7 @@ title: "LFM2-ColBERT-350M" description: "350M parameter ColBERT model for multi-language document retrieval and reranking" --- -← Back to Liquid Nanos +← Back to Liquid Nanos LFM2-ColBERT-350M generates dense embeddings for document retrieval and reranking using the ColBERT late-interaction architecture. It supports 8 languages and excels at semantic search tasks. diff --git a/docs/models/lfm2-vl-1.6b.mdx b/lfm/models/lfm2-vl-1.6b.mdx similarity index 91% rename from docs/models/lfm2-vl-1.6b.mdx rename to lfm/models/lfm2-vl-1.6b.mdx index c617000..dc4ff52 100644 --- a/docs/models/lfm2-vl-1.6b.mdx +++ b/lfm/models/lfm2-vl-1.6b.mdx @@ -7,10 +7,10 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models -This model is deprecated. Use [LFM2.5-VL-1.6B](/docs/models/lfm25-vl-1.6b) for improved performance. +This model is deprecated. Use [LFM2.5-VL-1.6B](/lfm/models/lfm25-vl-1.6b) for improved performance. LFM2-VL-1.6B was the original 1.6B vision-language model. It has been superseded by LFM2.5-VL-1.6B, which offers better visual understanding and reasoning through extended reinforcement learning. diff --git a/docs/models/lfm2-vl-3b.mdx b/lfm/models/lfm2-vl-3b.mdx similarity index 96% rename from docs/models/lfm2-vl-3b.mdx rename to lfm/models/lfm2-vl-3b.mdx index 03ed3f2..a53db13 100644 --- a/docs/models/lfm2-vl-3b.mdx +++ b/lfm/models/lfm2-vl-3b.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2-VL-3B is Liquid AI's highest-capacity multimodal model, delivering enhanced visual reasoning and detailed image understanding. Ideal for complex vision tasks requiring deeper comprehension. diff --git a/docs/models/lfm2-vl-450m.mdx b/lfm/models/lfm2-vl-450m.mdx similarity index 96% rename from docs/models/lfm2-vl-450m.mdx rename to lfm/models/lfm2-vl-450m.mdx index 3009611..6c82036 100644 --- a/docs/models/lfm2-vl-450m.mdx +++ b/lfm/models/lfm2-vl-450m.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2-VL-450M is Liquid AI's smallest vision-language model, designed for edge deployment with strict memory and compute constraints. Delivers fast multimodal inference on resource-limited devices.
diff --git a/docs/models/lfm25-1.2b-base.mdx b/lfm/models/lfm25-1.2b-base.mdx similarity index 97% rename from docs/models/lfm25-1.2b-base.mdx rename to lfm/models/lfm25-1.2b-base.mdx index 52bdaef..3dc3bfb 100644 --- a/docs/models/lfm25-1.2b-base.mdx +++ b/lfm/models/lfm25-1.2b-base.mdx @@ -3,7 +3,7 @@ title: "LFM2.5-1.2B-Base" description: "Pre-trained 1.2B parameter base model for fine-tuning and custom applications" --- -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Base is the pre-trained foundation model for the LFM2.5 series. Ideal for fine-tuning on custom datasets or building specialized checkpoints. Not instruction-tuned—use LFM2.5-1.2B-Instruct for chat applications. diff --git a/docs/models/lfm25-1.2b-instruct.mdx b/lfm/models/lfm25-1.2b-instruct.mdx similarity index 96% rename from docs/models/lfm25-1.2b-instruct.mdx rename to lfm/models/lfm25-1.2b-instruct.mdx index 155a8f8..d32ee03 100644 --- a/docs/models/lfm25-1.2b-instruct.mdx +++ b/lfm/models/lfm25-1.2b-instruct.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Instruct is Liquid AI's flagship instruction-tuned model, delivering exceptional performance for chat, instruction-following, and tool-calling tasks. Built on the LFM2.5 architecture with extended pre-training and reinforcement learning. diff --git a/docs/models/lfm25-1.2b-jp.mdx b/lfm/models/lfm25-1.2b-jp.mdx similarity index 96% rename from docs/models/lfm25-1.2b-jp.mdx rename to lfm/models/lfm25-1.2b-jp.mdx index 4bd5fdd..734e985 100644 --- a/docs/models/lfm25-1.2b-jp.mdx +++ b/lfm/models/lfm25-1.2b-jp.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-JP is fine-tuned for Japanese language tasks, delivering high-quality Japanese text generation, translation, and conversation. Built on LFM2.5 with specialized Japanese training data. diff --git a/docs/models/lfm25-1.2b-thinking.mdx b/lfm/models/lfm25-1.2b-thinking.mdx similarity index 96% rename from docs/models/lfm25-1.2b-thinking.mdx rename to lfm/models/lfm25-1.2b-thinking.mdx index fa5476b..7627d90 100644 --- a/docs/models/lfm25-1.2b-thinking.mdx +++ b/lfm/models/lfm25-1.2b-thinking.mdx @@ -7,7 +7,7 @@ import { TextTransformers } from "/snippets/quickstart/text-transformers.mdx"; import { TextVllm } from "/snippets/quickstart/text-vllm.mdx"; import { TextLlamacpp } from "/snippets/quickstart/text-llamacpp.mdx"; -← Back to Text Models +← Back to Text Models LFM2.5-1.2B-Thinking is optimized for reasoning tasks, delivering strong performance on math, logic, and multi-step problem-solving. Built on the LFM2.5 architecture with specialized training for chain-of-thought reasoning. diff --git a/docs/models/lfm25-audio-1.5b.mdx b/lfm/models/lfm25-audio-1.5b.mdx similarity index 98% rename from docs/models/lfm25-audio-1.5b.mdx rename to lfm/models/lfm25-audio-1.5b.mdx index 25ad96d..3b47d04 100644 --- a/docs/models/lfm25-audio-1.5b.mdx +++ b/lfm/models/lfm25-audio-1.5b.mdx @@ -3,7 +3,7 @@ title: "LFM2.5-Audio-1.5B" description: "1.5B fully interleaved audio/text model for TTS, ASR, and voice chat" --- -← Back to Audio Models +← Back to Audio Models LFM2.5-Audio-1.5B is Liquid AI's flagship audio model, featuring a custom LFM-based audio detokenizer. It delivers natural speech synthesis, multilingual speech recognition, and fully interleaved voice chat with reasoning capabilities in a single compact model. diff --git a/docs/models/lfm25-vl-1.6b.mdx b/lfm/models/lfm25-vl-1.6b.mdx similarity index 96% rename from docs/models/lfm25-vl-1.6b.mdx rename to lfm/models/lfm25-vl-1.6b.mdx index aeb81dd..527151c 100644 --- a/docs/models/lfm25-vl-1.6b.mdx +++ b/lfm/models/lfm25-vl-1.6b.mdx @@ -7,7 +7,7 @@ import { VlTransformers } from "/snippets/quickstart/vl-transformers.mdx"; import { VlVllm } from "/snippets/quickstart/vl-vllm.mdx"; import { VlLlamacpp } from "/snippets/quickstart/vl-llamacpp.mdx"; -← Back to Vision Models +← Back to Vision Models LFM2.5-VL-1.6B is Liquid AI's flagship vision-language model, delivering exceptional performance on image understanding, visual reasoning, and multimodal tasks. Built on LFM2.5 with a dynamic SigLIP2 image encoder. diff --git a/docs/models/liquid-nanos.mdx b/lfm/models/liquid-nanos.mdx similarity index 79% rename from docs/models/liquid-nanos.mdx rename to lfm/models/liquid-nanos.mdx index eff2033..b5dcba7 100644 --- a/docs/models/liquid-nanos.mdx +++ b/lfm/models/liquid-nanos.mdx @@ -32,55 +32,55 @@ icon: "sparkles" - + 1.2B · Extraction Extract structured JSON from unstructured documents. - + 350M · Extraction Fastest extraction model for edge deployment.
- + 350M · Extraction Japanese PII detection into structured JSON. - + 2.6B · Summarization Private, on-device meeting summarization from transcripts. - + 1.2B · RAG Answer questions grounded in provided context documents. - + 350M · Retrieval Multi-language document embeddings for retrieval and reranking. - + 350M · Translation Near real-time bidirectional Japanese/English translation. - + 350M · Reasoning Tiny reasoning model for math problem solving. - + 1.2B · Deprecated Use LFM2.5-1.2B-Instruct for tool calling instead. diff --git a/docs/models/text-models.mdx b/lfm/models/text-models.mdx similarity index 86% rename from docs/models/text-models.mdx rename to lfm/models/text-models.mdx index 36c00f2..f65d210 100644 --- a/docs/models/text-models.mdx +++ b/lfm/models/text-models.mdx @@ -32,25 +32,25 @@ icon: "comment" - + 1.2B · Recommended Instruction-tuned for chat. Best for most use cases. - + 1.2B · Reasoning Optimized for math and logical problem-solving. - + 1.2B · Pre-trained Base model for finetuning or custom checkpoints. - + 1.2B · Japanese Fine-tuned model for high-quality Japanese text generation. @@ -62,37 +62,37 @@ icon: "comment" - + 8B · 1.5B active · MoE Mixture-of-experts model for on-device speed and quality. - + 2.6B Highly capable model for deployment on most phones and laptops. - + 2.6B RL-only post-trained checkpoint for improved math and reasoning. - + 1.2B · Deprecated Use the new LFM2.5-1.2B-Instruct checkpoint instead. - + 700M Mid-sized model for deploying on most devices. - + 350M · Fastest Our smallest model for edge devices and low latency deployments. diff --git a/docs/models/vision-models.mdx b/lfm/models/vision-models.mdx similarity index 93% rename from docs/models/vision-models.mdx rename to lfm/models/vision-models.mdx index f9bec43..1e92172 100644 --- a/docs/models/vision-models.mdx +++ b/lfm/models/vision-models.mdx @@ -32,7 +32,7 @@ icon: "eye" - + 1.6B · Recommended Best vision model for most use cases. Fast and accurate. @@ -44,19 +44,19 @@ icon: "eye" - + 3B Highest-capacity multimodal model with enhanced visual reasoning. - + 1.6B · Deprecated Use the new LFM2.5-VL-1.6B checkpoint instead. - + 450M · Fastest Compact multimodal model for edge deployment and fast inference. diff --git a/docs/getting-started/connect-ai-tools.mdx b/snippets/connect-ai-tools.mdx similarity index 89% rename from docs/getting-started/connect-ai-tools.mdx rename to snippets/connect-ai-tools.mdx index 44a6fe0..7f80b29 100644 --- a/docs/getting-started/connect-ai-tools.mdx +++ b/snippets/connect-ai-tools.mdx @@ -1,8 +1,3 @@ ---- -title: "Connect AI Tools" -description: "Connect your AI coding tools to Liquid Docs via MCP for live, queryable access to documentation" ---- - ## What is MCP? The Model Context Protocol (MCP) is an open standard that gives AI applications a standardized way to connect to external data sources and tools. By connecting your AI coding tool to Liquid docs via MCP, you're giving it live, queryable access to the complete documentation: not a snapshot, not a cached file, but a real-time search against our official documentation. @@ -111,10 +106,10 @@ You're all set! Cursor now has real-time access to Liquid AI documentation. ## Next Steps - + Browse our collection of language models - + Get started with the LEAP SDK