70 changes: 53 additions & 17 deletions docs/inference/vllm.mdx
@@ -1,6 +1,6 @@
---
title: "vLLM"
description: "vLLM is a high-throughput and memory-efficient inference engine for LLMs. It supports efficient serving with PagedAttention, continuous batching, and optimized CUDA kernels."

---

<Tip>
@@ -17,11 +17,24 @@

## Installation

<Tabs>
  <Tab title="pip">
    Install [`vLLM`](https://github.com/vllm-project/vllm) v0.14 or a more recent version:

    ```bash
    uv pip install "vllm>=0.14"
    ```
  </Tab>
  <Tab title="Docker">
    vLLM provides a prebuilt Docker image that serves an OpenAI-compatible API:

    ```bash
    docker pull vllm/vllm-openai:latest
    ```

    This image requires NVIDIA GPU access. See the [OpenAI-Compatible Server](#openai-compatible-server) section below for the full `docker run` command.
  </Tab>
</Tabs>

## Basic Usage

@@ -52,7 +65,7 @@
Control text generation behavior using [`SamplingParams`](https://docs.vllm.ai/en/stable/). Key parameters:

* **`temperature`** (`float`, default 1.0): Controls randomness (0.0 = deterministic, higher = more random). Typical range: 0.1-2.0
* **`top_p`** (`float`, default 1.0): Nucleus sampling - keeps the smallest set of top tokens whose cumulative probability reaches `top_p`. Typical range: 0.1-1.0
* **`top_k`** (`int`, default -1): Limits to top-k most probable tokens (-1 = disabled). Typical range: 1-100
* **`min_p`** (`float`): Minimum token probability threshold. Typical range: 0.01-0.2
* **`max_tokens`** (`int`): Maximum number of tokens to generate
@@ -108,19 +121,42 @@

## OpenAI-Compatible Server

vLLM can serve models through an OpenAI-compatible API, allowing you to use existing OpenAI client libraries.

<Tabs>
<Tab title="vllm serve">
```bash
vllm serve LiquidAI/LFM2.5-1.2B-Instruct \
--host 0.0.0.0 \
--port 8000 \
--dtype auto
```

Optional parameters:

* `--max-model-len`: Maximum context length in tokens (defaults to the model's maximum)
* `--gpu-memory-utilization`: Fraction of GPU memory vLLM preallocates, between 0.0 and 1.0 (default 0.9)
</Tab>
<Tab title="Docker">
```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model LiquidAI/LFM2.5-1.2B-Instruct
```

Key flags:
* `--runtime nvidia --gpus all`: GPU access (required)
* `--ipc=host`: Shared memory for tensor parallelism
* `-v ~/.cache/huggingface:/root/.cache/huggingface`: Cache models on host
* `HF_TOKEN`: Set this env var if using gated models

**Note:** The Docker image does not include optional dependencies. If you need them, build a custom image from the [vLLM Dockerfile](https://docs.vllm.ai/en/stable/deployment/docker/).
</Tab>
</Tabs>
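
For long-running deployments, the `docker run` flags above translate naturally to Docker Compose. The following is a sketch, not an official vLLM-provided file; the service name is arbitrary, and the `deploy` GPU reservation syntax assumes Compose v2 with the NVIDIA Container Toolkit installed:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    ipc: host
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    command: --model LiquidAI/LFM2.5-1.2B-Instruct
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d`; the server then listens on port 8000 just like the `docker run` invocation.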

### Chat Completions

@@ -185,7 +221,7 @@

### Installation for Vision Models

To use LFM Vision Models with vLLM, install the precompiled wheel along with the required transformers version:


```bash
VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502 VLLM_USE_PRECOMPILED=1 uv pip install git+https://github.com/vllm-project/vllm.git
```