GitHub - vllm-project/semantic-router: Intelligent Router for Mixture-of-Models

📚 Complete Documentation | 🚀 Quick Start | 📣 Blog | 📖 Publications

Latest News 🔥

[2025/12/15] New Blog: Token-Level Truth: Real-Time Hallucination Detection for Production LLMs 🚪
[2025/11/19] New Blog: Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale 🧠
[2025/11/03] Our paper Category-Aware Semantic Caching for Heterogeneous LLM Workloads published 📝
[2025/10/21] We announced the 2025 Q4 Roadmap: Journey to Iris 📅.
[2025/10/12] Our paper When to Reason: Semantic Router for vLLM accepted by NeurIPS 2025 MLForSys 🧠.
[2025/10/08] We announced the integration with vLLM Production Stack Team 👋.
[2025/10/01] We supported to deploy on Kubernetes 🌊.
[2025/09/01] We released the project officially: vLLM Semantic Router: Next Phase in LLM inference 🚀.

Innovations ✨

Intelligent Routing 🧠

Auto-Selection of Models and LoRA Adapters

A Mixture-of-Models (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool based on Semantic Understanding of the request's intent (Complexity, Task, Tools).

Conceptually similar to Mixture-of-Experts (MoE) which lives within a model, this system selects the best entire model for the nature of the task.

As such, the overall inference accuracy is improved by using a pool of models that are better suited for different types of tasks:

The router is implemented in two ways:

Golang (with Rust FFI based on the candle rust ML framework)
Python Benchmarking will be conducted to determine the best implementation.

Request Flow

Auto-Selection of Tools

Select the tools to use based on the prompt, avoiding the use of tools that are not relevant to the prompt so as to reduce the number of prompt tokens and improve tool selection accuracy by the LLM.

Domain Aware System Prompts

Automatically inject specialized system prompts based on query classification, ensuring optimal model behavior for different domains (math, coding, business, etc.) without manual prompt engineering.

Domain Aware Similarity Caching ⚡️

Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.

Enterprise Security 🔒

PII detection

Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the privacy of the user.

Prompt guard

Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.

Quick Start 🚀

Using VSR CLI (Recommended)

The vsr CLI tool provides a unified interface for managing the vLLM Semantic Router across all environments. It reduces setup time from hours to minutes with intelligent auto-detection, comprehensive diagnostics, and beautiful CLI output.

Installation

# Clone and build
cd semantic-router/src/semantic-router
make build-cli
export PATH=$PATH:$(pwd)/bin

# Verify installation
vsr --version

Get Started in 4 Commands

vsr init                    # Initialize configuration
make download-models        # Download AI models
vsr config validate        # Validate setup
vsr deploy docker          # Deploy with Docker Compose

Key Features

Multi-Environment Support: Deploy to Local, Docker, Kubernetes, or Helm
Model Management: Download, validate, list, and inspect models
Health Monitoring: Status checks, diagnostics, and health reports
Debug Tools: Interactive debugging and troubleshooting
Dashboard Integration: Auto-detect and open dashboard in browser
Enhanced Logging: Multi-environment log fetching with filtering

Common Commands

vsr status                  # Check deployment status
vsr logs --follow          # View logs in real-time
vsr health                 # Quick health check
vsr dashboard              # Open dashboard
vsr model list             # List available models
vsr debug                  # Run diagnostics
vsr upgrade docker         # Upgrade deployment
vsr undeploy docker        # Stop deployment

For complete CLI documentation, see src/semantic-router/cmd/vsr/README.md or Quick Start Guide.

Using Quickstart Script

Alternatively, get up and running in seconds with our interactive setup script:

bash ./scripts/quickstart.sh

This command will:

🔍 Check all prerequisites automatically
📦 Install HuggingFace CLI if needed
📥 Download all required AI models (~1.5GB)
🐳 Start all Docker services
⏳ Wait for services to become healthy
🌐 Show you all the endpoints and next steps

For detailed installation and configuration instructions, see the Complete Documentation.

Documentation 📖

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

👉 Complete Documentation at Read the Docs

The documentation includes:

Installation Guide - Complete setup instructions
System Architecture - Technical deep dive
Model Training - How classification models work
API Reference - Complete API documentation
Dashboard - vLLM Semantic Router Dashboard

Community 👋

For questions, feedback, or to contribute, please join #semantic-router channel in vLLM Slack.

Community Meetings 📅

We host bi-weekly community meetings to sync up with contributors across different time zones:

First Tuesday of the month: 9:00-10:00 AM EST (accommodates US EST, EU, and Asia Pacific contributors)
Third Tuesday of the month: 1:00-2:00 PM EST (accommodates US EST and California contributors)
Meeting Recordings: YouTube

Join us to discuss the latest developments, share ideas, and collaborate on the project!

Citation

If you find Semantic Router helpful in your research or projects, please consider citing it:

@misc{semanticrouter2025,
  title={vLLM Semantic Router},
  author={vLLM Semantic Router Team},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/semantic-router}},
}

Star History 🔥

We opened the project at Aug 31, 2025. We love open source and collaboration ❤️

Sponsors 👋

We are grateful to our sponsors who support us:

AMD provides us with GPU resources and ROCm™ Software for training and researching the frontier router models, enhancing e2e testing, and building online models playground.

Name		Name	Last commit message	Last commit date
Latest commit History 748 Commits
.github		.github
bench		bench
candle-binding		candle-binding
config		config
dashboard		dashboard
deploy		deploy
e2e-tests		e2e-tests
e2e		e2e
examples		examples
perf		perf
scripts		scripts
src		src
tools		tools
website		website
.crd-ref-docs.yaml		.crd-ref-docs.yaml
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.prowlabels.yaml		.prowlabels.yaml
CODEOWNERS		CODEOWNERS
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.extproc		Dockerfile.extproc
Dockerfile.extproc.cross		Dockerfile.extproc.cross
Dockerfile.model-downloader		Dockerfile.model-downloader
Dockerfile.precommit		Dockerfile.precommit
Dockerfile.stack		Dockerfile.stack
LICENSE		LICENSE
Makefile		Makefile
OWNER		OWNER
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Innovations ✨

Intelligent Routing 🧠

Auto-Selection of Models and LoRA Adapters

Request Flow

Auto-Selection of Tools

Domain Aware System Prompts

Domain Aware Similarity Caching ⚡️

Enterprise Security 🔒

PII detection

Prompt guard

Quick Start 🚀

Using VSR CLI (Recommended)

Installation

Get Started in 4 Commands

Key Features

Common Commands

Using Quickstart Script

Documentation 📖

Community 👋

Community Meetings 📅

Citation

Star History 🔥

Sponsors 👋

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 50

Languages

License

vllm-project/semantic-router

Folders and files

Latest commit

History

Repository files navigation

Innovations ✨

Intelligent Routing 🧠

Auto-Selection of Models and LoRA Adapters

Request Flow

Auto-Selection of Tools

Domain Aware System Prompts

Domain Aware Similarity Caching ⚡️

Enterprise Security 🔒

PII detection

Prompt guard

Quick Start 🚀

Using VSR CLI (Recommended)

Installation

Get Started in 4 Commands

Key Features

Common Commands

Using Quickstart Script

Documentation 📖

Community 👋

Community Meetings 📅

Citation

Star History 🔥

Sponsors 👋

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 50

Languages

Packages