
[Core] Add support for draft models for speculative decoding #535

Draft
TJ5 wants to merge 9 commits into ome-projects:main from TJ5:spec-decoding

Conversation


@TJ5 TJ5 commented Mar 5, 2026

What this PR does

Updates the InferenceService spec to include a draft model, for use in speculative decoding. Configurations for using the feature with nvidia/gpt-oss-120b-eagle3-long-context are added.

Why we need it

Speculative decoding increases inference speed, which is useful for long-running agentic tasks. OME previously did not support deploying the draft models that speculative decoding requires.

How to test

go test ./pkg/runtimeselector/...

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added labels: documentation, api, runtime, webhook, models, inferenceservice, controller, tests, config, dependencies Mar 5, 2026
@TJ5 TJ5 changed the title [Core] Add support for draft modles for speculative decoding [Core] Add support for draft models for speculative decoding Mar 5, 2026
@TJ5 TJ5 marked this pull request as draft March 5, 2026 17:37
