Add SGLang deployment instructions #48
Conversation
cc @tugot17
docs/inference/sglang.mdx (Outdated)
> `--chunked-prefill-size -1`: Disables chunked prefill for lower latency
>
> ### Ultra Low Latency on Blackwell (B300)
I think this could be a separate section overall: low latency, and all of the flags.
docs/inference/sglang.mdx (Outdated)
> For more details on tool parsing configuration, see the [SGLang Tool Parser documentation](https://docs.sglang.io/advanced_features/tool_parser.html).
>
> ## Vision Models
This is not yet officially supported (not merged into SGLang); we should drop it.
We can just merge this PR when it is merged. Thoughts?
Let's drop the vision models for now, please. We will do this update step by step, and merging the vision models will take a while.
Signed-off-by: vincentzed <[email protected]>

WIP

Signed-off-by: vincentzed <[email protected]>
Signed-off-by: vincentzed <[email protected]>
Changes as requested, done in b8dacc3.
I'll take a look at this and make sure the formatting is consistent with the rest of the docs.
- Add supported models table (dense; MoE coming in 0.5.9; vision not yet)
- Add install-from-main instructions for MoE support
- Consolidate launch command with `--tool-call-parser lfm2` by default
- Move Docker under the Launching the Server section
- Replace verbose chat examples with concise curl + Python tool calling
- Simplify low-latency section with key metrics only
- Fold precision info into a Note
tugot17 left a comment
Added my changes; now I think it is fine and we can merge this.
docs/inference/sglang.mdx (Outdated)
> For more details on tool parsing configuration, see the [SGLang Tool Parser documentation](https://docs.sglang.io/advanced_features/tool_parser.html).
>
> ## Vision Models
Let's drop the vision models for now, please. We will do this update step by step, and merging the vision models will take a while.
docs/inference/sglang.mdx (Outdated)
```shell
python3 -m sglang.launch_server \
  --model LiquidAI/LFM2.5-1.2B-Instruct \
  --host 0.0.0.0 \
  --port 30000
```
`--tool-call-parser lfm2`

Let's add this here, add an example tool call in the Chat Completions section, and drop the separate Tool Calling section?
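For reference, a tool call through the OpenAI-compatible Chat Completions endpoint might look like the sketch below, assuming the server above was launched with `--tool-call-parser lfm2`. The tool name and schema (`get_weather`, `city`) are hypothetical placeholders, not part of this PR.

```python
import json

# Hypothetical Chat Completions request body with a tool definition,
# for a local SGLang server started with `--tool-call-parser lfm2`.
payload = {
    "model": "LiquidAI/LFM2.5-1.2B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                # Placeholder tool; any JSON-schema parameters work here.
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# POST this body to http://localhost:30000/v1/chat/completions
# (e.g. requests.post(url, json=payload) or an equivalent curl -d call).
print(json.dumps(payload, indent=2))
```

The parser flag tells the server how to turn the model's raw output into structured `tool_calls` in the response.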
@Paulescu could you give it a "go"?
- Add Tip callout at top with a concise use-case summary
- Use Tabs (Python/Docker) for the server launch section
- Show the Python example directly under Usage, curl in an Accordion
- Trim redundant pip/uv install lines from the MoE install

Co-Authored-By: Claude Opus 4.6 <[email protected]>