Releases · THUDM/slime
v0.2.2
v0.2.2 is here! Thanks to everyone who contributed to this release.
Major Updates
In addition to multiple memory and performance improvements, v0.2.2 adds support for:
- Int4-QAT training (see the sketch after this list)
- Full R3 (Rollout Routing Replay) support with DeepEP and MTP
- Dependency upgrades: SGLang v0.5.7 and the Megatron dev branch
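To give a sense of what Int4-QAT (quantization-aware training) involves, below is a minimal PyTorch sketch of int4 fake quantization with a straight-through estimator. This only illustrates the general technique, not slime's implementation (which relies on the int4_quant CUDA kernel from #1220); all names in it are hypothetical.

```python
import torch

class Int4FakeQuant(torch.autograd.Function):
    """Illustrative int4 quantize-dequantize with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, w: torch.Tensor) -> torch.Tensor:
        # Symmetric per-tensor scale so weights map into the int4 range [-8, 7].
        scale = w.abs().max().clamp(min=1e-8) / 7.0
        q = torch.clamp(torch.round(w / scale), -8, 7)
        return q * scale  # dequantized weights used in the forward pass

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        # STE: treat round/clamp as identity so gradients reach the fp weights.
        return grad_out


def qat_linear(x: torch.Tensor, weight: torch.Tensor, bias=None) -> torch.Tensor:
    # The forward pass sees int4-rounded weights, while the optimizer keeps
    # updating the full-precision copy, so the model adapts to quantization noise.
    return torch.nn.functional.linear(x, Int4FakeQuant.apply(weight), bias)
```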
What's Changed
- add ckpt load save ci by @lilei199908 in #1104
- Add --rollout-all-samples-process-path for RLVE by @zhuzilin in #1107
- feat: support Qwen3 Moe BackEnd Kernel by @attack204 in #1071
- fix max response/context/prompt len by @lilei199908 in #1110
- fix max len by @lilei199908 in #1112
- [docker] remove amem and support deepep + r3 by @zhuzilin in #1115
- [Fix] Fix early return in init rollout engine by @yitianlian in #1118
- [Fix] Add sglang patch for weight version update by @yitianlian in #1119
- fix: improve tokenization by @nanjiangwill in #1113
- [Feature] Add CI test for weight version update by @yitianlian in #1120
- [docker] optimize r3 with base64 encode by @zhuzilin in #1124
- [docker] fix r3 gather buffer by @zhuzilin in #1129
- [docker] support mtp for r3 by @zhuzilin in #1131
- [Fix] Fix some bugs in retool example by @yitianlian in #1130
- Add finalize_model_grads_with_empty_cache by @zhuzilin in #1133
- Feat: add usage docs for fsdp by @lin0303-siyuan in #1092
- Reserve more ports for new sglang dp attn impl by @zhuzilin in #1142
- Blog: fix the path of the Blog's architecture image by @ShanningZhuang in #1125
- Support async save and add extra save at the end of the training by @zhuzilin in #1143
- fix: fix GemmeRMSNorm.forward() bug by @nanjiangwill in #1121
- [WIP][FSDP] Support FSDP for Qwen3Next by @rucnyz in #1116
- Megatron VLM Support (1/N) by @Zhuohao-Li in #1123
- Update deprecated huggingface-cli and fix broken links by @Lyken17 in #1147
- Added FSDP checkpoint handling to convert_torch_dist_to_hf.py by @cklxx in #1101
- minor fix for megatron compatibility by @zhuzilin in #1149
- Remove config_mapping to use megatron-bridge by @zhuzilin in #1166
- Avoids repeated work. by @qqwqqw689 in #1163
- Make tools/convert_torch_dist_to_hf.py not rely on megatron by @zhuzilin in #1167
- support converting dpsk mtp layer by @zhuzilin in #1169
- [FSDP] Add Masked importance sampling by @zijiexia in #1122
- [TIS/MIS] fix and add better metric by @ChangyiYang in #1174
- Fix optimizer schedule resume by @lr-tsinghua11 in #1152
- [docker] upgrade to megatron dev branch by @zhuzilin in #1153
- Minor fix by @lancerts in #1165
- Fix forward of Qwen3VLTextRotaryEmbedding in Megatron-Bridge by @zhuzilin in #1179
- Reuse the text llm config for qwen3 vl models by @zhuzilin in #1180
- Don't save AutoBridge in args by @zhuzilin in #1181
- [Fix] Fix port error in PD disaggregation setting by @yitianlian in #1175
- Fix prompt type bug in generate_with_search within examples/search-r1 by @jiahe7ay in #1182
- feat: support Qwen3 VL MoE by @nanjiangwill in #1171
- [Fix] Minor fix by @yitianlian in #1183
- Set parallel config for megatron bridge by @zhuzilin in #1184
- Fix tools/convert_hf_to_torch_dist.py by @zhuzilin in #1186
- Don't calculate entropy grad when coef is 0 by @zhuzilin in #1185
- Disable routing replay for critic by @zhuzilin in #1187
- Revert "Don't calculate entropy grad when coef is 0" by @zhuzilin in #1189
- Fix qwen3next for megatron dev branch by @zhuzilin in #1190
- fix: fix logging for rollout by @nanjiangwill in #1188
- sync internal features by @zhuzilin in #1192
- Fix check_weights api by @zhuzilin in #1194
- Add --custom-rollout-log-function-path and --custom-eval-rollout-log-function-path by @zhuzilin in #1196
- [Feature] Add more logging for health monitor by @yitianlian in #1195
- fix: SFT tools support by @maoquan-ms in #1198
- [Feature] Change default value of rollout health check by @yitianlian in #1197
- Megatron VLM Support w/ SFT (2/N) by @Zhuohao-Li in #1150
- tiny fix for sft script after tokenizer improvement by @Zhuohao-Li in #1201
- tests: add test for multi turn loss mask by @maoquan-ms in #1204
- Always pass loss masks to model by @zhuzilin in #1205
- [on-policy distillation] update reward function to fix potential token mismatches by @ahxt in #1128
- Add ci for mtp by @zhuzilin in #1207
- Fix mla tflops by @lilei199908 in #1209
- update docs by @zhuzilin in #1211
- update docs by @zhuzilin in #1214
- [Feature] Support 0.3.0 sglang router for fault tolerance by @yitianlian in #1215
- sync internal features by @zhuzilin in #1216
- feat: add custom logic for processing list[list[Sample]] to training data by @nanjiangwill in #1218
- add int4_quant cuda kernel by @Hyaloid in #1220
- update doc by @zhuzilin in #1224
- Improve AMD tutorial with complete model/data setup workflow by @Vivicai1005 in #1212
- update megatron patch by @zhuzilin in #1228
- sync from internal by @zhuzilin in #1229
- fix model saving bug in megatron by @zhuzilin in #1230
- add new status by @nanjiangwill in #1219
- update customization docs by @nanjiangwill in #1233
- Revert data processing of VLM by @zhuzilin in #1232
- [VLM] optimize VLM processing by @nanjiangwill in #1234
- feat: add custom pg_loss reducer by @ChangyiYang in #1235
- fix: typo "sgalng" → "sglang" in ROCm Dockerfiles by @yurekami in #1282
- sync bugfix from internal by @zhuzilin in #1284
- sync internal bugfix by @zhuzilin in #1286
- add bshd support by @yueming-yuan in #1285
- [docker] fix bugs on pd disaggregation and add --disable-draft-cuda-graph by @zhuzilin in #1288
- Add longest_effective_sample_tokens_per_sec metric by @zhuzilin in #1291
- [fix] conditionally pass kwargs for megatron-bridge VLM by @yueming-yuan in #1290
- [VLM] Bugfix: image_patch_size for vision preprocessing by @coding-famer in #1227
- feat: add --custom-model-provider-path argument by @yurekami in #1239
- [Feature/Fix] Support IPv6 host resolution and robust URI formatting by @Chen-GX in #859
- Fix missing trust_remote_code in HfWeightIteratorBridge by @SwordFaith in #1287
- fix: remove invalid None default and fix misleading underscore variable naming by @lancerts in #1283
- fix: remove duplicate Megatron-LM installation in build_conda.sh by @yurekami in #1238
- fix dev megatron ckpt save bugs by @lilei199908 in #1294
- [Fix] fix image_patch_size in processing_utils by @coding-famer in #1295
- support hicache for pd disaggregation by @zhuzilin in #1296
- Optimize data.py for efficient data loading by @ppraneth in #696
- Auto Sync Code by @miles-code-angel in #1303
- [VLM] end2end geo3k multi-turn RL of VLM Recipe by @gxlvera in https://github.com/THUD...
v0.2.1
Thanks to the incredible support and contributions from our community — v0.2.1 is here!
Major Updates
- VLM + FSDP: true on-policy training on Qwen3-VL (dense).
- PD-disaggregation support during rollout
- DP-attention support in rollout routing replay (R3); a conceptual sketch of R3 follows this list
- Upgraded to SGLang v0.5.6
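For readers new to R3: Rollout Routing Replay records the MoE router's top-k expert choices while the rollout engine generates tokens, then replays them in the training forward pass, so training dispatches tokens to the same experts that produced the sampled trajectory. The toy layer below sketches that idea under assumed names; slime's actual implementation (DeepEP dispatch, DP-attention, MTP support) is considerably more involved.

```python
import torch

def moe_forward(hidden, router, experts, top_k=2, replayed_topk=None):
    """Toy MoE layer. If replayed_topk is given, reuse the routing recorded
    during rollout (Rollout Routing Replay) instead of recomputing it."""
    logits = router(hidden)                       # [num_tokens, num_experts]
    if replayed_topk is None:
        topk_val, topk_idx = torch.topk(logits, top_k, dim=-1)
    else:
        topk_idx = replayed_topk                  # expert ids recorded at rollout time
        topk_val = torch.gather(logits, -1, topk_idx)
    weights = torch.softmax(topk_val, dim=-1)

    out = torch.zeros_like(hidden)
    for slot in range(top_k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e
            if mask.any():
                out[mask] += weights[mask, slot:slot + 1] * expert(hidden[mask])
    return out, topk_idx                          # topk_idx is what rollout would record
```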
What's Changed
- extract mla update weight logic out by @zhuzilin in #960
- support do all evals together by @zhuzilin in #959
- Add --rollout-sample-filter-path by @zhuzilin in #961
- [FSDP] Optimize FSDP2 Model Loading with Rank-0 Broadcast by @Hecate0821 in #915
- Add sample.remove_sample by @zhuzilin in #977
- add --eval-max-prompt-len by @zhuzilin in #978
- Add args check for max_context_len by @zhuzilin in #979
- Remove hard coded balance_abs_threshold by @zhuzilin in #981
- Tiny fix fp8_cast_bf16 not copying chat template by @fzyzcjy in #964
- Super tiny install dnsutils in dockerfile by @fzyzcjy in #965
- Super tiny sanity check checkpoint dir by @fzyzcjy in #966
- Fix convert_hf_to_torch_dist OOM by @fzyzcjy in #967
- Tiny support using environment variables in addition to arguments for all scripts by @fzyzcjy in #968
- Super tiny increase default timeout sec by @fzyzcjy in #969
- Fix random port in use error even though already have free port detection by @fzyzcjy in #970
- Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue by @fzyzcjy in #971
- Add generation function for benchmarking purpose by @fzyzcjy in #972
- Support zero host or device memory waste for weight update by @fzyzcjy in #973
- Add fp8 kv cache and tis in qwen3 30b a3b script by @fzyzcjy in #974
- Add GB200, MTP, benchmark, fp8 rollout mode to glm script by @fzyzcjy in #975
- [FSDP] Add private func indicator for better usage by @PopSoda2002 in #982
- [Bugfix] Rename save model by @PopSoda2002 in #983
- Fix: resolve variable shadowing bug in setup_model_and_optimizer by @fangzhensheng in #963
- remove unnecessary optimizer init by @zhuzilin in #984
- [release] bump to v0.2.0.post1 by @zhuzilin in #986
- fix scaling of per token loss by @zhuzilin in #987
- Add strands-agents example by @Lawhy in #976
- Add nemo skills evaluation by @guapisolo in #989
- [1/N] Tiny execute Ruff auto lint by @fzyzcjy in #991
- [2/N] Tiny manually fix for Ruff default ruleset and add to pre-commit by @fzyzcjy in #992
- [3/N] Enable B ruleset in Ruff by @fzyzcjy in #993
- [4/N] Tiny enable UP ruleset in Ruff by @fzyzcjy in #994
- Super tiny further fix lint error by @fzyzcjy in #995
- Add DataSource and --data-source-path by @zhuzilin in #912
- Fix per token loss scale and add e2e ci by @zhuzilin in #990
- [FSDP] Add script for FSDP Qwen3-4B by @Hecate0821 in #988
- Fixed bug in checking max_length for SFT #997 by @Surya-Gunukula in #998
- [ci] Add CI to make sure all dense parallel gives the same grad norm by @zhuzilin in #1000
- [Feature] Add off-policy sequence masking algorithm proposed in DeepSeek v3.2 by @yitianlian in #999
- [FSDP][3/N] support true_on_policy training for FSDP2 by @zhuzilin in #1001
- fix lint by @zhuzilin in #1002
- Fix bare except clause and remove redundant computation in ppo_utils by @lancerts in #1007
- fix: FSDP runnable for Qwen3-30b-a3b by @yueming-yuan in #1010
- move tis function outside by @zhuzilin in #1014
- Add backward impl for SiluAndMulFunction and MoeSumReduceFunction by @zhuzilin in #1015
- refactor: expose compute_metrics_from_samples as public by @lancerts in #1012
- Fix evaluation parameter parsing by @guapisolo in #1005
- pre-commit run --all-files by @lancerts in #1021
- fix: update deprecated import path in mcore2hf script by @Chen-GX in #1003
- [FSDP] Add gpt oss 20b script by @PopSoda2002 in #996
- Fix mimo speculative decoding oom by @guapisolo in #1024
- [FSDP, VLM] feat: add vlm training for FSDP by @nanjiangwill in #501
- [rollout] support disable trim samples when converting rollout samples to train datas by @GGGGGGXY in #1016
- Backward compatible for older megatron version by @zhuzilin in #1028
- extract all sglang deps in megatron actor to one file by @zhuzilin in #1029
- feat: Add Unbiased KL Estimation from DeepSeek-V3.2 by @kekmodel in #1004
- refactor: extract duplicated checkpoint interval logic into reusable helper by @lancerts in #1027
- Fix typo in sglang_rollout.py comment by @ChenmienTan in #980
- fix ci for nodes with proxy by @zhuzilin in #1035
- [FSDP] fix args error in apply_fsdp2 function by @ChangyiYang in #1041
- [FSDP] Support lr scheduler by @ChangyiYang in #1040
- [Fix] Fix some bugs when on/offload model by @yitianlian in #1038
- Improve debug output formatting in replay_reward_fn.py by @lancerts in #1033
- Support pd disaggregation with p and d of same config by @zhuzilin in #1046
- [rollout] Truncate last token for rollout routing replay by @Hecate0821 in #1045
- fix: modernize type hint and add distributed init checks in utils by @lancerts in #1049
- Fix the padding of rollout routing replay experts by @zhuzilin in #1052
- update sglang to 0.5.6 by @lilei199908 in #1051
- [docker] fix cudnn version by @zhuzilin in #1066
- [docker] fix megatron cpu adam load issue by @zhuzilin in #1070
- fix(examples): correct quotes and comment out ray cleanup commands in Qwen3-30B-A3B FP8 script by @pandengyao in #1069
- Fix typos and improve clarity in documentation and code comments by @lancerts in #1067
- fix: remove redundant gc.collect() and combine split f-strings by @lancerts in #1074
- [FSDP, VLM] feat: true on policy for VLM by @nanjiangwill in #1056
- [VLM, FSDP] Update Experiment Readme by @nanjiangwill in #1079
- split train data in-advance to reduce communication by @zhuzilin in #1078
- [Feature] PD Disaggregation Support by @yitianlian in #1080
- fix raw_reward upload in fsdp by @zhuzilin in #1084
- [FSDP][vlm] Add B200 doc by @PopSoda2002 in #1082
- Add recompute loss function and enable by default by @zhuzilin in #1083
- Empty cache before finalize_model_grads to prevent unexpected oom by @zhuzilin in #1086
- Revert "Empty cache before finalize_model_grads to prevent unexpected oom" by @zhuzilin in #1087
- Set --train-memory-margin-bytes to 1GB by default by @zhuzilin in #1088
- set recompute_loss_function to false by default by @zhuzilin in #1089
- [VLM] fix: fix non true-on-policy vlm regression by @nanjiangwill in #1093
- fix_load_ckpt by @lilei199908 in #1095
- fix actor init bugs by @lilei199908 in #1098
- Fix gqa model tflops compute by @zhuzilin in #1099
- Fix bug for convert_hf_to_torch_dist.py by @zhuzilin in #1100
- [release] bump to v0.2.1 by @lilei199908 in #1096
New Contributors
- @fangzhensheng made their first contribution in #963
- @Lawhy made their first contribution in https://github.com/TH...
v0.2.0.post1
Fix critical bug mentioned in #958.
What's Changed
- extract mla update weight logic out by @zhuzilin in #960
- support do all evals together by @zhuzilin in #959
- Add --rollout-sample-filter-path by @zhuzilin in #961
- [FSDP] Optimize FSDP2 Model Loading with Rank-0 Broadcast by @Hecate0821 in #915
- Add sample.remove_sample by @zhuzilin in #977
- add --eval-max-prompt-len by @zhuzilin in #978
- Add args check for max_context_len by @zhuzilin in #979
- Remove hard coded balance_abs_threshold by @zhuzilin in #981
- Tiny fix fp8_cast_bf16 not copying chat template by @fzyzcjy in #964
- Super tiny install dnsutils in dockerfile by @fzyzcjy in #965
- Super tiny sanity check checkpoint dir by @fzyzcjy in #966
- Fix convert_hf_to_torch_dist OOM by @fzyzcjy in #967
- Tiny support using environment variables in addition to arguments for all scripts by @fzyzcjy in #968
- Super tiny increase default timeout sec by @fzyzcjy in #969
- Fix random port in use error even though already have free port detection by @fzyzcjy in #970
- Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue by @fzyzcjy in #971
- Add generation function for benchmarking purpose by @fzyzcjy in #972
- Support zero host or device memory waste for weight update by @fzyzcjy in #973
- Add fp8 kv cache and tis in qwen3 30b a3b script by @fzyzcjy in #974
- Add GB200, MTP, benchmark, fp8 rollout mode to glm script by @fzyzcjy in #975
- [FSDP] Add private func indicator for better usage by @PopSoda2002 in #982
- [Bugfix] Rename save model by @PopSoda2002 in #983
- Fix: resolve variable shadowing bug in setup_model_and_optimizer by @fangzhensheng in #963
New Contributors
- @fangzhensheng made their first contribution in #963
Full Changelog: v0.2.0...v0.2.0.post1
v0.2.0
We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant features and substantial performance enhancements in this version.
Major Updates
- FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP) based training backend for improved scalability.
- PPO Support: Added native support for Proximal Policy Optimization (PPO).
- MTP Training: Enabled training of MTP (Multi-Token Prediction) layers during reinforcement learning.
- FP8 Full Stack: Support for both FP8 training and FP8 inference.
- Train-Inference Mismatch: Alleviates or even eliminates the train-inference mismatch.
- Importance Sampling: Custom interface for train-infer importance sampling (e.g., MIS); see the sketch after this list.
- Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
- True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
- Performance Improvements
  - Memory Optimization: CUDA Graphs offload, asystem-amem integration.
  - Faster Weight Updates: Significantly accelerated FP8 weight updates.
  - Python-based Router: A new slime router implemented in pure Python for accessibility.
  - Fault Tolerance: Added fault tolerance for the rollout engines.
  - Custom Configs: Support for passing customized configurations via --config.
  - [Experimental] Checkpoint Loading: Added support for Megatron-bridge based checkpoint loading.
- New Examples
  - Fully Async Training
  - Multi-Agent Scenarios
  - On-Policy Distillation
  - Retool
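As background for the importance-sampling and train-inference-mismatch items above: the usual correction reweights the per-token policy-gradient loss by the ratio between the trainer's and the rollout engine's token log-probs. The sketch below shows a truncated (TIS) and one possible masked (MIS) variant under assumed tensor names; it is not slime's interface, and slime's exact MIS definition may differ.

```python
import torch

def tis_weights(train_logprobs, rollout_logprobs, loss_mask, cap=2.0):
    # Truncated importance sampling: per-token ratio pi_train / pi_rollout,
    # detached (treated as a constant weight) and clipped from above to bound variance.
    ratio = torch.exp((train_logprobs - rollout_logprobs).detach())
    return torch.clamp(ratio, max=cap) * loss_mask

def mis_weights(train_logprobs, rollout_logprobs, loss_mask, low=0.5, high=2.0):
    # One masked variant: instead of clipping, zero out tokens whose ratio
    # drifts too far from 1 and keep the ratio elsewhere.
    ratio = torch.exp((train_logprobs - rollout_logprobs).detach())
    keep = (ratio >= low) & (ratio <= high)
    return ratio * keep * loss_mask

# Hypothetical usage inside a per-token policy-gradient loss:
# loss = -(advantages * train_logprobs * tis_weights(...)).sum() / loss_mask.sum()
```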
What's Changed
- [Doc typo] Update amd_tutorial.md by @yushengsu-thu in #246
- [bugfix] use fp32 for rollout_log_probs by @zhuzilin in #245
- Complete the RayTrainGroup args string docs. by @MrAta in #248
- Update speculative decoding doc and sglang patch by @guapisolo in #250
- fix debug-rollout-only by @zyzshishui in #249
- retool in one commit by @maocheng23 in #237
- fix: modify the rotary-base of qwen-3b to 1000000 for consistency by @YuchenFan48 in #252
- update logging and fix typo by @maocheng23 in #254
- [bugfix] fix read data containing "tools" field by @Maybewuss in #255
- Revert "[bugfix] fix read data containing "tools" field" by @zhuzilin in #256
- add shell script for qwen3-32B task by @Gao016 in #253
- docs: Fix custom interface documentation errors by @GeLee-Q in #251
- [example] Add fully async example by @zhuzilin in #258
- added sphinx-based documentation by @FrankLeeeee in #262
- fixed build error for documentation by @FrankLeeeee in #263
- [bugfix] Fix bugs on multi samples from one prompt (multi-agent) by @zhuzilin in #260
- fixed sphinx configuration by @FrankLeeeee in #264
- [bugfix] fix read data containing "tools" field by @Maybewuss in #259
- add DeepWiki badge by @richardodliu in #265
- [doc] add example doc to the website by @zhuzilin in #267
- [doc] add blogs by @zhuzilin in #268
- Update actor_group.py by @zlH518 in #266
- [doc] prettify language conversion toggle by @zhuzilin in #270
- [example] add an example for multi-agent rl by @yinpeisu in #269
- [refactor] Add isort back and move global gloo to global util by @zhuzilin in #273
- [refactor] remove over_sampling_filter and extract some functions by @zhuzilin in #278
- [feat] init support for FSDP by @zhuzilin in #282
- Chatbot entry for Sphinx style docs by @jhinpan in #284
- Revert "Chatbot entry for Sphinx style docs" by @zhuzilin in #286
- Remove get_rollout_data from actor_group by @MrAta in #285
- Add docs logo by @jhinpan in #283
- [Hardware] AMD Dockerfile update - support up to d4a7741 (Sep 6, 2025) by @yushengsu-thu in #307
- [feat] init xtuner backend by @zhuzilin in #310
- [docker] update to sglang 0.5.2rc2 by @zhuzilin in #313
- Add model version attribute in each sample by @yitianlian in #271
- [nfc] cleanup for weight_version by @zhuzilin in #314
- Add raw reward metric in fsdp backend by @yitianlian in #315
- fix: check args.save when save_interval is set. by @SanftMonster in #308
- Fix comment for --load parameter in checkpoint configuration (Quick Start Doc) by @Arist12 in #306
- [refactor] bind numa and rename num_gpus_per_node by @zhuzilin in #316
- [xtuner] unroll TrainingWorker and TrainEngine by @zhuzilin in #322
- [xtuner] add wandb by @zhuzilin in #324
- [bugfix] fix no weight_version for aborted samples by @zhuzilin in #327
- [FSDP] Verify FSDP backend availability via uv install / pip install by @Zhuohao-Li in #325
- Add FSDP extras dependency and import test (#302) by @souhil25 in #303
- fix: small bug fix in the rollout_buffer_example.sh by @rbao2018 in #328
- [refactor] remove slime/backend/utils and extract slime_validate_args by @zhuzilin in #329
- feat: auto configure megatron from hf config. by @SanftMonster in #312
- Do not read ip if env is provided by @oraluben in #337
- [rm_hub] fix ground_truth type error in grade_answer_verl by @GGGGGGXY in #336
- [feat] use one global httpx.AsyncClient and remove --use-http2 by @zhuzilin in #338
- [Refactor] Merge rollout controller into rollout manager by @PopSoda2002 in #304
- add dockerfile and patch for b200 by @maocheng23 in #340
- [feat] init support for PPO by @zhuzilin in #342
- wrong expressions and typo by @ArtificialZeng in #343
- Add basic VLM data pipeline by @ppraneth in #335
- [FSDP] Add reference model support for correct KL loss computation #296 by @UbeCc in #344
- fix incorrect sft loss mask for qwen3 thinking series models. by @luppx in #330
- feature: ppo by @lilei199908 in #347
- [FIX] NVLINK detection method in scripts by @JustinTong0323 in #356
- fix lint by @JustinTong0323 in #358
- [feat] add --critic-lr and --num-critic-only-steps by @zhuzilin in #350
- [refactor] Add actor registry by @zhuzilin in #359
- Added GB200 patches for SGLang v0.5.2 by @sam571128 in #360
- [bugfix] fix the num_tokens used for per_token_loss in multi-turn training by @zhuzilin in #365
- [Feature] Support token in token out for multi turn tasks by @yitianlian in #242
- [router] support slime-router only by @zhuzilin in #366
- [router] extract middleware folder by @zhuzilin in #367
- [feat] support distributed post to enable more concurrent requests by @zhuzilin in #368
- [FEAT] Deterministic rollout by @JustinTong0323 in #361
- [reproducibility][docker] enable training reproducibility by @zhuzilin in #370
- [feat] enable use_flattened_tensor_bucket with quantization config by @zhuzilin in #374
- [fix] fix ppo bugs by @lilei199908 in #373
- docs: add B200/H-series GPU hardware support information by @Williamren97 in #380
- [model] fix run-qwen3-30B-A3B.sh by @yefei12 in #382
- Enable loss mask for sft by @UbeCc in #377
- [fix] fix paths in get_started.md by @hyleepp in #375
- [FSDP] Data Packing Implementation in FSDP backend by @jhinpan in #321
- [feat] add --use-routing-replay by @zhuzilin in #387
- fix bug for convert Qwen3-235B-A22B HF model weight to Megatron torch_dist format by @Gao016 in #386
- [FSDP] Add update weight class from distributed by @pop...
v0.1.0
Performance Optimizations
- SGLang: FP8 + DeepEP + speculative decoding
- Megatron: support for all parallel strategies (TP, PP, VPP, EP, CP, etc.) + DeepEP + CPU Adam.
- New Megatron offload strategy with better memory usage.
- Faster weight updates.
New Algorithm Supports
- GSPO (see the sketch after this list)
- TIS
- reinforce++ & reinforce++-baseline
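For context on GSPO, the sketch below shows the sequence-level clipped objective it is generally described with: a length-normalized (geometric-mean) importance ratio per sequence, clipped PPO-style. Tensor shapes and names are assumptions for illustration, not slime's implementation.

```python
import torch

def gspo_ratio(train_logprobs, old_logprobs, loss_mask):
    # Sequence-level importance ratio: geometric mean of per-token ratios.
    # train_logprobs / old_logprobs / loss_mask: [batch, seq_len]
    diff = (train_logprobs - old_logprobs) * loss_mask
    return torch.exp(diff.sum(dim=-1) / loss_mask.sum(dim=-1).clamp(min=1))

def gspo_loss(train_logprobs, old_logprobs, advantages, loss_mask, eps=0.2):
    # PPO-style clipping applied at the sequence level; advantages: [batch]
    ratio = gspo_ratio(train_logprobs, old_logprobs, loss_mask)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```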
Correctness
- CI for E2E GLM4 9B and Qwen3 30B-A3B training
- CI for Build Conda environment