THUDM/slime

THUDM/slime is an Apache-2.0 LLM post-training framework for RL scaling, with first-class support for SFT, reward modeling, GRPO/DPO-style preference learning, and a distributed training path designed for high-throughput agentic rollouts.

Score: 8.5
Popularity: 7.1
Risk: conditional
Tier: Gold
Score breakdown
Usefulness: 8.0
Novelty: 8.0
Momentum: 7.0
Maturity: 6.8
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 9.6
Setup ease: 4.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for researchers and RL practitioners who want a production-tested post-training stack from the lab behind GLM and need to scale agent RL beyond a single-node reference repo.

Who should use it

researchers scaling GRPO/DPO/RLOO runs beyond a single node
applied teams aligning or specializing LLMs with their own reward signals
platform engineers building internal post-training pipelines on top of Megatron/DeepSpeed
labs looking for an Apache-2.0 alternative to commercial RLHF stacks

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

It trains and serves LLMs at scale, so model weights, training data, and reward signals can leave your environment if you wire it to external infra without isolation.

Evidence links

Closest alternatives / related signals

rlhf, grpo, post-training, alignment, megatron, llm-training