THUDM/slime - RepoRadar

Score: 8.5

Popularity: 7.1

Risk: conditional

Tier: Gold

Score breakdown

Usefulness: 8.0

Novelty: 8.0

Momentum: 7.0

Maturity: 6.8

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 9.6

Setup ease: 4.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for researchers and RL practitioners who want a production-tested post-training stack from the lab behind GLM and need to scale agent RL beyond a single-node reference repo.

Who should use it

researchers scaling GRPO/DPO/RLOO runs beyond a single node
applied teams aligning or specializing LLMs with their own reward signals
platform engineers building internal post-training pipelines on top of Megatron/DeepSpeed
labs looking for an Apache-2.0 alternative to commercial RLHF stacks

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

It trains and serves LLMs at scale, so model weights, training data, and reward signals can leave your environment if you wire it to external infra without isolation.

Evidence links

github.com

Closest alternatives / related signals

rlhf, grpo, post-training, alignment, megatron, llm-training