modelscope/evalscope

Score: 8.3

Popularity: 64.0

Risk: none

Tier: Silver

Score breakdown

Usefulness: 8.0

Novelty: 7.0

Momentum: 8.0

Maturity: 7.3

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 8.7

Setup ease: 6.2

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for teams that want to decide model changes on evidence rather than claims: the framework provides shared scoring patterns across text, retrieval, and multimodal tasks.

Who should use it

LLM product teams, ML teams adding regression checks, agents testing tool-use quality

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

Benchmark suites are only as good as their dataset coverage and annotation quality. Large evaluation grids increase compute and CI time, so sample carefully.
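
One way to keep a large grid affordable in CI is to evaluate a fixed, seeded subset of its cells instead of every combination. The sketch below is a minimal illustration using only the Python standard library; it does not use the evalscope API, and the function name `sample_grid` and the model and dataset labels are hypothetical placeholders.

```python
import itertools
import random


def sample_grid(models, datasets, k, seed=0):
    """Return up to k distinct (model, dataset) pairs from the full grid."""
    # Full Cartesian grid of evaluation cells.
    grid = list(itertools.product(models, datasets))
    # A dedicated, seeded generator makes the selection reproducible across runs.
    rng = random.Random(seed)
    # Sample without replacement; cap k at the grid size.
    return rng.sample(grid, min(k, len(grid)))


# Hypothetical labels, used only for illustration.
models = ["model-a", "model-b", "model-c"]
datasets = ["qa", "retrieval", "vision", "tool-use"]

subset = sample_grid(models, datasets, k=5)
print(len(subset), "of", len(models) * len(datasets), "cells selected")
```

Because the seed is fixed, the same cells are selected on every run, so score changes between runs are not caused by a different sample. A uniform sample can miss an entire model or dataset, so a stratified selection may be a better fit when coverage of every row or column matters.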

Evidence links

github.com

Closest alternatives / related signals

llm-eval, rag-eval, agent-eval, infra, llm-bench