
benchflow-ai/skillsbench

SkillsBench is an Apache-2.0 benchmark suite for measuring how well skills work and how effectively agents use them. It ships reproducible task definitions and scoring harnesses designed for CI-style agent evaluation, and it is billed as the first public, opinionated harness aimed at the agent-plus-skill combination rather than at the bare model.

Score: 7.7
Popularity: 72.0
Risk: none
Tier: Gold
Score breakdown
Usefulness: 7.0
Novelty: 8.0
Momentum: 7.0
Maturity: 7.6
Open-source/build: 8.4
Evidence: 7.2
Workflow potential: 8.8
Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for agent teams that ship a directory of skills and want a regression suite: pin the benchmark version, add it to CI for the agent under test, and treat drops in skill-level scores as build failures the same way you would for unit tests.
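A minimal sketch of that regression gate, assuming the benchmark run has already been exported to a flat JSON file that maps each skill name to a score. The file layout, the script name `check_skill_scores.py`, the `find_regressions` helper, and the 0.02 tolerance are all hypothetical illustrations, not SkillsBench's actual output format or CLI.

```python
import json
import sys

# Hypothetical tolerance: how far a skill's score may fall below its baseline
# before the build fails. This is not a SkillsBench setting.
TOLERANCE = 0.02


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     tolerance: float = TOLERANCE) -> list[str]:
    """Describe each skill whose score dropped by more than `tolerance`.

    A skill present in the baseline but missing from the current run also
    counts as a regression, so a silently dropped skill fails CI too.
    """
    regressions = []
    for skill, old in sorted(baseline.items()):
        new = current.get(skill)
        if new is None:
            regressions.append(f"{skill}: missing from current run")
        elif new < old - tolerance:
            regressions.append(f"{skill}: {old:.3f} -> {new:.3f}")
    return regressions


def main(baseline_path: str, current_path: str) -> int:
    # Both files are assumed (hypothetically) to be flat JSON objects: {"skill": score}.
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    regressions = find_regressions(baseline, current)
    for line in regressions:
        print(f"REGRESSION {line}")
    # A non-zero exit code is what makes the CI step fail.
    return 1 if regressions else 0


# Run as: python check_skill_scores.py baseline.json current.json
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

In CI, this step would run after the pinned benchmark version finishes; committing the baseline file alongside the skills directory keeps score changes reviewable in the same pull request as the skill change that caused them.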

Who should use it

Agent platform teams, skills authors, internal developer platform owners, agent QA leads.

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

Benchmark scores can be gamed; pair SkillsBench with at least one real user task in your CI.


Closest alternatives / related signals

evaluation, benchmark, skills, agent-evals, skillsbench, ci