boheling/skillbench: AI tool review & score

Score: 8.1

Popularity: 1.0

Risk: conditional

Tier: Gold

Score breakdown

Usefulness: 8.0

Novelty: 8.0

Momentum: 5.0

Maturity: 6.4

Open-source/build: 8.4

Evidence: 8.0

Workflow potential: 9.6

Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for anyone who curates a skills library and wants evidence that a skill improves real outcomes before rolling it out across a team or marketplace.

Who should use it

Teams maintaining internal or public skill libraries
Developers comparing competing workflow packs before adopting one
Evaluation engineers who want deterministic graders instead of model-judged score inflation
Marketplace operators trying to filter weak skill submissions early

Who should skip it

Move on from boheling/skillbench if the licensing terms, language support, or platform requirements do not fit your project.

About this signal

boheling/skillbench is tracked by RepoRadar as a benchmarking tool in the Developer Tools section. It was first seen on 2026-06-28 and last updated on 2026-06-28. The current verdict is 'try now' with a Gold tier and moderate setup difficulty. Across RepoRadar's eight signals, boheling/skillbench is strongest on workflow potential (9.6) and open-source/build quality (8.4) and weakest on momentum (5.0), a profile worth weighing against your own priorities. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.

How this item is evaluated

RepoRadar assigned boheling/skillbench a composite score of 8.1 out of 10, placing it in the Gold tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 1.0 and never affects the composite score or tier. The risk label of 'conditional' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
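The weighting above can be checked with a short sketch. It uses only the sub-signal scores and weights listed on this page, and it assumes the composite is a plain weighted average, which the page does not state. Under that assumption the result is about 7.48 rather than the published 8.1, so the published composite may involve steps that are not described here. The names `SCORES`, `WEIGHTS`, and `weighted_average` are illustrative, not part of RepoRadar or skillbench.

```python
# Sub-signal scores as listed in the score breakdown on this page.
SCORES = {
    "usefulness": 8.0,
    "novelty": 8.0,
    "momentum": 5.0,
    "maturity": 6.4,
    "open_source_build": 8.4,
    "evidence": 8.0,
    "workflow_potential": 9.6,
    "setup_ease": 6.4,
}

# Weights in percent, as stated in the evaluation paragraph.
WEIGHTS = {
    "usefulness": 35,
    "novelty": 18,
    "momentum": 14,
    "maturity": 10,
    "open_source_build": 7,
    "evidence": 6,
    "workflow_potential": 6,
    "setup_ease": 4,
}


def weighted_average(scores, weights):
    """Plain weighted average (an assumption about how the composite is formed)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


composite = weighted_average(SCORES, WEIGHTS)
print(f"Weights sum to {sum(WEIGHTS.values())}%")
print(f"Plain weighted average: {composite:.2f}")
```

Popularity is left out on purpose, since the page says it never affects the composite score or tier.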

Putting this into practice? Read "How to read AI benchmarks without getting fooled" for the checklist behind this score.

Risk explanation

Benchmarking third-party skills can execute untrusted prompts, tools, or test fixtures, so keep runs inside a disposable environment. The included cases are a strong starting point but still narrower than most production workflows, so add your own house tasks before treating a score as a general verdict.

Evidence links

github.com

Closest alternatives / related signals

skills, benchmarking, agent-evals, python, developer-tools, mit