Score breakdown
Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.
Why it matters
Useful for **Agent Skills authors who want receipts** — `agent-skills-eval` runs the same prompt twice (with_skill vs without_skill), has a judge model grade both, and produces a side-by-side HTML report so the skill author can prove the SKILL.md actually improves the model's performance rather than just adding noise. Useful for **Claude Code / Codex / OpenClaw / Hermes Agent skill library maintai
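The with_skill versus without_skill comparison described above can be sketched as a small A/B harness. This is a minimal illustration, not agent-skills-eval's actual code or API. `compare_arms`, `render_report`, `ArmResult`, and the runner/judge callables are hypothetical names, and the model and judge calls are stand-ins you would replace with real clients.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Optional

# Hypothetical signatures (not agent-skills-eval's API): a runner answers a
# prompt, optionally with SKILL.md text supplied as context; a judge grades
# an answer on a 0-10 scale.
Runner = Callable[[str, Optional[str]], str]
Judge = Callable[[str, str], float]


@dataclass
class ArmResult:
    label: str    # "with_skill" or "without_skill"
    output: str   # raw model answer for this arm
    score: float  # judge's grade for this arm


def compare_arms(prompt: str, skill_md: str, run: Runner, judge: Judge) -> dict:
    """Run the same prompt with and without the skill, then grade both answers."""
    arms = []
    for label, skill in (("with_skill", skill_md), ("without_skill", None)):
        output = run(prompt, skill)
        arms.append(ArmResult(label, output, judge(prompt, output)))
    with_skill, without_skill = arms
    # A positive delta means the judge preferred the answer produced with the skill.
    return {"arms": arms, "delta": with_skill.score - without_skill.score}


def render_report(prompt: str, result: dict) -> str:
    """Render a minimal side-by-side HTML table of both arms (escaped for safety)."""
    cells = "".join(
        f"<td><h3>{escape(a.label)} ({a.score:.1f})</h3>"
        f"<pre>{escape(a.output)}</pre></td>"
        for a in result["arms"]
    )
    return (
        f"<html><body><h2>{escape(prompt)}</h2>"
        f"<table><tr>{cells}</tr></table>"
        f"<p>delta: {result['delta']:+.1f}</p></body></html>"
    )


if __name__ == "__main__":
    # Stand-in runner and judge for illustration only; swap in real model calls.
    def fake_run(prompt: str, skill: Optional[str]) -> str:
        return "follows the skill's checklist" if skill else "generic answer"

    def fake_judge(prompt: str, output: str) -> float:
        return 8.0 if "checklist" in output else 5.0

    result = compare_arms("Review this diff", "# SKILL.md ...", fake_run, fake_judge)
    print(f"delta: {result['delta']:+.1f}")  # delta: +3.0
```

Keeping the prompt fixed and varying only the skill context is what lets the score difference be attributed to the SKILL.md rather than to prompt changes.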
Who should use it
Who should skip it
Skip darkrishabh/agent-skills-eval unless the captured evidence suggests it solves a problem you are actively working on.
About this signal
darkrishabh/agent-skills-eval is tracked by RepoRadar as an MIT-licensed test runner for Anthropic Agent Skills. It was first seen on 2026-06-25 and last updated on 2026-06-25. The current verdict is 'try now', with a Gold tier and easy setup difficulty. Its standout signals are workflow potential (9.9) and maturity (9.1), while evidence quality (8.0) trails; that balance shapes where it fits best. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.
How this item is evaluated
RepoRadar assigned darkrishabh/agent-skills-eval a composite score of 8.4 out of 10, placing it in the Gold tier. The score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately (currently 602.0) and never affects the composite score or tier. The risk label, here 'none', reflects only inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
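Assuming the composite is a plain weighted sum of sub-signals on a 0 to 10 scale (the page lists the weights but not the exact formula, rounding, or normalization), the calculation looks roughly like this. The key names and the `composite` helper are hypothetical; note that popularity is deliberately not an input.

```python
# Weights as listed on this page; key names are hypothetical labels.
WEIGHTS = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "build_quality": 0.07,
    "evidence_quality": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}

# Sanity check: the listed weights add up to 100%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def composite(sub: dict[str, float]) -> float:
    """Weighted sum of 0-10 sub-signals, rounded to one decimal (assumed formula)."""
    missing = WEIGHTS.keys() - sub.keys()
    if missing:
        raise ValueError(f"missing sub-signals: {sorted(missing)}")
    # Only weighted keys are read; extra keys such as "popularity" are ignored.
    return round(sum(w * sub[k] for k, w in WEIGHTS.items()), 1)


if __name__ == "__main__":
    # Illustrative only: uniform sub-signals of 8.4 give a composite of 8.4,
    # and adding a popularity value does not change the result.
    demo = {k: 8.4 for k in WEIGHTS}
    print(composite(demo))                           # 8.4
    print(composite({**demo, "popularity": 602.0}))  # 8.4
```

Under this linear assumption, a one-point change in usefulness moves the composite by 0.35, while the same change in setup ease moves it by only 0.04.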
Putting this into practice? Read "How to vet an AI agent or MCP server before you wire it in" for the checklist behind this score.
Risk explanation
No inherent user-impacting risk is flagged from the captured evidence.
