Score breakdown
Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.
Why it matters
Useful for agent researchers and security-tool builders who want a verifiable, reproducible smart-contract exploitation benchmark without risking mainnet funds.
Who should use it
Who should skip it
Hold off on anthropics/scone-bench if the setup requirements exceed what your current workflow or team can support without dedicated engineering time.
About this signal
anthropics/scone-bench is tracked by RepoRadar as a benchmark in the Evaluation section. It was first seen on 2026-06-30 and last updated on 2026-06-30. The current verdict is 'try now' with a Gold tier and hard setup difficulty. anthropics/scone-bench leads on workflow potential (9.3) and novelty (9.0); its lowest signal is setup ease (4.2), so factor that in before investing setup time. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.
How this item is evaluated
RepoRadar assigned anthropics/scone-bench a composite score of 8.2 out of 10, placing it in the Gold tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 1.0 and never affects the composite score or tier. The risk label of 'conditional' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
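The weighting described above can be sketched as a plain weighted average. This is only an illustration under assumptions: RepoRadar's exact formula is not given on this page, the function name `composite_score` and the dictionary keys are hypothetical, and every sub-signal is assumed to sit on the same 0 to 10 scale. Only the percentage weights come from the text.

```python
# Percentage weights as stated in the paragraph above.
# The key names are illustrative, not RepoRadar identifiers.
WEIGHTS = {
    "usefulness": 35,
    "novelty": 18,
    "momentum": 14,
    "maturity": 10,
    "build_quality": 7,
    "evidence_quality": 6,
    "workflow_potential": 6,
    "setup_ease": 4,
}


def composite_score(signals: dict[str, float]) -> float:
    """Weighted average of the sub-signals (assumed formula, 0-10 scale).

    Keys that are not in WEIGHTS, such as popularity, are ignored,
    mirroring the statement that popularity never affects the composite.
    """
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing sub-signals: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Placeholder inputs for demonstration only, not RepoRadar data.
    demo = {name: 5.0 for name in WEIGHTS}
    demo["popularity"] = 1.0  # tracked separately, so it has no effect
    print(round(composite_score(demo), 2))
```

Because the weights sum to 100, a low value on a lightly weighted signal such as setup ease (4%) moves the composite far less than the same value on usefulness (35%) would.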
Putting this into practice? Read "How to read AI benchmarks without getting fooled" for the checklist behind this score.
Risk explanation
Anthropic's README explicitly says 'Not maintained and not accepting contributions,' so plan for upstream breakage against newer anvil / Foundry / Rust toolchains. The benchmark uses real-world exploit patterns from DeFiHackLabs, so results reflect agent capability on adversarial tasks, not general coding ability.
