Score breakdown
Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.
Why it matters
Useful for teams that need to test whole agent trajectories instead of only grading final answers, especially when tool calls and policy boundaries are the real failure surface.
Who should use it
Who should skip it
Skip YutoTerashima/agent-safety-eval-lab unless the captured evidence suggests it solves a problem you are actively working on.
About this signal
YutoTerashima/agent-safety-eval-lab is tracked by RepoRadar as an evaluation lab in the Developer Tools section. It was first seen on 2026-06-28 and last updated on 2026-06-28. The current verdict is 'try now' with a Gold tier and moderate setup difficulty. YutoTerashima/agent-safety-eval-lab leads on workflow potential (9.8) and open-source/build quality (8.4); its lowest signal is momentum (6.0), so factor that in before investing setup time. This page summarizes the evidence RepoRadar has captured from source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.
How this item is evaluated
RepoRadar assigned YutoTerashima/agent-safety-eval-lab a composite score of 8.3 out of 10, placing it in the Gold tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 1.0 and never affects the composite score or tier. The risk label of 'conditional' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
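To make the weighting concrete, here is a minimal Python sketch of how such a composite could be computed. It assumes a plain weighted average on a 0-10 scale, which the page does not confirm. The weights come from the paragraph above; the function name, the dictionary keys, and the example sub-scores are hypothetical placeholders, not RepoRadar data.

```python
# Weights in percent, as listed in the methodology paragraph above.
WEIGHTS = {
    "usefulness": 35,
    "novelty": 18,
    "momentum": 14,
    "maturity": 10,
    "build_quality": 7,
    "evidence_quality": 6,
    "workflow_potential": 6,
    "setup_ease": 4,
}


def composite_score(sub_scores):
    """Assumed aggregation: weighted arithmetic mean of 0-10 sub-scores.

    Keys not listed in WEIGHTS (for example "popularity") are ignored,
    mirroring the statement that popularity never affects the composite.
    """
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-signals: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    return round(weighted / total_weight, 1)


# Placeholder inputs for illustration only; these are not the repo's real sub-scores.
example = {name: 8.0 for name in WEIGHTS}
example["momentum"] = 6.0
example["popularity"] = 1.0  # present but ignored by the composite
print(composite_score(example))
```

Because usefulness carries 35% of the weight and workflow potential only 6%, a high workflow-potential score moves the composite far less than the same score on usefulness would under this assumed formula.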
Putting this into practice? Read "How to vet an AI agent or MCP server before you wire it in" for the checklist behind this score.
Risk explanation
The default mock mode is safe for local evaluation, but real adapters and trace imports can pull sensitive prompts or tool logs into the lab, so scrub production data before replaying it. The lab is an evaluation harness rather than a plug-and-play guardrail, so you still need your own policy thresholds before wiring results into shipping decisions.
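As one way to approach the scrubbing step, the sketch below redacts obvious secrets from a trace before it is replayed. The trace layout, the `scrub` helper, and the regular expressions are all assumptions for illustration. They are not part of agent-safety-eval-lab, and the patterns will not catch every kind of sensitive value.

```python
import re

# Hypothetical patterns; extend them for the secrets your own traces contain.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\b(?:sk|pk|ghp)_[A-Za-z0-9]{16,}\b"), "<TOKEN>"),
]


def scrub(value):
    """Recursively redact strings inside dicts and lists; leave other types as-is."""
    if isinstance(value, str):
        for pattern, placeholder in PATTERNS:
            value = pattern.sub(placeholder, value)
        return value
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


# Assumed trace shape: a prompt plus a list of tool calls.
trace = {
    "prompt": "mail alice@example.com",
    "tool_calls": [{"args": "key sk_abcdefghijklmnop1234"}],
    "steps": 3,
}
clean = scrub(trace)
```

The function returns a new structure and leaves the input untouched, so the original trace can stay in its secured location while only the redacted copy enters the lab.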
