open-agent-ai-security/praxen: AI tool review & score

Score: 8.1

Popularity: 1.0

Risk: conditional

Tier: Gold

Score breakdown

Usefulness: 8.0

Novelty: 8.0

Momentum: 6.0

Maturity: 6.4

Open-source/build: 8.4

Evidence: 7.2

Workflow potential: 9.2

Setup ease: 6.4

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for agent builders and security-minded teams that want a repeatable way to test whether an agent actually stayed inside its stated remit, not just whether it produced a plausible answer.

Who should use it

Agent-platform teams validating policy adherence in coding-agent workflows
Security and governance teams reviewing whether agents stayed inside their allowed remit
Builders who want a concrete verifier instead of ad hoc post-run inspection
Researchers comparing evidence-based agent evaluation methods

Who should skip it

Pass on open-agent-ai-security/praxen if its scope or audience does not match what your team is building right now.

About this signal

open-agent-ai-security/praxen is tracked by RepoRadar as a developer tool in the Evaluation / Safety section. It was first seen on 2026-07-01 and last updated on 2026-07-01. The current verdict is 'try now' with a Gold tier and moderate setup difficulty. It leads on workflow potential (9.2) and open-source/build quality (8.4); its lowest signal is momentum (6.0), so factor that in before investing setup time. This page summarizes the evidence RepoRadar has gathered from captured source metadata. The score, tier, risk label, and verdict on this page are never influenced by sponsorship, ads, or tips; they reflect only the usefulness, popularity, novelty, momentum, maturity, and evidence signals described in the RepoRadar methodology.

How this item is evaluated

RepoRadar assigned open-agent-ai-security/praxen a composite score of 8.1 out of 10, placing it in the Gold tier. This score combines weighted sub-signals: usefulness (35%), novelty (18%), momentum (14%), maturity (10%), open-source/build quality (7%), evidence quality (6%), workflow potential (6%), and setup ease (4%). Popularity is tracked separately at 1.0 and never affects the composite score or tier. The risk label of 'conditional' reflects inherent user-impacting hazards, not generic novelty. Items with no risk flag may still require normal code review before production use.
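The weighting described above can be sketched as a plain weighted sum. This is a minimal illustration only: the dictionary keys and the `weighted_composite` helper are hypothetical names, not RepoRadar code, and the methodology may apply normalization or adjustments that this page does not describe. With the sub-scores listed in the score breakdown, a plain weighted sum comes to about 7.5 rather than the published 8.1, so treat the sketch as a way to see the relative influence of each signal, not as a reproduction of the published score.

```python
# Weights as stated in the paragraph above (they sum to 100%).
# Key names are illustrative, not RepoRadar identifiers.
WEIGHTS = {
    "usefulness": 0.35,
    "novelty": 0.18,
    "momentum": 0.14,
    "maturity": 0.10,
    "open_source_build": 0.07,
    "evidence": 0.06,
    "workflow_potential": 0.06,
    "setup_ease": 0.04,
}

# Sub-scores copied from the "Score breakdown" list on this page.
SUB_SCORES = {
    "usefulness": 8.0,
    "novelty": 8.0,
    "momentum": 6.0,
    "maturity": 6.4,
    "open_source_build": 8.4,
    "evidence": 7.2,
    "workflow_potential": 9.2,
    "setup_ease": 6.4,
}


def weighted_composite(scores: dict, weights: dict) -> float:
    """Return the plain weighted sum of sub-scores (assumed formula)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same signals")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(scores[name] * weights[name] for name in weights)


composite = weighted_composite(SUB_SCORES, WEIGHTS)
print(f"{composite:.3f}")  # prints 7.548
```

Because usefulness carries 35% of the weight, a one-point change there moves this sum by 0.35, while a one-point change in setup ease moves it by only 0.04.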

Putting this into practice? Read "How to vet an AI agent or MCP server before you wire it in" for the checklist behind this score.

Risk explanation

It works by inspecting agent traces, artifacts, and declared policy documents, so first evaluations should use non-sensitive sessions until you are comfortable with what evidence gets surfaced. A verifier can only judge against the remit you define and the evidence you retain, so weak logging or vague policy statements will limit the quality of the result.
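The dependence on a defined remit and retained evidence can be illustrated with a toy check. This sketch is not praxen's API or data model: `TraceEvent`, `ALLOWED_WRITE_GLOBS`, and `judge` are hypothetical names invented for illustration. It assumes a remit expressed as path globs and a trace of file-write events, and shows that an event logged without its target path can only be reported as unknown.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class TraceEvent:
    """One recorded agent action (hypothetical shape, not praxen's)."""
    action: str
    path: Optional[str]  # None when the log did not retain the target


# Hypothetical remit: the agent may only write under src/ and tests/.
ALLOWED_WRITE_GLOBS = ["src/*", "tests/*"]


def judge(event: TraceEvent, allowed_globs: list) -> str:
    """Classify a write event as allowed, violation, or unknown."""
    if event.path is None:
        # Without retained evidence there is nothing to judge against.
        return "unknown"
    if any(fnmatchcase(event.path, glob) for glob in allowed_globs):
        return "allowed"
    return "violation"


trace = [
    TraceEvent("write_file", "src/app.py"),
    TraceEvent("write_file", ".env"),
    TraceEvent("write_file", None),  # weak logging: path was not recorded
]

verdicts = [judge(event, ALLOWED_WRITE_GLOBS) for event in trace]
print(verdicts)  # prints ['allowed', 'violation', 'unknown']
```

The third event is the case the risk note warns about: the check cannot confirm or refute a violation when the path was never logged.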

Evidence links

github.com

Closest alternatives / related signals

agent-evals, verification, security, governance, claude-code, codex, apache-2.0