
SantanderAI/autoguardrails

SantanderAI/autoguardrails is an Apache-2.0 open-source alignment research scaffold (autoresearch-style) for LLM / AI-safety guardrails. It keeps the mutable surface tiny (just `policy.md`) while the evaluator harness stays fixed, runs under a fixed wall-clock budget, and compares candidates with one top-line metric (attack success rate, ASR, lower is better) plus a benign-pass floor so th
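The listing does not show the harness code, so the following is only a minimal sketch of the comparison rule it describes: ASR as the single top-line metric (lower is better), with a benign-pass floor. It assumes the floor acts as a hard eligibility gate, so a candidate that blocks too much benign traffic is rejected regardless of its ASR. The names `EvalResult`, `meets_floor`, `is_better`, and `BENIGN_PASS_FLOOR`, and the 0.95 value, are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the real project's benign-pass floor is not stated here.
BENIGN_PASS_FLOOR = 0.95


@dataclass(frozen=True)
class EvalResult:
    """Scores for one candidate policy.md (hypothetical structure)."""
    name: str
    asr: float               # attack success rate in [0, 1]; lower is better
    benign_pass_rate: float  # fraction of benign prompts allowed through, in [0, 1]


def meets_floor(result: EvalResult, floor: float = BENIGN_PASS_FLOOR) -> bool:
    """A candidate is eligible only if it does not over-refuse benign traffic."""
    return result.benign_pass_rate >= floor


def is_better(candidate: EvalResult, incumbent: Optional[EvalResult],
              floor: float = BENIGN_PASS_FLOOR) -> bool:
    """Return True if the candidate should replace the incumbent.

    Eligibility is gated by the benign-pass floor; among eligible
    candidates, a strictly lower ASR wins.
    """
    if not meets_floor(candidate, floor):
        return False
    if incumbent is None or not meets_floor(incumbent, floor):
        return True
    return candidate.asr < incumbent.asr


if __name__ == "__main__":
    baseline = EvalResult("baseline", asr=0.30, benign_pass_rate=0.98)
    strict = EvalResult("strict", asr=0.05, benign_pass_rate=0.80)  # over-refuses
    tuned = EvalResult("tuned", asr=0.12, benign_pass_rate=0.97)
    print(is_better(strict, baseline))  # False: below the benign floor
    print(is_better(tuned, baseline))   # True: eligible and lower ASR
```

Gating on the floor rather than folding benign pass rate into a weighted score keeps a single top-line number to optimize, while preventing the trivial "refuse everything" policy from winning on ASR alone.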

Score: 7.8
Popularity: 76.0
Risk: low
Tier: Silver
Score breakdown
Usefulness: 7.8
Novelty: 7.3
Momentum: 7.2
Maturity: 7.3
Open-source/build: 7.4
Evidence: 7.2
Workflow potential: 7.8
Setup ease: 6.5

Popularity is tracked separately. Support, ads, sponsorships, and tips never affect these signals.

Why it matters

Useful for AI safety researchers, alignment researchers, red-teamers, security engineers, and platform teams who want a Karpathy-autoresearch-style closed-loop harness for iterating on LLM guardrail policies without writing the loop glue from scratch: SantanderAI/autoguardrails ships an Apache-2.0 open-source alignment research scaffold that searches over a single mutable `policy.md` surfa
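As a rough illustration of what "closed-loop harness" means here, the sketch below hill-climbs over a policy string under a fixed wall-clock budget, treating the evaluator as fixed and accepting a candidate only if it clears a benign-pass floor and lowers ASR. It is an assumption-laden sketch, not the project's implementation: `search_policy`, `propose`, `evaluate`, `max_steps`, the 0.95 floor, and the toy rule-appending mutator and evaluator are all hypothetical stand-ins for the real LLM-driven proposal step and attack/benign test suites.

```python
import time
from typing import Callable, Tuple

# Hypothetical interfaces: the real harness's signatures are not described in the listing.
Evaluate = Callable[[str], Tuple[float, float]]  # policy text -> (asr, benign_pass_rate)
Propose = Callable[[str, int], str]              # (current best policy, step) -> candidate


def search_policy(initial_policy: str, propose: Propose, evaluate: Evaluate,
                  budget_seconds: float, benign_floor: float = 0.95,
                  max_steps: int = 1000) -> Tuple[str, float]:
    """Hill-climb over the policy text until the wall-clock budget runs out.

    Only the policy string changes; the evaluator stays fixed.
    Returns the best policy found and its ASR.
    """
    deadline = time.monotonic() + budget_seconds
    best_policy = initial_policy
    best_asr, best_benign = evaluate(best_policy)
    for step in range(max_steps):
        if time.monotonic() >= deadline:
            break  # fixed wall-clock budget exhausted
        candidate = propose(best_policy, step)
        asr, benign = evaluate(candidate)
        # Accept only candidates that clear the benign floor and either lower ASR
        # or replace an incumbent that itself failed the floor.
        if benign >= benign_floor and (best_benign < benign_floor or asr < best_asr):
            best_policy, best_asr, best_benign = candidate, asr, benign
    return best_policy, best_asr


# Toy stand-ins (hypothetical) so the loop can run end to end.
RULES = ["refuse: weapons", "refuse: malware", "refuse: self-harm", "refuse: everything"]


def toy_propose(policy: str, step: int) -> str:
    """Append the next rule in a fixed order (toy mutation operator)."""
    return policy + "\n" + RULES[step % len(RULES)]


def toy_evaluate(policy: str) -> Tuple[float, float]:
    """Each targeted rule cuts ASR by 0.1; a blanket refusal tanks benign pass rate."""
    lines = policy.splitlines()
    covered = sum(rule in lines for rule in RULES[:3])
    asr = (3 - covered) / 10
    benign = 0.5 if "refuse: everything" in lines else 0.99
    return asr, benign


if __name__ == "__main__":
    best, asr = search_policy("# base policy", toy_propose, toy_evaluate,
                              budget_seconds=5.0, max_steps=10)
    print(asr)   # 0.0
    print(best)  # base policy plus the three targeted rules, without the blanket refusal
```

The `max_steps` cap is only there so the toy run terminates quickly; in a wall-clock-budgeted setup the deadline check is what bounds each run, which keeps candidates comparable across machines only insofar as evaluation cost is similar.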

Who should use it

Builders, Power users

Who should skip it

Skip if the source link, docs, or setup requirements do not match your workflow.

Risk explanation

Risk label needs manual review.

Evidence links

Closest alternatives / related signals