Guide

How to vet an AI agent or MCP server before you wire it in

Agents and MCP servers are uniquely useful and uniquely risky: their whole point is to act on your behalf — read files, call tools, hit APIs, sometimes execute code — which means a careless one can do real damage faster than a passive library ever could. Before you wire one into anything that matters, run it through these checks. They're the same trust questions RepoRadar weighs when it flags risk on an autonomous tool.

1. Enumerate what it can actually touch

Before trusting an agent, list every capability it requests: filesystem paths, network egress, shell or code execution, credentials, and which external services it can call. An MCP server is just a set of tools exposed to a model — read the tool definitions and understand the blast radius of each. If a "summarize my notes" agent asks for shell access and outbound network, that mismatch is your answer. Capabilities should map tightly to the job; anything beyond that is attack surface you're adopting for free.

2. Treat prompt injection as a given, not an edge case

Any agent that reads untrusted content — web pages, emails, issues, documents — can be steered by instructions hidden in that content. This isn't hypothetical; it's the default failure mode of tool-using models. Ask what the agent does when the data it reads tries to hijack it: does a malicious page get to trigger the agent's tools? The safer designs keep a hard boundary between "content to reason about" and "instructions to obey," and never let the former silently become the latter.

3. Demand a permission boundary you control

The difference between a useful agent and a liability is usually whether you decide what it's allowed to do. Look for allowlists, per-tool approval, read-only modes, and the ability to scope credentials down to exactly what the task needs. An agent that runs fully autonomous with broad permissions and no confirmation step is fine for a sandbox and dangerous against production data. Least privilege isn't a nice-to-have here; it's the whole safety model.

4. Check how it handles secrets

Agents need credentials to be useful, which makes secret handling a first-class evaluation criterion. Where do API keys live — environment, config file, a vault? Are they ever logged, echoed into transcripts, or sent to a model provider as context? A tool that pastes your keys into a prompt it ships off-box has effectively exfiltrated them. Trace the path of a credential from where you set it to everywhere it could end up before you hand over anything sensitive.

5. Insist on observability

You can't trust what you can't see. A production-worthy agent leaves a legible trail: which tools it called, with what arguments, and why. If the only output is a final answer with the reasoning and tool calls hidden, you have no way to catch a quiet mistake or a manipulated action until it's already happened. Prefer agents that log their steps and let you replay a run — observability is what turns "it did something weird" into a fixable bug report.

6. Sandbox first, widen slowly

Run any new agent or MCP server in an isolated environment before it gets near real data or accounts — a throwaway container, scoped test credentials, no production network. Watch what it actually does versus what it claims. Only after it behaves predictably under your control should you widen its access, one capability at a time. The cost of this caution is an afternoon; the cost of skipping it is whatever the tool can reach when it goes wrong.

RepoRadar scores agent frameworks, MCP servers, and autonomous tools with risk and permission concerns called out separately from usefulness. Browse the full radar or read the methodology.
Advertisement