Guide

A pre-flight checklist before you ship an AI feature

The prototype works, the demo dazzles the team, and the temptation is to ship. But an AI feature that's impressive in a notebook can be unreliable, expensive, or unsafe in production. Run this pre-flight checklist before you put a model in front of real users — it's the operational side of the same evidence-first thinking behind the radar.

1. Define what "wrong" looks like — and what it costs

AI features fail differently from ordinary code: instead of crashing, they return confidently wrong answers. Before shipping, write down the failure modes that actually matter for your use case and what each one costs: a bad summary is an annoyance; a bad medical or financial answer is a liability. Your tolerance for error should set everything downstream, from how much you constrain the model to whether a human stays in the loop.

2. Put a human where the stakes are high

Full autonomy is appropriate for low-stakes, easily reversible actions and reckless for high-stakes, irreversible ones. Decide deliberately where the model acts on its own and where it only proposes while a person approves. The cost of a review step is friction; the cost of skipping it on a consequential action is whatever the model can break unsupervised. Match the autonomy to the blast radius.
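One way to make that decision explicit is to encode it as a gate in front of every model-proposed action. The sketch below is a minimal illustration, not a prescribed design: the `Risk` levels, the `ProposedAction` fields, and the `dispatch` labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1   # cheap to get wrong, e.g. drafting a reply
    HIGH = 2  # costly to get wrong, e.g. issuing a refund


@dataclass
class ProposedAction:
    # Hypothetical shape for an action the model wants to take.
    name: str
    risk: Risk
    reversible: bool


def needs_human_approval(action: ProposedAction) -> bool:
    """Auto-run only low-risk, reversible actions; everything else waits for a person."""
    return action.risk is Risk.HIGH or not action.reversible


def dispatch(action: ProposedAction) -> str:
    # Returns a routing label instead of performing real side effects.
    if needs_human_approval(action):
        return "queued_for_review"
    return "executed"


print(dispatch(ProposedAction("draft_reply", Risk.LOW, reversible=True)))
print(dispatch(ProposedAction("issue_refund", Risk.HIGH, reversible=False)))
```

Defaulting to review whenever an action is either high-risk or irreversible keeps the rule conservative: a new action type has to earn autonomy rather than inherit it.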

3. Budget for tokens and latency at real scale

Costs that are invisible at demo volume become the whole conversation at production volume. Estimate tokens per request times realistic traffic, and check tail latency, not just the happy-path average — a feature that's fast in testing can stall under load or on long inputs. Know your per-request cost and your p95 latency before launch, because discovering them from a bill or an outage is the expensive way.
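The arithmetic is simple enough to script before launch. The following sketch assumes a provider that bills separately per 1,000 input and output tokens; the prices, token counts, traffic figure, and latency samples are placeholders, not real quotes or measurements.

```python
import math


def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request, given per-1k-token prices (assumed pricing model)."""
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k


def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Scale the per-request cost to realistic traffic."""
    return per_request * requests_per_day * days


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Placeholder numbers for illustration only.
per_request = cost_per_request(1500, 500, input_price_per_1k=0.001, output_price_per_1k=0.002)
print(f"per request: ${per_request:.4f}")
print(f"per month at 10,000 requests/day: ${monthly_cost(per_request, 10_000):.2f}")

latencies_s = [0.8] * 18 + [2.5, 6.0]  # made-up samples with a slow tail
print(f"mean: {sum(latencies_s) / len(latencies_s):.2f}s, p95: {percentile(latencies_s, 95)}s")
```

In the made-up samples, most requests finish in 0.8 seconds, yet the p95 is more than three times that, which is the gap the average hides.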

4. Harden against bad and hostile input

Users will paste enormous inputs, empty inputs, and inputs crafted to override your prompt. If the feature ingests untrusted content, prompt injection is a when, not an if. Decide what happens at the size limits, sanitize what reaches the model, and never let model output trigger sensitive actions without validation. Treat the model's input and output as untrusted boundaries, the same way you'd treat any external data.
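A minimal sketch of both boundaries might look like the following. The size limit and the action allow-list are assumptions made up for the example, and nothing here prevents prompt injection by itself; the point is only that a manipulated model can still trigger nothing outside the allow-list.

```python
MAX_INPUT_CHARS = 8_000  # assumed limit; tune to your model and budget
ALLOWED_ACTIONS = {"summarize", "translate", "none"}  # hypothetical allow-list


def prepare_input(text: str) -> str:
    """Reject empty or oversized input and drop control characters before it reaches the model."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("empty input")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    return cleaned


def validate_action(model_output: str) -> str:
    """Treat model output as untrusted: accept only known actions, otherwise do nothing."""
    action = model_output.strip().lower()
    return action if action in ALLOWED_ACTIONS else "none"


print(validate_action("Summarize\n"))
print(validate_action("delete_all_users"))
```

Rejecting oversized input outright is one choice among several; truncating or chunking may suit some features better, as long as the behavior at the limit is decided rather than accidental.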

5. Make it observable and reversible

You cannot fix what you cannot see. Log inputs, outputs, and the model/version behind each response so you can debug a complaint and detect quality drift when a provider updates a model under you. Pair that with a kill switch or feature flag: when something goes wrong — and with probabilistic systems it eventually will — you want to roll back in seconds, not redeploy in hours.
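As a rough sketch, logging and the kill switch can live in the same wrapper around the model call. The environment variable `AI_FEATURE_ENABLED`, the `call_model` callable, and the log field names are all hypothetical; a real deployment would more likely read the flag from a feature-flag service so it can change without a restart.

```python
import json
import logging
import os
import time
import uuid
from typing import Callable

logger = logging.getLogger("ai_feature")


def feature_enabled() -> bool:
    # Hypothetical kill switch: checked on every call so flipping it takes effect immediately.
    return os.environ.get("AI_FEATURE_ENABLED", "true").lower() == "true"


def answer(prompt: str, call_model: Callable[[str], str], model_version: str,
           fallback: str = "This feature is temporarily unavailable.") -> str:
    """Call the model behind a flag and log what is needed to debug a complaint later."""
    if not feature_enabled():
        return fallback
    started = time.monotonic()
    output = call_model(prompt)
    logger.info(json.dumps({
        "request_id": str(uuid.uuid4()),
        "model_version": model_version,  # lets you correlate quality drift with model changes
        "input": prompt,
        "output": output,
        "latency_ms": round((time.monotonic() - started) * 1000),
    }))
    return output
```

Passing the model call in as a plain function keeps the wrapper testable with a stub and independent of any one provider's client library.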

6. Plan for the model changing beneath you

Hosted models are moving targets: the endpoint you tested can behave differently next month, and a model you depend on can be deprecated. Pin versions where you can, keep your own eval set to catch regressions, and avoid wiring your product so tightly to one model that swapping it is a rewrite. Shipping an AI feature isn't a one-time launch — it's a system you'll keep re-validating as the ground shifts.
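A private eval set does not need to be elaborate to catch regressions. The sketch below assumes a substring match is a good enough pass criterion, which holds only for short factual answers; the two cases, the `call_model` callable, and the tolerance are illustrative placeholders.

```python
from typing import Callable

# Hypothetical eval set: (prompt, substring the answer must contain).
EVAL_SET = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "paris"),
]


def pass_rate(call_model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output contains the expected substring (case-insensitive)."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in cases
                 if expected.lower() in call_model(prompt).lower())
    return passed / len(cases)


def no_regression(call_model: Callable[[str], str], cases: list[tuple[str, str]],
                  baseline: float, tolerance: float = 0.02) -> bool:
    """True if the current pass rate is within `tolerance` of the recorded baseline."""
    return pass_rate(call_model, cases) >= baseline - tolerance
```

Because the model is passed in as a function, the same check can run against the pinned version, a candidate replacement, or a different provider, which also keeps the product from being wired to a single model.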

RepoRadar scores maturity, evidence, and risk separately so you can judge production-readiness, not just novelty. Browse the full radar or read how to vet an AI agent or MCP server.