
Does AI train on your data? Privacy and data controls explained

"If I paste this into an AI, who sees it — and will it train on my data?" is a fair question and a confusing one, because the answer depends entirely on which product and which plan you're using. Here's how to reason about AI data privacy instead of guessing, whether you're an individual or wiring AI into a company.

The default to assume

Until you've checked, assume anything you send to a hosted AI tool leaves your device, is processed on someone else's servers, and may be retained for some period. That's not sinister — it's how cloud software works — but it means the same caution you'd apply to any third-party service applies here. The variable that matters most is whether your inputs are used to train future models, which is a different question from whether they're merely processed.

Training use varies by product and plan

Many consumer chat products use your conversations to improve their models by default, usually with an opt-out setting. Most business, team, and API tiers commit in their terms not to train on your data by default. The practical takeaway: free and consumer tiers are the ones to read carefully, and you should look for the specific "data used for training" toggle or clause rather than assuming. Paying for a plan does not by itself change training use, so check the terms of the tier you are actually on. When in doubt, opt out and use a business tier for anything sensitive.

Never paste what you can't afford to expose

Regardless of policy, don't paste secrets you wouldn't put in any third-party tool: passwords, API keys, customer PII, regulated health/financial records, or confidential code under restrictive terms. Policies change, accounts get breached, and logs exist. Redact identifiers, use synthetic examples, or keep sensitive work on a tool you've vetted for that purpose. The safest data is the data you never sent.
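
As a rough illustration of the "redact identifiers" advice above, here is a minimal Python sketch that swaps a few obvious identifiers for placeholders before a prompt is sent anywhere. The PATTERNS table, the placeholder labels, and the redact function are hypothetical names invented for this example, and the regular expressions are deliberately simple assumptions: they will miss many real-world secrets, so treat this as a first filter and not a guarantee.

```python
import re

# Hypothetical patterns for illustration only; real secrets and PII
# come in many more shapes than these three.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\b(?:sk|pk|key)[-_][A-Za-z0-9_-]{16,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every pattern match with a bracketed placeholder like [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = "Contact jane.doe@example.com, key sk-abcdef1234567890abcd, SSN 123-45-6789."
print(redact(prompt))
# prints: Contact [EMAIL], key [API_KEY], SSN [SSN].
```

Pattern matching only catches what it was written to catch, so a human read-through before pasting is still the safer habit.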

Read the three things that matter

You don't need to read the whole policy. Find three answers: (1) Is my input used for training, and can I turn it off? (2) How long is data retained, and can I delete it? (3) Who can access it (staff, subprocessors, law enforcement)? A provider that answers these clearly is a better bet than one that buries them. Enterprise agreements usually spell this out in a data processing agreement (DPA); consumer terms often don't.
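
If you are comparing several tools, the three questions above can be kept as a small record per provider so that gaps stand out. The sketch below is one possible shape, not a standard: the PolicyAnswers class, its field names, and the unanswered helper are all hypothetical, and None is used here to mean "the policy doesn't say".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyAnswers:
    """Answers to the three questions; None means the policy doesn't say."""
    trains_on_input: Optional[bool]    # (1) is input used for training?
    training_opt_out: Optional[bool]   # (1) can it be turned off?
    retention_days: Optional[int]      # (2) how long is data retained?
    user_can_delete: Optional[bool]    # (2) can I delete it?
    access_disclosed: Optional[bool]   # (3) does it say who can access data?


def unanswered(answers: PolicyAnswers) -> List[str]:
    """Return the names of the fields the policy left unanswered."""
    return [name for name, value in vars(answers).items() if value is None]


# Made-up example values, not a description of any real provider.
example = PolicyAnswers(
    trains_on_input=False,
    training_opt_out=True,
    retention_days=None,
    user_can_delete=True,
    access_disclosed=None,
)
print(unanswered(example))
# prints: ['retention_days', 'access_disclosed']
```

A long list of unanswered fields is itself a signal: it is the "buries them" case described above.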

Uploads, images, and chat history count too

Data privacy isn't only about typed text. Files you upload, images you submit, voice you record, and your saved chat history are all inputs a provider stores and may use under the same policy — sometimes governed by separate settings. If a tool offers "memory" or keeps a history, that's your data at rest on someone else's server. Check whether you can disable history, export it, and delete it, and treat an uploaded document with the same caution you'd give anything you paste into the box.

When privacy is non-negotiable, change where inference happens

If data genuinely can't leave your boundary (regulated industries, trade secrets), the answer isn't a better policy; it's self-hosting. Running an open model on your own infrastructure keeps inputs inside your perimeter, trading some capability and convenience for control. For most teams a vetted business-tier API is enough; for the strictest cases, local or private deployment is the strongest privacy guarantee available.
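
For teams wiring this into a workflow, the rule above can be written down as an explicit routing policy so the decision isn't made ad hoc per request. This is a minimal sketch under stated assumptions: the three sensitivity labels, the backend names, and the choose_backend function are hypothetical, and how data gets labeled in the first place is left out entirely.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical labels; your organization's classification will differ."""
    PUBLIC = "public"          # already public or synthetic data
    SENSITIVE = "sensitive"    # internal data that can go to a vetted vendor
    RESTRICTED = "restricted"  # data that must not leave your boundary


def choose_backend(level: Sensitivity) -> str:
    """Map a sensitivity label to where inference is allowed to happen."""
    if level is Sensitivity.RESTRICTED:
        return "self_hosted"    # open model on your own infrastructure
    if level is Sensitivity.SENSITIVE:
        return "business_api"   # vetted business-tier API
    return "any_hosted"         # consumer tools are acceptable


for level in Sensitivity:
    print(level.value, "->", choose_backend(level))
```

Checking the strictest case first means an unhandled label can only fall through to a hosted tool if it was never marked restricted or sensitive, which keeps the failure mode visible in one place.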

RepoRadar flags tools that touch sensitive data, accounts, or credentials, and tracks local-inference options for privacy-first stacks. Browse the full radar or read local AI vs. hosted APIs.