securityAIRAGOWASP

How Do You Prevent PII and Data Leakage in LLM and RAG Applications?

Data leakage in LLM and RAG apps happens at three points: input, retrieval, and output. Here is a concrete defense stack for PII, cross-user RAG leakage, and system prompt exposure, mapped to OWASP LLM02.

June 27, 202612 min readELM Labs

TL;DR

LLM data leakage is OWASP LLM02 Sensitive Information Disclosure: PII, proprietary data, and business secrets escaping through model inputs, retrieval, or outputs.
Redact PII before the call with a tool like Presidio, but treat it as one layer, since automated detection cannot guarantee it catches everything.
Cross-user RAG leakage is an access-control failure: store permissions on every chunk and enforce them at retrieval time, never after generation.
The system prompt is not a secret store; assume it leaks, keep secrets in code and tools, and screen outputs before they reach the user.

What is LLM data leakage and why does it happen?

LLM data leakage is when sensitive information, such as personal data, proprietary content, or confidential business records, escapes through an AI application in a way it should not. In the OWASP Top 10 for LLM Applications 2025 it is ranked second, as LLM02:2025 Sensitive Information Disclosure, directly below prompt injection (OWASP, 2025). The categories it covers are personal identifiable information and health and financial records, proprietary model and source data, and confidential corporate and legal documents (OWASP LLM02:2025).

It happens because an LLM application is a pipeline with three distinct exposure points, and each one leaks differently:

The input side. Whatever a user or your own code sends to the model leaves your boundary. If a support ticket containing a customer's full name, email, and card number is pasted into the prompt, that data is now in a third-party API call and possibly in provider logs.
The retrieval side. In a RAG system, the retriever pulls documents from your knowledge base and injects them into the prompt. If retrieval is not scoped to the asking user's permissions, one person can receive content drawn from documents they were never allowed to see.
The output side. The model can reproduce data it was given, infer sensitive details, or be coaxed into revealing its own system prompt and the instructions or context inside it.

The mistake most teams make is treating leakage as a single problem with a single fix, usually a PII filter. It is three problems. The rest of this guide walks each one and the OWASP-aligned control that addresses it. The honest framing first: no single layer is sufficient, so the goal is defense in depth where any one failure is contained by the next.

How do you redact PII before sending it to an LLM?

You redact PII by running every prompt through a detection and anonymization step before the model call, replacing or masking entities like names, emails, phone numbers, and account identifiers. The most common open-source option is Microsoft Presidio, which identifies and anonymizes PII in structured and unstructured text using a mix of named-entity recognition, regular expressions, and checksum validation, and lets you replace, mask, hash, or encrypt each detected entity.

The critical caveat is in Presidio's own documentation: because it relies on automated detection, "there is no guarantee that Presidio will find all sensitive information" (Microsoft Presidio). That is true of every pattern-based redactor. A name in an unusual format, an identifier the model was never trained on, or PII embedded in a screenshot passed through OCR can all slip past. OWASP lists input sanitization and pattern-matching redaction as controls precisely because they reduce exposure, not because they eliminate it (OWASP LLM02:2025).

So treat redaction as a probabilistic filter, not a guarantee, and pair it with two things:

Tokenization for data you must preserve. When the workflow genuinely needs an identifier (an order number, an account reference), swap the real value for a token before the call and detokenize after, so the sensitive value never reaches the model. OWASP names tokenization as a core preprocessing control.
A data-minimization rule upstream. The cheapest leak to prevent is the one you never send. Before reaching for a redactor, ask whether the field needs to be in the prompt at all. Structured lookups, status flags, and IDs often belong in your own code, not in the model's context.

Does my RAG application leak data between users?

It can, and this is the leakage mode teams underestimate most. Cross-user leakage in RAG is not a model problem; it is an access-control problem. If your retriever searches the entire vector index and returns the closest matches regardless of who is asking, then a query from a junior employee can surface a chunk from a board memo, an HR file, or another customer's record, simply because it was semantically relevant.

The fix is to attach permissions to the data and enforce them at retrieval time, not after. The OWASP RAG Security Cheat Sheet is explicit: store access-control metadata such as classification, owner, permitted roles, and permitted tenants alongside every vector chunk, not just on the source document, and enforce those checks at retrieval time rather than only at ingestion (OWASP RAG Security Cheat Sheet). Concretely, that means:

Filter before you generate. The retriever should only ever return chunks the current user is authorized to see, applied as a metadata filter in the vector query, so unauthorized content never enters the prompt. Filtering the model's answer afterward is too late; the data was already in the context.
Tag chunks, not just documents. A single document can contain mixed sensitivity. Permissions live on the chunk so a public section of a partly confidential file can be retrieved without exposing the rest.
Fail closed. If the permission check or any pipeline component errors, deny the request rather than falling back to returning everything (OWASP RAG Security Cheat Sheet).
Lock down the index itself. Restrict write access to the vector index to authorized ingestion pipelines only; no application endpoint or agent should write to it directly, which prevents poisoned or mislabeled chunks from entering.

For the broader picture of how retrieval failures and access control fit into a production RAG design, see our explainer on how RAG systems actually work.

How do you prevent system prompt leakage?

You prevent system prompt damage by assuming the prompt will leak and keeping nothing sensitive in it. System Prompt Leakage is its own entry in the 2025 OWASP list, LLM07:2025, which exists because teams kept storing secrets, credentials, and confidential logic in the system prompt and then discovered it could be extracted (OWASP, 2025). OWASP's guidance on sensitive disclosure makes the same point about defense: system prompt restrictions alone can be bypassed through prompt injection, so they cannot be your only control (OWASP LLM02:2025).

The practical rules:

Never put secrets in the prompt. API keys, database credentials, internal URLs, and connection strings belong in your application code and environment, retrieved by tools the model calls, never written into the text the model reads. If the prompt leaks, an attacker should learn only that you have a polite assistant, not how to reach your database.
Keep authorization out of the prompt. Telling the model "only admins may see this" is not access control; it is a suggestion. Enforce permissions in code, as in the retrieval section above, so leaking the instruction reveals no path to the data.
Treat the system prompt as public. Write it as if a competitor will read it, because they might. The proprietary value should be in your data, your tools, and your engineering, not in a paragraph the model can be talked into reciting.

What are the OWASP-aligned controls for sensitive information disclosure?

OWASP LLM02:2025 groups its prevention controls into layers, and reading them together gives you the defense stack rather than a list of disconnected tips (OWASP LLM02:2025):

Data sanitization and minimization. Input sanitization, pattern-matching redaction, and tokenization to strip or mask sensitive values before they enter the model, plus not collecting what you do not need.
Access controls. Least-privilege access and restricting the application's connections to external data sources, so the model and its tools can reach only what the task requires.
System prompt protection. Concealing the system prompt and never relying on it to enforce restrictions on its own.
Advanced safeguards for high-sensitivity workloads. Techniques such as homomorphic encryption and differential privacy where the data demands it.

This maps cleanly onto the three exposure points from the first section: sanitization and minimization defend the input side, least-privilege access defends the retrieval side, and prompt protection plus output handling defend the output side. NIST's adversarial machine learning taxonomy reinforces why output-side defenses matter: it catalogs privacy attacks, including data extraction and inference techniques that pull training or context data back out of a model, as a distinct attack class with its own mitigations (NIST AI 100-2e2025). A redactor on the input does nothing against an extraction attack on the output, which is why the stack has to cover all three points.

For agent-based systems, where the model can call tools and take actions, the blast radius of a leak is larger and least privilege becomes the central control; we cover that in securing AI agents against tool poisoning and excessive agency.

How do you stop output-side leakage and verify your defenses?

You stop output-side leakage by screening what the model produces before it reaches the user, and by structuring the application so that untrusted content cannot rewrite your instructions. The clearest concrete patterns come from Anthropic's guidance on guardrails (Anthropic, 2026):

Put untrusted content only in tool results, and JSON-encode it. Deliver retrieved documents, emails, and web content inside tool_result blocks rather than concatenating them into your prompt, and wrap them as JSON so an attacker cannot close a quote or tag to break out into an instruction context. This is what stops an indirect prompt injection, where a poisoned document tells the model to dump its context or call a tool to exfiltrate data, from succeeding.
Screen outputs with a lightweight classifier. Run the model's response, or a tool's output before the model acts on it, through a small fast model that returns a structured yes/no on whether sensitive data or an injection attempt is present, and block or strip it if so. This catches leaks that input redaction missed.
Apply least privilege to tools. Give the model access only to the data and actions the task needs, run tools in sandboxed environments, and scope permissions narrowly, so that even a successful injection has little to leak or do.

Verification is the step teams skip. Anthropic's guidance and OWASP both point to the same discipline: red-team your own application before launch with documents, queries, and tool outputs that deliberately try to extract data or override instructions, and confirm your redaction, access filters, and screening actually catch them (Anthropic, 2026). A defense you have not tested against a real attempt is an assumption, not a control. Because prompt injection cannot be fully solved, that testing is how you confirm your blast-radius design holds; see why prompt injection is not fully solvable for the underlying reason.

FAQ

What is the difference between PII redaction and access control in an LLM app?

Redaction removes sensitive values from text before it reaches the model, so a card number or email is masked in the prompt. Access control decides which data a given user is allowed to retrieve in the first place. They solve different problems: redaction stops sensitive fields leaking through the input and output, while access control stops one user receiving another user's documents in a RAG system. You need both, because a perfectly redacted prompt can still return a confidential file the user was never authorized to see.

Can Microsoft Presidio guarantee it catches all PII before an LLM call?

No. Presidio's own documentation states that because it uses automated detection, there is no guarantee it will find all sensitive information. It is an effective layer that catches the large majority of common entities, but unusual formats, novel identifiers, and PII inside images or OCR text can slip past. Treat it as one part of a defense stack alongside data minimization, tokenization, and output screening, not as a complete solution.

Why is the system prompt not a safe place to store secrets?

Because the system prompt can be extracted. System Prompt Leakage is its own risk in the OWASP Top 10 for LLM Applications 2025 (LLM07), and OWASP notes that system prompt restrictions can be bypassed through prompt injection. Anything in the prompt, including API keys, credentials, or internal logic, should be assumed reachable by a determined user. Keep secrets in your application code and environment, retrieved through tools, and write the prompt as if it were public.

How does data leak between users in a RAG system?

It leaks when the retriever searches the whole knowledge base and returns the most semantically relevant chunks without checking whether the current user is allowed to see them. A query can then surface content from a document that belongs to another team, another customer, or a higher clearance level. The fix is to store access-control metadata on every chunk and apply it as a filter at retrieval time, so unauthorized content never enters the prompt, rather than trying to filter the answer afterward.

Is sending sensitive data to a hosted LLM safe for business use?

It can be, with the right controls and a provider whose data-handling terms you have reviewed. The risks are that data sent in a prompt leaves your boundary and may be logged, and that the model could reproduce it later. Mitigate by minimizing what you send, redacting or tokenizing PII before the call, using enterprise terms that exclude your data from training and limit retention, and, for the most sensitive workloads, running open-source models in your own infrastructure so no data leaves your servers.

Which OWASP risk covers LLM data leakage?

The primary one is LLM02:2025 Sensitive Information Disclosure, which covers PII, proprietary data, and confidential business information escaping through an LLM application. Two related entries also matter: LLM07:2025 System Prompt Leakage, for exposure of the instructions and any secrets in the system prompt, and LLM08:2025 Vector and Embedding Weaknesses, for leakage through the retrieval and embedding layer of a RAG system.

ELM Labs is an applied AI lab that designs and builds secure LLM and RAG systems end to end.

Have a project in mind?

Tell us what you're building and we'll see if we can help.

Share your project