Prompt Injection
Table of Contents
Quick Answer
Prompt injection happens when untrusted text influences an LLM application in a way that conflicts with its intended instructions or workflow. Direct prompt injection comes from the user prompt, while indirect prompt injection can come from external content such as documents, emails, pages, tickets, or retrieved context.
What is Prompt Injection?
Prompt injection is an AI application security risk where untrusted text affects how an LLM follows instructions, summarizes content, calls tools, or produces output. The risk appears when an application mixes trusted instructions with untrusted content and then relies on the model to make security-sensitive decisions.
Direct vs Indirect Prompt Injection
| Type | Source | Defensive idea |
|---|---|---|
| Direct | User message or chat prompt | Validate requests and enforce policy outside the model |
| Indirect | Documents, emails, websites, tickets, retrieved context, tool responses | Treat external text as untrusted data, not trusted instructions |
Prompt Injection vs SQL Injection
| Area | SQL Injection | Prompt Injection |
|---|---|---|
| Input type | Database query input | Natural-language text and retrieved context |
| Execution boundary | SQL engine | LLM application, tools, prompts, and workflows |
| Defense style | Parameterized queries and input validation | Instruction/data separation, tool limits, output validation, approvals |
| Residual risk | Usually reduced strongly with correct coding patterns | Requires layered controls because natural language remains ambiguous |
Common Prompt Injection Scenarios
- An assistant summarizes an uploaded document that contains instructions intended for the model rather than the user.
- A RAG system retrieves a web page or note that contains untrusted text mixed with legitimate content.
- An AI agent reads a ticket, email, or web page before calling tools or drafting a response.
- A model output is copied into HTML, code, SQL, or another system without validation.
Prompt Injection in RAG and AI Agents
Prompt injection becomes more serious when the model can retrieve documents or call tools. In RAG systems, untrusted content can enter the prompt through retrieval. In AI agents, manipulated context may influence tool selection, action plans, or external messages. The application must validate actions and permissions outside the model.
Defensive Design Principles
- Keep system instructions and business rules separate from untrusted content.
- Never use the model as the only authorization control.
- Give tools and agents the minimum permission needed.
- Validate model output before it is executed, rendered, stored, or sent externally.
- Use human approval for sensitive, irreversible, or high-impact actions.
- Log tool calls, retrieval sources, approvals, and unusual behavior.
Developer Checklist
| Control | Beginner check |
|---|---|
| Instruction separation | Can untrusted text override system or developer instructions? |
| Tool authorization | Are tool calls checked by normal application permissions? |
| Output handling | Is model output validated before use in code, HTML, SQL, files, or APIs? |
| Human approval | Are sensitive actions reviewed before execution? |
| Monitoring | Are risky prompts, retrieval sources, and tool calls logged safely? |
Safe Testing Boundaries
Use safe toy examples, lab apps, or systems where you have permission. Do not test prompt injection against real products, accounts, or private data without authorization. This page explains defensive concepts and intentionally avoids exploit-ready prompt libraries.
Explore AI Security Topics
FAQs
Sources and further reading
- OWASP Top 10 for Large Language Model Applications — LLM01 Prompt Injection and related LLM risks
- OWASP LLM Prompt Injection Prevention Cheat Sheet — Prompt injection prevention guidance