Prompt Injection

Quick Answer

Prompt injection happens when untrusted text influences an LLM application in a way that conflicts with its intended instructions or workflow. Direct prompt injection comes from the user prompt, while indirect prompt injection can come from external content such as documents, emails, pages, tickets, or retrieved context.

What is Prompt Injection?

Prompt injection is an AI application security risk where untrusted text affects how an LLM follows instructions, summarizes content, calls tools, or produces output. The risk appears when an application mixes trusted instructions with untrusted content and then relies on the model to make security-sensitive decisions.

Direct vs Indirect Prompt Injection

Type	Source	Defensive idea
Direct	User message or chat prompt	Validate requests and enforce policy outside the model
Indirect	Documents, emails, websites, tickets, retrieved context, tool responses	Treat external text as untrusted data, not trusted instructions

Prompt Injection vs SQL Injection

Area	SQL Injection	Prompt Injection
Input type	Database query input	Natural-language text and retrieved context
Execution boundary	SQL engine	LLM application, tools, prompts, and workflows
Defense style	Parameterized queries and input validation	Instruction/data separation, tool limits, output validation, approvals
Residual risk	Usually reduced strongly with correct coding patterns	Requires layered controls because natural language remains ambiguous

Common Prompt Injection Scenarios

An assistant summarizes an uploaded document that contains instructions intended for the model rather than the user.
A RAG system retrieves a web page or note that contains untrusted text mixed with legitimate content.
An AI agent reads a ticket, email, or web page before calling tools or drafting a response.
A model output is copied into HTML, code, SQL, or another system without validation.

Prompt Injection in RAG and AI Agents

Prompt injection becomes more serious when the model can retrieve documents or call tools. In RAG systems, untrusted content can enter the prompt through retrieval. In AI agents, manipulated context may influence tool selection, action plans, or external messages. The application must validate actions and permissions outside the model.

Defensive Design Principles

Keep system instructions and business rules separate from untrusted content.
Never use the model as the only authorization control.
Give tools and agents the minimum permission needed.
Validate model output before it is executed, rendered, stored, or sent externally.
Use human approval for sensitive, irreversible, or high-impact actions.
Log tool calls, retrieval sources, approvals, and unusual behavior.

Developer Checklist

Control	Beginner check
Instruction separation	Can untrusted text override system or developer instructions?
Tool authorization	Are tool calls checked by normal application permissions?
Output handling	Is model output validated before use in code, HTML, SQL, files, or APIs?
Human approval	Are sensitive actions reviewed before execution?
Monitoring	Are risky prompts, retrieval sources, and tool calls logged safely?

Safe Testing Boundaries

Use safe toy examples, lab apps, or systems where you have permission. Do not test prompt injection against real products, accounts, or private data without authorization. This page explains defensive concepts and intentionally avoids exploit-ready prompt libraries.

Agent, Tool, and MCP Injection Channels

Indirect instructions can reach an AI system through web pages, emails, RAG documents, tool descriptions, tool results, MCP resources, and stored memory. Treat each channel as untrusted data and keep authorization outside the model. Read MCP Tool Poisoning and the evidence-led guide to Indirect Prompt Injection in AI Agents.

Explore AI Security Topics

AI Security RoadmapStart with AI, LLM, prompt, RAG, agent, MCP, and defensive security concepts.LLM SecuritySecure LLM applications, prompts, outputs, tools, logs, and sensitive data flows.OWASP LLM Top 10Use the OWASP LLM Top 10 as a risk map for GenAI application security.RAG SecuritySecure retrieved documents, vector stores, permissions, provenance, and outputs.AI Agent SecurityControl agent identity, tools, memory, approvals, monitoring, and recovery.MCP SecuritySecure MCP hosts, clients, servers, authorization, tools, resources, and audits.

FAQs

Prompt injection is when untrusted text influences an LLM application in a way that conflicts with intended instructions, workflow, or policy.

Indirect prompt injection comes from external content such as documents, emails, web pages, search results, or retrieved context that an LLM application reads.

No. Jailbreaking usually targets a general model safety boundary, while prompt injection targets a specific LLM application and its connected data, tools, or workflow.

Developers reduce risk with instruction/data separation, tool permission limits, external authorization checks, output validation, logging, and human approval for sensitive actions.

It is better treated as a risk to reduce with layered controls. Natural language is ambiguous, so applications need defense-in-depth rather than one perfect filter.

Sources and further reading

OWASP Top 10 for Large Language Model Applications — LLM01 Prompt Injection and related LLM risks
OWASP LLM Prompt Injection Prevention Cheat Sheet — Prompt injection prevention guidance

Prompt Injection

Table of Contents

Quick Answer

What is Prompt Injection?

Direct vs Indirect Prompt Injection

Prompt Injection vs SQL Injection

Common Prompt Injection Scenarios

Prompt Injection in RAG and AI Agents

Defensive Design Principles

Developer Checklist

Safe Testing Boundaries

Agent, Tool, and MCP Injection Channels

Explore AI Security Topics

FAQs

What is prompt injection?

What is indirect prompt injection?

Is prompt injection the same as jailbreaking?

How do developers reduce prompt injection risk?

Can prompt injection be completely prevented?

Sources and further reading

Related Articles