Skip to main content

Prompt Injection

Prompt Injection Attack

Table of Contents

Quick Answer

Prompt injection happens when untrusted text influences an LLM application in a way that conflicts with its intended instructions or workflow. Direct prompt injection comes from the user prompt, while indirect prompt injection can come from external content such as documents, emails, pages, tickets, or retrieved context.

What is Prompt Injection?

Prompt injection is an AI application security risk where untrusted text affects how an LLM follows instructions, summarizes content, calls tools, or produces output. The risk appears when an application mixes trusted instructions with untrusted content and then relies on the model to make security-sensitive decisions.

Direct vs Indirect Prompt Injection

TypeSourceDefensive idea
DirectUser message or chat promptValidate requests and enforce policy outside the model
IndirectDocuments, emails, websites, tickets, retrieved context, tool responsesTreat external text as untrusted data, not trusted instructions

Prompt Injection vs SQL Injection

AreaSQL InjectionPrompt Injection
Input typeDatabase query inputNatural-language text and retrieved context
Execution boundarySQL engineLLM application, tools, prompts, and workflows
Defense styleParameterized queries and input validationInstruction/data separation, tool limits, output validation, approvals
Residual riskUsually reduced strongly with correct coding patternsRequires layered controls because natural language remains ambiguous

Common Prompt Injection Scenarios

  • An assistant summarizes an uploaded document that contains instructions intended for the model rather than the user.
  • A RAG system retrieves a web page or note that contains untrusted text mixed with legitimate content.
  • An AI agent reads a ticket, email, or web page before calling tools or drafting a response.
  • A model output is copied into HTML, code, SQL, or another system without validation.

Prompt Injection in RAG and AI Agents

Prompt injection becomes more serious when the model can retrieve documents or call tools. In RAG systems, untrusted content can enter the prompt through retrieval. In AI agents, manipulated context may influence tool selection, action plans, or external messages. The application must validate actions and permissions outside the model.

Defensive Design Principles

  • Keep system instructions and business rules separate from untrusted content.
  • Never use the model as the only authorization control.
  • Give tools and agents the minimum permission needed.
  • Validate model output before it is executed, rendered, stored, or sent externally.
  • Use human approval for sensitive, irreversible, or high-impact actions.
  • Log tool calls, retrieval sources, approvals, and unusual behavior.

Developer Checklist

ControlBeginner check
Instruction separationCan untrusted text override system or developer instructions?
Tool authorizationAre tool calls checked by normal application permissions?
Output handlingIs model output validated before use in code, HTML, SQL, files, or APIs?
Human approvalAre sensitive actions reviewed before execution?
MonitoringAre risky prompts, retrieval sources, and tool calls logged safely?

Safe Testing Boundaries

Use safe toy examples, lab apps, or systems where you have permission. Do not test prompt injection against real products, accounts, or private data without authorization. This page explains defensive concepts and intentionally avoids exploit-ready prompt libraries.

Explore AI Security Topics

FAQs

Prompt injection is when untrusted text influences an LLM application in a way that conflicts with intended instructions, workflow, or policy.

Indirect prompt injection comes from external content such as documents, emails, web pages, search results, or retrieved context that an LLM application reads.

No. Jailbreaking usually targets a general model safety boundary, while prompt injection targets a specific LLM application and its connected data, tools, or workflow.

Developers reduce risk with instruction/data separation, tool permission limits, external authorization checks, output validation, logging, and human approval for sensitive actions.

It is better treated as a risk to reduce with layered controls. Natural language is ambiguous, so applications need defense-in-depth rather than one perfect filter.

Sources and further reading