Prompt Injection Attack

Prompt Injection Attack

Table of Contents

Quick Answer

Prompt injection happens when untrusted text influences an LLM application in a way that conflicts with its intended instructions or workflow. Direct prompt injection comes from the user prompt, while indirect prompt injection can come from external content such as documents, emails, pages, or retrieved context.

What is Prompt Injection?

Prompt injection happens when untrusted text influences an LLM application in a way that conflicts with its intended instructions, policy, or workflow. The risk is not only the text typed by a user; it can also come from files, emails, web pages, search results, or retrieved documents that the model reads.

Direct vs Indirect Prompt Injection

TypeWhere it comes fromSafe defensive idea
Direct prompt injectionThe user message or promptValidate requests, enforce policy outside the model, and limit tool permissions
Indirect prompt injectionExternal content read by the modelTreat retrieved documents, emails, and web pages as untrusted data

Why Prompt Injection is Different from Normal User Input

Traditional input validation often checks format, length, and allowed values. LLM applications also need to decide which text is an instruction, which text is data, and which actions require independent authorization. A model should not be the only control deciding whether a sensitive action is allowed.

Safe Conceptual Examples

Imagine an AI assistant summarizing a user-uploaded document. The document contains a sentence telling the assistant to ignore its previous instructions and reveal confidential data. A secure design treats that sentence as untrusted document content, not as a trusted instruction.

Another example is an AI agent reading a web page before taking an action. If the page contains text that tries to influence the agent, the application should validate the intended action, confirm permissions, and require human approval for sensitive changes.

Impact on AI Applications

  • Unsafe tool calls or workflow actions.
  • Exposure of sensitive prompt context or business data.
  • Incorrect decisions based on manipulated external content.
  • Unsafe output copied into code, HTML, SQL, or operational systems.

Prompt Injection vs Jailbreaking

Jailbreaking usually refers to attempts to get a general AI model to ignore safety restrictions. Prompt injection focuses on how untrusted input can interfere with a specific LLM application, especially when the application is connected to tools, data, or workflows.

Prevention Checklist

  • Separate system instructions from user and document content.
  • Keep authorization and business rules outside the model.
  • Validate and sanitize model outputs before using them in other systems.
  • Limit tools, plugins, and agents with least privilege.
  • Require human approval for sensitive or irreversible actions.
  • Log tool calls, approvals, and unusual model behavior for review.

Developer Controls

Developers should design LLM applications like security-sensitive software. Use allowlists for actions, validate structured outputs, isolate retrieved content, rate-limit expensive operations, and review agent permissions regularly. The broader AI Security Roadmap explains how this fits into the larger learning path.

FAQs

Prompt injection is when untrusted text tries to influence an LLM application to behave differently from its intended instructions, rules, or workflow.

Direct prompt injection comes from a user prompt. Indirect prompt injection comes from external content such as a document, email, web page, or retrieved context that the model reads.

No. Jailbreaking usually tries to bypass a general model safety behavior. Prompt injection focuses on how untrusted text can interfere with a specific LLM application or workflow.

Reduce risk by separating instructions from data, enforcing authorization outside the model, validating outputs, limiting tool permissions, and requiring human approval for sensitive actions.

It can contribute to sensitive data exposure if an application gives the model access to secrets, private documents, or tools without strong access control and output validation.

Sources and further reading