AI Agent Security

Quick Answer

AI agent security protects systems that can plan, call tools, use memory, access data, communicate and take actions. The safest architecture treats model output as untrusted, enforces identity and authorization outside the model, validates tool arguments, limits memory and permissions, requires approval for high-impact actions, and provides monitoring, revocation and rollback.

What Is AI Agent Security?

AI agent security focuses on applications where a model can do more than answer a question. An agent may browse data, call APIs, write files, send messages, create tickets, update records, coordinate with other agents, or continue a task over time. The more operational authority an agent receives, the more the surrounding application must control identity, permissions, context, execution and recovery.

Security should not depend on a prompt asking the model to behave safely. Prompts guide behavior; deterministic controls decide what is allowed.

AI Agent Security Architecture

Layer	Security responsibility
Model	Produce suggestions, not trusted commands
Orchestrator	Validate intent and apply policy
Tool gateway	Enforce identity, scope, rate limits and argument rules
API or service	Recheck user and resource authorization
Approval layer	Confirm sensitive, irreversible or externally visible actions
Audit layer	Record requested, approved and executed actions
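The audit layer is only useful if its records cannot be silently edited after an incident. The following is a minimal sketch that assumes a hash-chained, append-only log. The chaining is an illustrative design choice and is not prescribed by the architecture above. `AuditLog` and its stage names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64
STAGES = {"requested", "approved", "executed", "rejected"}


def _digest(body: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry includes the previous entry's hash."""
    entries: list = field(default_factory=list)

    def record(self, stage: str, action_id: str, detail: dict) -> dict:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        # Deep-copy the detail so later changes by the caller cannot alter the log.
        body = {"stage": stage, "action_id": action_id,
                "detail": json.loads(json.dumps(detail)), "prev": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute the chain; any edited, reordered or deleted entry breaks it.
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("stage", "action_id", "detail", "prev")}
            if entry["prev"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this makes tampering detectable. It does not prevent tampering, so in practice the log would also be shipped to storage the agent cannot write to.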

AI Agent Identity and Authorization

Separate the human user's identity, the agent identity, and the backend service identity. A user may delegate a narrow task without granting the agent every permission the backend service possesses. Use short-lived credentials, explicit scopes, tenant isolation, resource-level authorization and clear records of delegated authority.
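A simple way to keep a delegated task narrow is to compute the agent's effective permissions as the intersection of three sets: what the user holds, what the agent is registered for, and what this particular task delegated. The sketch below assumes that model. `Delegation`, `effective_scopes`, `authorize` and the scope strings are hypothetical. The backend API would still recheck resource-level authorization on its own.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """Hypothetical record of a narrow task a user delegated to an agent."""
    user_id: str
    agent_id: str
    tenant_id: str
    scopes: frozenset    # what the user delegated for this task only
    expires_at: float    # short-lived, in epoch seconds


def effective_scopes(user_perms: set, agent_perms: set, d: Delegation, now: float) -> set:
    # An expired delegation grants nothing.
    if now >= d.expires_at:
        return set()
    # The agent never exceeds the user, its own registration, or the delegated task.
    return set(user_perms) & set(agent_perms) & set(d.scopes)


def authorize(action: str, resource_tenant: str, user_perms: set,
              agent_perms: set, d: Delegation, now: float = None) -> bool:
    now = time.time() if now is None else now
    if resource_tenant != d.tenant_id:   # tenant isolation
        return False
    return action in effective_scopes(user_perms, agent_perms, d, now)
```

For example, an agent registered with `billing:refund` still cannot refund anything if the user only delegated `tickets:read` and `tickets:comment` for this task.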

NIST's AI Agent Standards Initiative explicitly covers agent authentication and identity infrastructure, which signals that identity is becoming a core interoperability and security requirement.

Zero-Trust Execution Boundary

The execution boundary should evaluate the requesting user, the current task, the approved tools, the target resource, the allowed workflow state, the risk level, and whether a human must confirm the action. Cryptographic checks can protect identity and integrity, but they do not replace semantic policy validation: a correctly signed request can still ask for the wrong action.
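The checks above can be expressed as a deterministic policy decision that runs outside the model and returns allow, deny, or require approval. This is a minimal sketch under stated assumptions. `ToolRequest`, `TaskContext`, `decide` and the two-level risk labels are hypothetical, and identity and authorization checks are assumed to have already run before `decide` is called.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class ToolRequest:
    user_id: str
    task_id: str
    tool: str
    resource_id: str
    workflow_state: str


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical per-task policy held by the orchestrator, never by the model."""
    approved_tools: frozenset
    allowed_resources: frozenset
    allowed_states: dict   # tool name -> workflow states in which it may run
    risk: dict             # tool name -> "low" or "high"


def decide(req: ToolRequest, ctx: TaskContext) -> Decision:
    if req.tool not in ctx.approved_tools:
        return Decision.DENY
    if req.resource_id not in ctx.allowed_resources:
        return Decision.DENY
    if req.workflow_state not in ctx.allowed_states.get(req.tool, ()):
        return Decision.DENY
    # Tools with unknown risk take the strict path.
    if ctx.risk.get(req.tool, "high") != "low":
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Defaulting unknown tools and unknown risk to the strict outcome means a misconfiguration fails closed rather than open.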

Validating Outbound Tool Arguments

Validation	Defensive question
Schema	Are required fields present and typed correctly?
Identity	Which user, agent and service requested this?
Authorization	May this identity perform this action?
Resource	Is the target record, file, account or system allowed?
Business rule	Is the amount, state, destination or workflow valid?
Risk	Does this action require human approval?

Use strict schemas and allowlists, but do not stop there. Recalculate prices and permissions on the server, validate ownership, limit destinations, apply rate controls, and reject requests that conflict with business state.
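As an illustration of the validation table, consider a hypothetical `issue_refund` tool. The sketch checks the argument schema, re-reads the order to confirm ownership, recomputes the refundable amount on the server, and allowlists the destination. `ORDERS`, `ALLOWED_DESTINATIONS` and `validate_refund` are assumptions standing in for a real system of record.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical server-side records; in practice these come from the system of record.
ORDERS = {
    "ord-1": {"owner": "u1", "status": "delivered",
              "paid": Decimal("49.99"), "refunded": Decimal("0")},
}
ALLOWED_DESTINATIONS = {"original_payment_method"}


class ValidationError(Exception):
    pass


def validate_refund(args: dict, user_id: str) -> dict:
    # 1. Schema: exact keys and types, nothing extra.
    expected = {"order_id": str, "amount": str, "destination": str}
    if set(args) != set(expected):
        raise ValidationError("unexpected or missing fields")
    for key, typ in expected.items():
        if not isinstance(args[key], typ):
            raise ValidationError(f"{key} must be {typ.__name__}")
    try:
        amount = Decimal(args["amount"])
    except InvalidOperation:
        raise ValidationError("amount is not a number") from None
    if not amount.is_finite():
        raise ValidationError("amount must be finite")
    # 2. Resource and ownership: re-read the record instead of trusting the model.
    order = ORDERS.get(args["order_id"])
    if order is None or order["owner"] != user_id:
        raise ValidationError("order not found for this user")
    # 3. Business rules: recompute what is refundable on the server.
    if order["status"] != "delivered":
        raise ValidationError("order is not in a refundable state")
    refundable = order["paid"] - order["refunded"]
    if not (Decimal("0") < amount <= refundable):
        raise ValidationError("amount outside refundable range")
    # 4. Destination allowlist.
    if args["destination"] not in ALLOWED_DESTINATIONS:
        raise ValidationError("destination not allowed")
    return {"order_id": args["order_id"], "amount": amount,
            "destination": args["destination"]}
```

A schema-valid call can still fail the ownership, business-rule or destination checks. That is the point of validating beyond the schema.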

Prompt Injection in AI Agents

Agents often read websites, documents, emails, support tickets, repositories, tool descriptions and retrieved knowledge before acting. Any of these sources can contain untrusted instructions. Use prompt-injection controls, separate instructions from data, preserve source provenance and prevent external content from directly selecting privileged actions.
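One way to keep external content from directly selecting privileged actions is to label every piece of context with its source. A coarse taint rule can then refuse privileged tools whenever untrusted content is in play, unless a human approves. This taint-tracking approach is an illustrative choice, not a complete defense. `TRUSTED_SOURCES`, `PRIVILEGED_TOOLS` and the tool names are hypothetical.

```python
from dataclasses import dataclass, field

TRUSTED_SOURCES = {"system", "user"}                                  # assumption
PRIVILEGED_TOOLS = {"send_email", "transfer_funds", "delete_file"}    # hypothetical names


@dataclass
class Segment:
    text: str
    source: str   # provenance label, e.g. "user", "web", "email", "tool_result"


@dataclass
class Context:
    segments: list = field(default_factory=list)

    def add(self, text: str, source: str) -> None:
        self.segments.append(Segment(text, source))

    def tainted(self) -> bool:
        # Any untrusted segment taints the model's next proposal.
        return any(s.source not in TRUSTED_SOURCES for s in self.segments)

    def render(self) -> str:
        # Delimit untrusted data so it is visibly separate from instructions.
        parts = []
        for s in self.segments:
            if s.source in TRUSTED_SOURCES:
                parts.append(s.text)
            else:
                parts.append(f"<data source={s.source!r}>\n{s.text}\n</data>")
        return "\n".join(parts)


def gate_tool_call(tool: str, ctx: Context, human_approved: bool = False) -> bool:
    """Return True if the proposed call may proceed to the remaining policy checks."""
    if tool in PRIVILEGED_TOOLS and ctx.tainted() and not human_approved:
        return False
    return True
```

Delimiters alone do not stop injection, because a model can still follow instructions inside the data block. The enforcement in this sketch comes from `gate_tool_call`, which runs outside the model.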

For current real-world context, see Indirect Prompt Injection in AI Agents.

Memory and Persistent Context Poisoning

Session memory, long-term memory, summaries and user profiles can preserve sensitive, stale or attacker-controlled content. Store only what is needed, label the source, separate users and tenants, expire old data, revalidate memory before sensitive use, and support review, deletion and quarantine.

Memory type	Primary control
Conversation context	Limit scope and treat external content as untrusted
Long-term user memory	Consent, isolation, retention and deletion
Shared team memory	Role-based access and provenance
Operational summaries	Revalidation before future actions
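The memory controls above can be sketched as a small store that scopes every read by tenant and user, drops expired or quarantined items, and supports deletion. It assumes a deliberately simple revalidation rule: for sensitive use, only items the user supplied directly are returned. `MemoryItem`, `MemoryStore` and that rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    tenant_id: str
    user_id: str
    text: str
    source: str          # provenance, e.g. "user", "agent_summary", "web"
    created_at: float
    ttl_seconds: float
    quarantined: bool = False


class MemoryStore:
    """Hypothetical in-memory store illustrating isolation, expiry and quarantine."""

    def __init__(self):
        self._items = []   # list of MemoryItem

    def write(self, item: MemoryItem) -> None:
        self._items.append(item)

    def read(self, tenant_id: str, user_id: str, now: float, sensitive: bool = False) -> list:
        results = []
        for item in self._items:
            if (item.tenant_id, item.user_id) != (tenant_id, user_id):
                continue   # user and tenant isolation
            if item.quarantined or now - item.created_at > item.ttl_seconds:
                continue   # quarantined or expired
            if sensitive and item.source != "user":
                continue   # assumed revalidation rule for sensitive actions
            results.append(item)
        return results

    def quarantine(self, predicate) -> int:
        # Mark matching items instead of deleting them, so they can be reviewed.
        count = 0
        for item in self._items:
            if predicate(item) and not item.quarantined:
                item.quarantined = True
                count += 1
        return count

    def delete_user(self, tenant_id: str, user_id: str) -> None:
        self._items = [i for i in self._items
                       if (i.tenant_id, i.user_id) != (tenant_id, user_id)]
```
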

Agent Supply Chain and MCP Security

Agents depend on models, orchestration frameworks, prompts, tools, plugins, skills, MCP servers, packages and external APIs. Review ownership, versions, permissions, update channels and data flows. A trusted component can become unsafe after an update, so approval must include ongoing change detection and revocation.
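Ongoing change detection can start with pinning the exact tool definition that was reviewed. The sketch below hashes a canonical form of the definition and revokes the tool when the hash changes, until someone re-reviews it. `ToolRegistry` and the dict fields (`name`, `description`, `input_schema`) are illustrative assumptions, not any specific protocol's format.

```python
import hashlib
import json


def fingerprint(tool_def: dict) -> str:
    # Canonical JSON so key order does not change the hash.
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class ToolRegistry:
    """Hypothetical registry that approves one specific version of each tool definition."""

    def __init__(self):
        self._approved = {}    # tool name -> approved fingerprint
        self._revoked = set()

    def approve(self, tool_def: dict) -> None:
        self._approved[tool_def["name"]] = fingerprint(tool_def)
        self._revoked.discard(tool_def["name"])

    def check(self, tool_def: dict) -> bool:
        """Return True only if this exact definition was reviewed and approved."""
        name = tool_def["name"]
        if name in self._revoked or name not in self._approved:
            return False
        if fingerprint(tool_def) != self._approved[name]:
            # Description, schema or permissions changed after review: revoke until re-reviewed.
            self._revoked.add(name)
            return False
        return True
```

Once a tool is revoked, it stays revoked even if the server later reverts to the old definition. A silent flip-flop is itself a signal worth reviewing.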

Use the MCP Security Guide for protocol controls, MCP Tool Poisoning for tool-metadata risks, and the OWASP Agentic Applications Top 10 for a wider risk map.

Agent Security Control Framework

These six control pillars turn model suggestions into governed application behavior. Each pillar lists the minimum controls it requires.

01 Agent identity
Give each agent a traceable identity separate from the human user and backend service.
Minimum controls
- Use short-lived credentials.
- Record delegated authority.
- Avoid shared administrator tokens.
02 Tool permissions
Expose only the minimum tools and actions needed for the task.
Minimum controls
- Prefer read-only operations.
- Separate write and delete actions.
- Apply per-user and per-resource authorization.
03 Input and argument validation
Treat model-selected tools and generated arguments as untrusted.
Minimum controls
- Use strict schemas.
- Recalculate sensitive values.
- Allowlist destinations and workflow states.
04 Memory and context
Control what can persist and influence future actions.
Minimum controls
- Track provenance.
- Separate users and tenants.
- Support review, deletion and quarantine.
05 Human approval
Show the exact action, target, data and impact before confirmation.
Minimum controls
- Avoid vague approvals.
- Expire old approvals.
- Require reapproval after material changes.
06 Monitoring and rollback
Make unsafe behavior visible and recoverable.
Minimum controls
- Log plans, calls and results.
- Add cost and rate controls.
- Test disable, revocation and rollback.
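The human-approval controls (show the exact action, expire old approvals, and require reapproval after material changes) can be enforced by binding each approval to a hash of the action the approver actually saw. This is a minimal sketch under that assumption. `action_digest`, `Approval`, `grant` and the action fields are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


def action_digest(action: dict) -> str:
    # Hash the exact tool, target, arguments and impact shown to the approver.
    return hashlib.sha256(json.dumps(action, sort_keys=True).encode()).hexdigest()


@dataclass(frozen=True)
class Approval:
    approver: str
    digest: str
    expires_at: float


def approval_prompt(action: dict) -> str:
    # Show concrete details instead of a vague "Allow the agent to continue?"
    return (
        f"Tool: {action['tool']}\n"
        f"Target: {action['target']}\n"
        f"Arguments: {json.dumps(action['args'], sort_keys=True)}\n"
        f"Impact: {action['impact']}"
    )


def grant(action: dict, approver: str, ttl_seconds: float, now: float) -> Approval:
    return Approval(approver, action_digest(action), now + ttl_seconds)


def approval_valid(approval: Approval, action: dict, now: float) -> bool:
    if now >= approval.expires_at:
        return False   # old approvals expire
    # Any material change to tool, target, arguments or impact invalidates the approval.
    return approval.digest == action_digest(action)
```

If the agent changes the amount after approval, for example from 250.00 to 2500.00, the digest no longer matches and the action needs a fresh approval.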

Current Agent-Security Standards and Evidence

Key standards, frameworks, research and incidents that shape the controls in this guide.

Dec 2025 · Framework
OWASP GenAI Security Project
OWASP Agentic Applications Top 10 released
OWASP separated goal, tool, identity, memory, supply-chain and rogue-agent risks into an agent-focused framework.
Source: OWASP framework

Feb 2026 · Standard
NIST
NIST AI Agent Standards Initiative launched
NIST highlighted agent identity, authentication, interoperability, security evaluations and protocol standards.
Source: NIST announcement

Mar 2026 · Research
Unit 42
Indirect prompt injection observed in the wild
Researchers documented malicious web content targeting AI review and agent workflows, making external content an active security boundary.
Source: Unit 42 report

Apr 2026 · Incident
OWASP GenAI Security Project
Agent and MCP incidents mapped to control failures
OWASP connected real incidents to tool misuse, identity abuse, human trust, rogue behavior and supply-chain weaknesses.
Source: OWASP incident round-up

Emergency Stop, Rollback and Recovery

Prevention is not enough. Agent systems need a tested response path for incorrect or compromised behavior.

Disable affected tools without shutting down unrelated services.
Revoke agent, user and service credentials.
Cancel queued or scheduled actions.
Prefer reversible changes and delayed destructive operations.
Quarantine suspicious memory and retrieved context.
Preserve prompts, plans, approvals, calls and results for investigation.
Test rollback and kill-switch behavior before production use.
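The first steps of that response path can be sketched as one emergency-stop routine. It disables only the affected tools, revokes the agent's credentials, cancels its queued actions, and keeps a record for investigation. `ControlPlane` and `QueuedAction` are hypothetical. A real system would call its identity provider, scheduler and tool gateway instead of mutating local state.

```python
from dataclasses import dataclass, field


@dataclass
class QueuedAction:
    action_id: str
    agent_id: str
    tool: str
    status: str = "queued"   # queued, cancelled or executed


@dataclass
class ControlPlane:
    """Hypothetical control plane used to illustrate an emergency stop."""
    disabled_tools: set = field(default_factory=set)
    active_credentials: dict = field(default_factory=dict)   # credential id -> agent id
    queue: list = field(default_factory=list)
    incident_log: list = field(default_factory=list)

    def emergency_stop(self, agent_id: str, tools: set) -> dict:
        # 1. Disable only the affected tools; unrelated tools keep working.
        self.disabled_tools |= set(tools)
        # 2. Revoke every credential issued to this agent.
        revoked = [c for c, a in self.active_credentials.items() if a == agent_id]
        for cred in revoked:
            del self.active_credentials[cred]
        # 3. Cancel this agent's queued actions; already executed ones are left for rollback.
        cancelled = []
        for action in self.queue:
            if action.agent_id == agent_id and action.status == "queued":
                action.status = "cancelled"
                cancelled.append(action.action_id)
        # 4. Preserve a record of the stop for investigation.
        summary = {"agent": agent_id, "tools": sorted(tools),
                   "revoked": revoked, "cancelled": cancelled}
        self.incident_log.append(summary)
        return summary

    def can_run(self, tool: str) -> bool:
        return tool not in self.disabled_tools
```

Running a routine like this in a staging environment is one way to meet the last item above before production use.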

AI Agent Security Checklist

Are users, agents and services represented by separate identities?
Can each tool be used only by authorized users and resources?
Are tool choices and arguments validated outside the model?
Is untrusted content blocked from directly influencing privileged actions?
Are memory, prompts, outputs and logs protected from data leakage?
Do high-impact actions require clear human approval?
Are rate, time, cost and loop limits enforced?
Can tools be disabled, credentials revoked and changes rolled back?

Explore AI Security Topics

MCP Security: Secure MCP hosts, clients, servers, authorization, tools, resources, and audits.
MCP Tool Poisoning: Prevent unsafe tool descriptions, schemas, updates, shadowing, and results.
OWASP Agentic Top 10: Understand the ten leading risks for autonomous and tool-using AI agents.
Indirect Prompt Injection in Agents: Recent evidence, attack surfaces, layered defenses, and response guidance.
Prompt Injection: Understand direct and indirect prompt injection across LLM apps and agents.
LLM Security Checklist: Review LLM apps, RAG, agents, MCP, tools, monitoring, and approvals.

FAQs

What is AI agent security?
AI agent security protects systems where AI can plan, call tools, use memory, access data, communicate, or take actions. It combines normal application security with agent identity, permissions, execution policy, approvals, monitoring, and recovery.

Why are AI agents risky?
Agents combine untrusted content, model reasoning, memory and tool permissions. Without external controls, one unsafe instruction or incorrect decision can create operational impact.

What is excessive agency?
Excessive agency means an AI system has more autonomy, permissions, tools, time, or data access than it needs for the task.

How should teams validate AI agent tool calls?
Validate the schema, requesting identity, authorization, target resource, business rules, risk level, rate limits, and whether human approval is required before execution.

What is a zero-trust execution boundary for AI agents?
It is a deterministic orchestration or tool-gateway layer that treats every model-generated plan, tool selection, and argument as untrusted until policy checks approve it.

Should AI agents be allowed to act without humans?
Low-risk, reversible actions may be automated. Sensitive, irreversible, privileged, financial, or externally visible actions should usually require explicit approval.

Sources and further reading

OWASP Top 10 for Agentic Applications 2026 — Agent goal, tool, identity, memory, supply-chain and autonomy risk framework
OWASP Agentic AI Threats and Mitigations — Threat-model-based guidance for agentic AI systems
NIST AI Agent Standards Initiative — Agent identity, interoperability, authentication and security standards work
NIST AI Risk Management Framework — Risk management structure for AI systems
Unit 42 - Web-Based Indirect Prompt Injection in the Wild — March 2026 evidence of indirect prompt-injection activity
