AI Alert
guide

Building an AI Security Incident Response Plan

By Theo Voss ·

Most organizations now have an AI deployment and a generic incident response plan, but the IR plan was written for a world of servers, endpoints, and network intrusions. When the incident is a prompt-injection campaign that exfiltrated data through an agent’s tool calls, or a model-extraction effort that scraped your proprietary model through the API, the standard playbook does not obviously apply. What do you contain? What is the evidence? Who do you even call?

This guide is a practical structure for an AI security incident response plan. It does not replace your existing IR program — it extends it to cover the AI-specific incident classes, the containment actions that have no equivalent in traditional IR, and the evidence you have to capture before the incident if you want any hope of investigating it. It is built on the current NIST incident response guidance and grounded in the real attack classes we track across this site.

Start from the current NIST structure

NIST overhauled its incident response guidance in April 2025 with SP 800-61 Revision 3. The big change matters here: Rev. 3 abandons the old rigid lifecycle (Preparation → Detection & Analysis → Containment/Eradication/Recovery → Post-Incident) and instead organizes incident response around the five functions of the NIST Cybersecurity Framework 2.0 — Identify, Protect, Detect, Respond, Recover — plus the cross-cutting Govern function. It is delivered as a CSF 2.0 Community Profile rather than a step-by-step recipe.

The practical reason to start here is that it lets you fold AI incident response into the same framework your security program already speaks, instead of bolting on a separate “AI IR” silo. The rest of this guide maps the AI-specific work onto those functions.

The AI incident classes you are planning for

Before the plan, the threat model. The incident types that distinguish AI IR from generic IR:

MITRE ATLAS is the reference that names these as adversary techniques and gives you a shared vocabulary for the after-action report. Mapping each of your detections to an ATLAS technique at design time pays off when you have to explain the incident.

Prepare: the evidence you must capture before the incident

This is where AI IR most often fails, and it fails before the incident happens. The single most common reason an AI security incident cannot be investigated is that the telemetry needed to reconstruct it was never logged. Under the CSF Protect/Identify functions, the preparation work specific to AI:

If you instrument nothing else from this guide, instrument the interaction log with provenance. Everything downstream depends on it.

Detect and analyze: triage for AI incidents

Detection draws on the same telemetry. The analysis questions that are specific to AI incidents:

Respond: containment actions with no traditional equivalent

This is where AI IR diverges most from the standard playbook. The containment levers, roughly in order of escalation:

The discipline: contain at the narrowest effective layer. Generic IR reaches for “isolate the host.” AI IR usually has a more surgical option — revoke a tool, roll back a corpus, kill a key — that preserves the service.

Recover and learn: close the loop

Recovery under CSF 2.0 is about restoring service safely and verifying the fix held. For AI incidents:

A note on the Govern function: AI incidents frequently raise reporting and disclosure questions that traditional IR plans do not anticipate — regulatory notification for data exposure, customer communication, and whether to contribute a sanitized writeup to a public resource like the AI Incident Database, which exists precisely so the field learns from documented failures. Decide your disclosure posture in advance, not under pressure.

The minimum viable AI IR plan

If you are building this from nothing, the order that delivers the most resilience per unit effort:

  1. Instrument the provenance-rich interaction log and lock down its access. Without it, nothing else in this plan is executable.
  2. Define the escalation path across the model, app, and security teams, and name who can revoke a key, pull a tool, and roll back a model.
  3. Make tools, prompts, models, and corpora independently disableable and versioned, so containment has surgical options.
  4. Write per-class runbooks for the five incident types above, each mapped to an ATLAS technique and to the CSF function it lives under.
  5. Add the successful attack to adversarial CI after every incident, so detection compounds over time.

An AI security incident response plan is mostly the disciplined application of incident response fundamentals to a system whose telemetry, containment levers, and evidence look different. Get the logging and the surgical containment options in place before the incident, map the work onto the framework your security team already uses, and the AI-specific parts stop being a separate emergency and become a covered class of incident like any other.

Sources

See also

Sources

  1. NIST SP 800-61r3 — Incident Response Recommendations (CSF 2.0 Profile)
  2. NIST SP 800-61r3 (full PDF)
  3. MITRE ATLAS — Adversarial Threat Landscape for AI Systems
  4. AI Incident Database
  5. OWASP Top 10 for LLM Applications 2025
Read the full article →