LLM Security Risks: The 2025 Threat Landscape for AI Deployments

By AI Alert Desk

The attack surface of enterprise AI deployments is formally catalogued and growing. LLM security risks now span application logic, training pipelines, inference infrastructure, and — with the proliferation of agentic systems — external tool integrations that give models autonomous reach into internal networks and APIs. Two industry-maintained taxonomies, the OWASP Top 10 for LLM Applications 2025 and MITRE ATLAS, provide the most comprehensive maps of this territory. Neither is theoretical: both are built from documented real-world incidents.

OWASP LLM Top 10 (2025): Application-Layer Risk Register

The OWASP project, maintained by over 600 contributing security experts, catalogues the ten most critical risks facing LLM applications. The 2025 edition refined earlier categories to reflect production deployment patterns and added categories specific to retrieval-augmented generation (RAG) and agentic workflows.

LLM01: Prompt Injection. Direct and indirect manipulation of model inputs to override intended behavior. Indirect injection, where malicious instructions are embedded in documents, web pages, or tool outputs the model later processes, is increasingly the more operationally relevant variant in production deployments. For a technical breakdown of injection variants and jailbreak techniques, see aisec.blog.

LLM02: Sensitive Information Disclosure. Training memorization, in-context leakage, and insufficient prompt isolation allow models to expose personally identifiable information, proprietary data, or credentials in responses. System prompts containing internal tooling details or API keys represent a recurring exposure pattern.

LLM03: Supply Chain. Pre-trained model weights sourced from public repositories, third-party datasets, and integrated ML dependencies carry persistent supply-chain risk. Malicious model checkpoints that abuse pickle deserialization have been exploited in the wild, with Hugging Face repositories repeatedly identified as a delivery mechanism.

LLM04: Data and Model Poisoning. Compromise of datasets or fine-tuning pipelines to introduce backdoors or behavioral bias. Attackers with access to fine-tuning workflows can implant behavior that surfaces only on specific trigger inputs at inference time, with no change to application code, which makes detection substantially harder than for conventional binary tampering.

LLM05: Improper Output Handling. Model outputs passed without sanitization to downstream systems, enabling cross-site scripting, SQL injection, or remote code execution in connected components. The LLM itself is not the vulnerable layer; the integration is.

LLM06: Excessive Agency. Agentic deployments with over-permissioned tool access allow prompt injection to cascade into real-world actions: file writes, API calls, database queries, and inter-agent message injection. The blast radius of a single successful injection grows with the number and privilege of the tools granted to the model. Defensive guardrail tooling that addresses LLM02, LLM06, and LLM07 at the inference layer is documented at guardml.io.

LLM07: System Prompt Leakage. Crafted user inputs can extract hidden system instructions, operational logic, or embedded credentials from the model context. This information materially reduces the cost of targeted attacks against the application.

LLM08: Vector and Embedding Weaknesses. RAG pipelines introduce a specific attack surface: poisoned entries in vector databases can steer model responses toward attacker-controlled content. Embedding inversion attacks can reconstruct sensitive source documents from stored embeddings, exposing data that was never intended to be surfaced in responses.

LLM09: Misinformation. Hallucinated or adversarially influenced outputs that present false information as fact. Impact scales with deployment context: misinformation in code generation, legal summarization, or clinical decision support carries greater consequence than in a consumer chatbot.

LLM10: Unbounded Consumption. Uncontrolled inference resource usage enables denial-of-service against model-serving infrastructure and gives attackers room for iterative model extraction through high-volume query campaigns targeting decision boundaries.
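The output-handling risk in the list above lies in the integration code, so that is where a fix has to sit. The following is a minimal sketch, not a complete defense. It assumes a web page that displays model text and a SQLite lookup keyed on a value the model extracted. The function names and the `orders` table are hypothetical.

```python
import html
import sqlite3
from typing import List, Tuple


def render_reply(model_output: str) -> str:
    """Escape model text before embedding it in an HTML page."""
    return "<p>" + html.escape(model_output) + "</p>"


def lookup_orders(conn: sqlite3.Connection, customer_name: str) -> List[Tuple]:
    """Bind a model-supplied value as a parameter instead of formatting it into SQL."""
    cursor = conn.execute(
        "SELECT id, item FROM orders WHERE customer = ?", (customer_name,)
    )
    return cursor.fetchall()


# Hypothetical in-memory table used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, item TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'alice', 'widget')")
```

Both functions treat the model's text as data: the HTML layer escapes it and the database layer binds it, so a string such as `alice' OR '1'='1` is matched literally instead of being interpreted as SQL.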

MITRE ATLAS: Adversary Tactics Mapped to AI Systems

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) catalogues 14 distinct tactics and 84 techniques adversaries apply against AI systems across their full lifecycle, from reconnaissance to impact. The November 2025 v5.1.0 release added a Command and Control tactic (AML.TA0015) and 14 new techniques specifically addressing agentic AI attack vectors — memory manipulation, thread injection, RAG credential harvesting, and tool invocation attacks. The framework now documents 42 confirmed real-world case studies.

Key techniques with production-environment relevance:

Prompt Injection. Adversarial instructions introduced into model inputs. ATLAS maps this across both initial access and persistence tactic categories when used in agentic contexts, reflecting that a single injected instruction in a processed document can establish persistent behavioral modification for the session.
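One partial mitigation for indirect injection is to mark untrusted content explicitly before it reaches the model, so the system prompt can instruct the model to treat anything inside the markers as data. The sketch below assumes hypothetical marker strings and a hypothetical `wrap_untrusted` helper. It reduces ambiguity about where untrusted text begins and ends, but does not by itself stop a model from following injected instructions.

```python
# Hypothetical delimiters; any strings unlikely to occur in normal content would do.
OPEN_MARK = "<<UNTRUSTED>>"
CLOSE_MARK = "<<END_UNTRUSTED>>"


def wrap_untrusted(content: str, source: str) -> str:
    """Strip look-alike markers and control characters, then wrap the content."""
    cleaned = content
    # Repeat until stable so nested fragments cannot reassemble into a marker.
    while OPEN_MARK in cleaned or CLOSE_MARK in cleaned:
        cleaned = cleaned.replace(OPEN_MARK, "").replace(CLOSE_MARK, "")
    # Drop non-printable characters, keeping newlines and tabs.
    cleaned = "".join(ch for ch in cleaned if ch.isprintable() or ch in "\n\t")
    return f"{OPEN_MARK} source={source}\n{cleaned}\n{CLOSE_MARK}"
```

A document that tries to close the block early, for example by embedding `<<END_UNTRUSTED>>` followed by new instructions, has that marker removed before wrapping, so the wrapped text contains exactly one opening and one closing marker.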

Model Extraction. Iterative querying to build a functional copy of a proprietary model or to infer its decision boundaries and hyperparameters. High-cost inference APIs are primary targets; confidence scores and logprobs in API responses accelerate reconstruction. Rate limiting and response perturbation are the primary defenses.
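The two defenses named above can be sketched in a few lines. This is an illustration under assumptions, not a hardened implementation: `SlidingWindowLimiter` and `coarsen_scores` are hypothetical names, state is kept in process memory, and the thresholds are placeholders to be tuned per deployment.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self._history[client_id]
        # Discard requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


def coarsen_scores(scores: Dict[str, float], top_k: int = 1, decimals: int = 1) -> Dict[str, float]:
    """Return only the top_k labels with rounded confidence values."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {label: round(score, decimals) for label, score in ranked}
```

The limiter caps how many probes a single client can issue per window, and the coarsening step reduces how much each response reveals about the decision boundary. Denied requests are also a useful signal to log for extraction monitoring.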

Training Data Poisoning. Corruption of datasets upstream of training. ML supply-chain compromises — where malicious data enters through public dataset providers or contaminated upstream dependencies — rank among the highest-likelihood initial access vectors in 2025 ATLAS case studies.

Adversarial Examples. Crafted inputs that cause misclassification in vision models or semantic manipulation in text models without triggering human-visible anomalies. Primarily relevant to content moderation and safety filter bypass in multimodal deployments.

ML Supply Chain Compromise. Insertion of malicious code or weights into model repositories, data pipelines, or fine-tuning infrastructure — the AI-specific analog of software supply chain attacks, with the added complication that malicious behavior embedded in weights is substantially harder to detect than malicious code in source files.
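A first line of defense against tampered artifacts is to check a pinned digest and to inspect what a pickle stream would import before anything is deserialized. The sketch below uses only the Python standard library. `ALLOWED_GLOBALS` is a hypothetical allowlist, and the string tracking used for `STACK_GLOBAL` is a heuristic that can miss memoized names. Real checkpoints may also wrap pickle data in an archive, which this sketch does not unpack.

```python
import hashlib
import hmac
import pickletools
from typing import List, Set

# Hypothetical allowlist: the only imports this deployment expects in a checkpoint.
ALLOWED_GLOBALS = {"collections.OrderedDict"}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: str, pinned_hex: str) -> bool:
    """Compare the file digest against the hash pinned in the pipeline."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())


def imported_globals(data: bytes) -> Set[str]:
    """List module.name references in a pickle stream without executing it."""
    found: Set[str] = set()
    recent_strings: List[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, _, name = arg.partition(" ")
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are usually the two most recent strings.
            if len(recent_strings) >= 2:
                found.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                found.add("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def looks_safe(data: bytes) -> bool:
    """True only if every import in the stream is on the allowlist."""
    return imported_globals(data) <= ALLOWED_GLOBALS
```

A file that fails `matches_pin` or `looks_safe` should be rejected before any call to `pickle.load` or a framework loader. Passing both checks does not prove the weights themselves are benign.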

Defender Priorities

Security and platform teams responsible for LLM deployments should address the following:

  1. Treat all external model inputs as untrusted. Apply validation and sanitization to every content source processed by an LLM, including documents, API responses, and tool outputs returned to an agentic model. Indirect prompt injection through processed content is the primary exploitation path for agentic systems.

  2. Audit and restrict tool permissions on agents. Apply least privilege to every tool integration. Agentic systems capable of executing shell commands, writing to databases, or sending email should have those capabilities explicitly justified, scoped, and logged. Excessive Agency is exploitable only where an agent is over-provisioned, so reducing permissions directly reduces blast radius.

  3. Validate supply-chain provenance before loading weights. Scan pre-trained model checkpoints for pickle-based exploits. Treat Hugging Face and other public model repositories as untrusted until artifacts are hash-verified and scanned. Pin model artifact hashes in deployment pipelines and treat weight files with the same rigor as third-party binaries.

  4. Instrument inference for anomaly detection. Log prompt/response pairs with sufficient fidelity for forensic reconstruction. Monitor per-user query volumes for patterns consistent with model extraction campaigns: high-frequency requests with systematically varying inputs designed to probe decision boundaries.

  5. Conduct structured red-team exercises against your LLM stack. The OWASP and ATLAS frameworks provide testable, scenario-specific hypotheses for each risk category. At minimum: prompt injection testing across all external input channels, system prompt extraction attempts, and a review of the tool permission surface for every deployed agent.
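The least-privilege recommendation for agent tools in the list above can be sketched as a thin dispatch layer that every tool call must pass through. The `ToolGate` class, the agent identifiers, and the tool names are hypothetical. A production version would also validate arguments and ship its logs to durable storage.

```python
import logging
from typing import Any, Callable, Dict, FrozenSet

logger = logging.getLogger("agent.tools")


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool it has not been granted."""


class ToolGate:
    """Dispatch tool calls only if the calling agent holds an explicit grant."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        grants: Dict[str, FrozenSet[str]],
    ):
        self._tools = tools
        self._grants = grants

    def call(self, agent_id: str, tool_name: str, **kwargs: Any) -> Any:
        allowed = self._grants.get(agent_id, frozenset())
        # Deny by default: unknown agents and unknown tools are both rejected.
        if tool_name not in allowed or tool_name not in self._tools:
            logger.warning("denied agent=%s tool=%s", agent_id, tool_name)
            raise ToolNotPermitted(f"{agent_id} may not call {tool_name}")
        logger.info("allowed agent=%s tool=%s args=%s", agent_id, tool_name, sorted(kwargs))
        return self._tools[tool_name](**kwargs)
```

Because grants are keyed per agent and default to empty, an injected instruction can at most trigger tools the agent was already allowed to use, and every attempt, allowed or denied, leaves a log entry for later review.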

Sources

  1. OWASP Top 10 for Large Language Model Applications
  2. OWASP LLM Top 10 Vulnerabilities 2025
  3. MITRE ATLAS Framework: Guide to Securing AI Systems