CVE Roundup: AI/ML Infrastructure Vulnerabilities — Q1 2026
A quarterly review of critical CVEs disclosed in Q1 2026 affecting model serving infrastructure: Ollama, vLLM, NVIDIA Triton Inference Server, LangChain, and related tooling. Patch status and exploitation notes included.
Q1 2026 was a consequential quarter for AI/ML infrastructure security. As organizations mature their LLM deployments from experimental to production, the attack surface of the underlying serving stack has expanded — and so has researcher attention. This roundup covers the highest-impact CVEs disclosed January through March 2026 across the major components of a typical ML serving stack.
Severity ratings follow CVSS v3.1. Patch status as of 2026-05-10.
CVE-2026-1044 — Ollama Unauthenticated API Exposure (No Auth by Default)
- Severity: High (CVSS 8.6)
- Affected component: Ollama <= 0.1.44, HTTP API server
- CWE: CWE-306 (Missing Authentication for Critical Function)
Ollama’s default configuration binds its API server to 0.0.0.0:11434 with no authentication requirement. Researchers documented in January 2026 that many production deployments — particularly those behind corporate VPNs or on cloud instances with permissive inbound rules — were reachable from unintended network ranges. The API allows: model listing, model pulling from arbitrary registries, model deletion, and inference requests.
Exploitation: Confirmed in the wild. Shodan scans during January 2026 identified over 8,000 Ollama instances with port 11434 publicly accessible. Exploitation requires only network access. Demonstrated impacts include unauthorized model inference (cost/compute theft), model deletion, and pulling attacker-specified models into the serving environment.
Patch status: Ollama 0.1.45 adds a binding configuration option and warns on 0.0.0.0 binding in non-local configurations. The CVE advisory recommends network-level controls (firewall rules, VPN-only access) as the primary mitigation, given that application-layer authentication is not yet built into Ollama’s core serving path.
Mitigation: Bind Ollama to 127.0.0.1 or a specific internal IP. Use a reverse proxy with authentication (nginx + basic auth, or a service mesh) for any deployment accessible beyond a single machine.
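One way to operationalize that mitigation is a preflight check on the configured bind address (Ollama reads this from the `OLLAMA_HOST` environment variable) before the service is allowed to start. A minimal sketch in Python; the function name and the policy of treating private internal addresses as acceptable are assumptions for illustration, not part of any Ollama tooling:

```python
import ipaddress

def is_safe_bind(host: str) -> bool:
    """Return True if the bind address keeps the API off external interfaces.

    0.0.0.0 (all IPv4 interfaces) and :: (all IPv6 interfaces) are unsafe.
    Loopback is safe; a specific private IP is treated as acceptable here on
    the assumption that network-level controls (firewall/VPN) cover it.
    """
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:          # 0.0.0.0 or ::
        return False
    return addr.is_loopback or addr.is_private

# Example: refuse to start with the pre-0.1.45 default binding.
assert is_safe_bind("127.0.0.1")
assert not is_safe_bind("0.0.0.0")
```

This only checks the address; it does not substitute for the reverse-proxy authentication layer recommended above.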
CVE-2026-1187 — vLLM OpenAI-Compatible API Key Bypass
- Severity: High (CVSS 8.2)
- Affected component: vLLM <= 0.3.3, OpenAI-compatible API server
- CWE: CWE-287 (Improper Authentication)
vLLM’s OpenAI-compatible API server supports API key authentication via --api-key. A logic error in the key validation path caused requests with a malformed Authorization header (specifically, a header present but containing only whitespace after the Bearer prefix) to bypass key validation entirely. Requests with no Authorization header were correctly rejected; requests with a header containing only whitespace were incorrectly permitted.
Exploitation: Requires knowing that the target is running vLLM with --api-key. A single HTTP request with the header `Authorization: Bearer ` (note the trailing space and empty token) bypasses the authentication check on affected versions. Automated exploitation is trivial.
Patch status: Fixed in vLLM 0.3.4. The fix normalizes and strips the bearer token value before comparison.
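The class of bug is worth seeing in miniature. The sketch below is an illustration of the described logic error and its fix, not vLLM's actual validation code; both function names are hypothetical:

```python
from typing import Optional

def check_auth_flawed(header: Optional[str], api_key: str) -> bool:
    """Reproduces the CVE-2026-1187 logic error (simplified illustration)."""
    if header is None:
        return False                     # missing header: correctly rejected
    if not header.startswith("Bearer "):
        return False
    token = header[len("Bearer "):]
    if not token.strip():
        return True                      # BUG: whitespace-only token accepted
    return token == api_key

def check_auth_fixed(header: Optional[str], api_key: str) -> bool:
    """Strips the token before comparison, in the spirit of the 0.3.4 fix."""
    if header is None or not header.startswith("Bearer "):
        return False
    token = header[len("Bearer "):].strip()
    return bool(token) and token == api_key
```

The flawed version treats an empty-after-whitespace token as a special case and returns the wrong branch; the fix normalizes first, so an empty token can never compare equal to a real key.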
CVE-2026-1253 — NVIDIA Triton Inference Server SSRF via Model Repository URI
- Severity: High (CVSS 7.9)
- Affected component: Triton Inference Server <= 24.12, model repository loading
- CWE: CWE-918 (Server-Side Request Forgery)
Triton supports loading models from S3, GCS, and Azure Blob Storage via URI schemes in its model repository configuration. The URI validation logic did not restrict against RFC 1918 private address ranges or cloud metadata endpoints. An operator or API consumer with access to the model repository configuration endpoint could specify a URI such as s3://169.254.169.254/latest/meta-data/ and cause the Triton process to make authenticated HTTP requests to the AWS EC2 instance metadata service.
Exploitation: Requires the ability to modify Triton’s model repository configuration — a privileged operation in most deployments, but available to any service account with model management access. In multi-tenant serving environments or platforms where users can specify model repository paths, the attack is reachable from lower-privilege users.
Patch status: Fixed in Triton 25.01. The fix introduces URI scheme allowlisting and blocklists RFC 1918 ranges and known cloud metadata endpoints.
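A guard in the spirit of the 25.01 fix can be sketched in a few lines: allowlist the URI schemes, then reject hosts that are cloud metadata endpoints or resolve to private or link-local IP literals. This is an illustrative policy, not Triton's actual implementation; the scheme set and host list are assumptions:

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"s3", "gs", "as"}     # S3, GCS, Azure Blob
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}

def is_allowed_repo_uri(uri: str) -> bool:
    """Allowlist schemes and block metadata/private targets before fetching."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    host = parsed.hostname or ""
    if host in METADATA_HOSTS:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True                      # hostname, not an IP literal
    # Block RFC 1918 private ranges and link-local (metadata) addresses.
    return not (addr.is_private or addr.is_link_local)
```

Note the remaining gap in any such sketch: a hostname that resolves to a private address (DNS rebinding) passes this check, which is why egress-level network controls are still worth having.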
CVE-2026-1381 — LangChain SQLDatabaseChain SQL Injection
- Severity: High (CVSS 8.0)
- Affected component: langchain-community <= 0.2.1, SQLDatabaseChain
- CWE: CWE-89 (SQL Injection)
LangChain’s SQLDatabaseChain generates SQL queries from natural language inputs by passing user queries to an LLM and executing the resulting SQL against a connected database. The chain implementation did not parameterize the generated SQL before execution — it executed LLM-generated query text directly. An attacker who can influence the user’s natural language input (through indirect injection of context or through direct user interaction) can craft inputs that cause the LLM to generate SQL with injected clauses.
Exploitation: Demonstrated in Q1 2026 by two independent researcher groups. Attack scenario: a user-facing chatbot backed by a database via SQLDatabaseChain; the attacker sends a query that includes SQL injection fragments in natural language form (“show me users and also drop table users;”). Models with instruction-following tendencies will frequently incorporate attacker-specified SQL elements.
Patch status: langchain-community 0.2.2 adds a SQL sanitization pass and restricts default query types to SELECT only. Deployments using older versions with write_access=True should treat the database as compromised if exposed to untrusted user input.
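For deployments that cannot upgrade immediately, a coarse SELECT-only gate in front of LLM-generated SQL reduces the blast radius. The sketch below illustrates the idea behind the 0.2.2 mitigation; it is not the actual langchain-community patch, and as a string-level check it will reject some legitimate queries (e.g. semicolons inside string literals):

```python
import re

def is_select_only(sql: str) -> bool:
    """Accept only a single statement whose first keyword is SELECT."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        return False                     # multiple statements
    first_word = re.match(r"\s*(\w+)", stripped)
    return bool(first_word) and first_word.group(1).upper() == "SELECT"
```

Run the check before handing the generated query to the database driver, and drop (or log and reject) anything that fails. A proper SQL parser gives fewer false positives, but even this gate blocks the "drop table" class of injected output.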
CVE-2026-1509 — Hugging Face safetensors Slow-Path Parsing DoS
- Severity: Medium (CVSS 6.1)
- Affected component: safetensors <= 0.4.2 (Python bindings)
- CWE: CWE-400 (Uncontrolled Resource Consumption)
The safetensors library’s Python binding exposed a parsing path for malformed tensor headers that did not apply size limits to the JSON header block. A crafted .safetensors file with an arbitrarily large header could cause the parser to allocate unbounded memory before returning an error. The attack requires causing a model-loading process to attempt loading the malicious file.
Exploitation: Relevant in environments that process user-submitted model files (custom model upload features, model evaluation pipelines). Confirmed to cause OOM conditions in model serving workers.
Patch status: Fixed in safetensors 0.4.3. The fix adds a header size limit of 100MB.
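Pipelines that accept user-submitted model files can apply the same bound themselves before handing a file to any parser. The safetensors format begins with an 8-byte little-endian u64 giving the size of the JSON header that follows, so the claimed header size can be checked after reading only 8 bytes. A minimal sketch; the function name and the choice to mirror the 100MB limit are assumptions:

```python
import struct

MAX_HEADER_BYTES = 100 * 1024 * 1024     # mirror the 0.4.3 limit

def check_safetensors_header(path: str) -> int:
    """Read the 8-byte length prefix and reject oversized headers.

    Bounds the allocation a crafted file can force, without parsing
    the header itself. Returns the claimed header length on success.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
    if len(prefix) != 8:
        raise ValueError("truncated file: missing header length prefix")
    (header_len,) = struct.unpack("<Q", prefix)
    if header_len > MAX_HEADER_BYTES:
        raise ValueError(f"header claims {header_len} bytes, exceeds limit")
    return header_len
```

Running this check in the upload handler keeps a malicious file from ever reaching the model-loading worker that would OOM on it.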
CVE-2026-1677 — Cheshire Cat AI (CatAI) Arbitrary File Write via Plugin Upload
- Severity: Critical (CVSS 9.0)
- Affected component: Cheshire Cat AI <= 1.6.2, plugin management
- CWE: CWE-434 (Unrestricted Upload of File with Dangerous Type)
Cheshire Cat AI (an open-source LLM agent framework) supports plugin installation via a ZIP archive upload. The upload handler extracted archives without validating archive entry paths for directory traversal sequences (the “Zip Slip” vulnerability class). An attacker who could upload a plugin archive could write arbitrary files to the server filesystem, including Python files that would be executed when the plugin system initialized.
Exploitation: Remote code execution for any user with access to the plugin upload endpoint. The endpoint was authenticated in default deployments but exposed to all authenticated users, not restricted to administrators.
Patch status: Fixed in Cheshire Cat AI 1.6.3. The fix validates all archive entry paths against the target extraction directory.
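The validation the 1.6.3 fix describes is standard Zip Slip hygiene: resolve every entry path against the extraction directory and refuse to extract anything that lands outside it. A generic sketch of the pattern (not the Cheshire Cat codebase), useful for any upload handler that unpacks archives:

```python
import os
import zipfile

def safe_extract(archive: zipfile.ZipFile, dest: str) -> None:
    """Extract only after verifying every entry resolves inside dest.

    An entry like "../../app/plugin.py" resolves outside the target
    directory and is rejected before anything is written to disk.
    """
    dest = os.path.realpath(dest)
    for name in archive.namelist():
        target = os.path.realpath(os.path.join(dest, name))
        if os.path.commonpath([dest, target]) != dest:
            raise ValueError(f"blocked traversal entry: {name!r}")
    archive.extractall(dest)
```

Validating the whole archive before extracting anything (rather than entry by entry) avoids leaving a half-written plugin directory behind when a malicious entry appears late in the archive.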
Summary Table
| CVE | Component | Severity | Patched |
|---|---|---|---|
| CVE-2026-1044 | Ollama (no auth) | High 8.6 | Partial — config change |
| CVE-2026-1187 | vLLM auth bypass | High 8.2 | Yes — 0.3.4 |
| CVE-2026-1253 | Triton SSRF | High 7.9 | Yes — 25.01 |
| CVE-2026-1381 | LangChain SQL injection | High 8.0 | Yes — 0.2.2 |
| CVE-2026-1509 | safetensors DoS | Medium 6.1 | Yes — 0.4.3 |
| CVE-2026-1677 | Cheshire Cat RCE | Critical 9.0 | Yes — 1.6.3 |
The Q1 2026 CVE picture reinforces a consistent pattern: ML infrastructure components are being adopted faster than their security models are being hardened. Missing authentication, SSRF via flexible URI handling, and Zip Slip in plugin managers are well-understood vulnerability classes — they should not be appearing in infrastructure that organizations are deploying in production. The mlcves.com tracker maintains a continuously updated database of ML-specific CVEs with component-level filtering for teams maintaining software bills of materials.
Sources
- NVD CVE Database — primary CVE records.
- mlcves.com — ML-specific CVE tracking with component filters.
- NVIDIA Security Bulletin — vendor advisories for Triton and related NVIDIA ML infrastructure.