CVE Roundup: AI/ML Infrastructure Vulnerabilities — Q1 2026

A quarterly review of the highest-impact CVEs disclosed in Q1 2026 affecting model serving and orchestration infrastructure: vLLM, NVIDIA Triton Inference Server, Gradio, LangChain, and Langflow.

By AI Alert Desk · 8 min read

Q1 2026 was a consequential quarter for AI/ML infrastructure security. As organizations mature their LLM deployments from experimental to production, the attack surface of the underlying serving stack has expanded — and so has researcher attention. This roundup covers the highest-impact CVEs disclosed January through March 2026 across the major components of a typical ML serving stack.

Severity ratings follow CVSS v3.1. Patch status as of 2026-05-10.


CVE-2026-22807 — vLLM Remote Code Execution via Hugging Face auto_map Module Loading

Severity: Critical (CVSS 9.8)
Affected component: vLLM 0.10.1 to before 0.14.0, model resolution path
CWE: CWE-94 (Improper Control of Generation of Code)

vLLM loads Hugging Face auto_map dynamic modules during model resolution without gating on trust_remote_code. As a result, attacker-controlled Python code shipped inside a model repository or model path executes at server startup — even when the operator never opted into remote code trust. The dynamic-module path is reached as part of normal model resolution, so simply pointing vLLM at a malicious repository is enough to run arbitrary code in the serving process.

Exploitation: Network-reachable and unauthenticated in the CVSS scoring (AV:N/AC:L/PR:N/UI:N). The practical precondition is that the server loads an attacker-influenced model artifact — common in multi-tenant serving platforms, model marketplaces, and any pipeline that resolves models from caller-supplied identifiers. Because the code fires during model resolution rather than inference, exploitation does not require a successful generation request.

Patch status: Fixed in vLLM 0.14.0. The fix gates auto_map dynamic-module loading behind the trust_remote_code setting so the operator’s opt-out is honored.

Mitigation: Upgrade to 0.14.0 or later. Until patched, restrict which model repositories vLLM is permitted to resolve and treat every model artifact loaded from an untrusted source as untrusted code.
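
The "restrict which model repositories" step can be enforced before a model identifier ever reaches the serving process. The sketch below is one possible shape for that gate, not part of vLLM: the allowlist contents, the pinned revisions, and the names APPROVED_MODELS and resolve_approved_model are all hypothetical.

```python
# Hypothetical allowlist: repository ID -> pinned revision (commit hash or tag).
# Both entries are placeholders, not recommendations.
APPROVED_MODELS = {
    "example-org/chat-model": "0123456789abcdef0123456789abcdef01234567",
    "example-org/embed-model": "v1.2.0",
}


class ModelNotApproved(ValueError):
    """Raised when a caller asks for a model outside the allowlist."""


def resolve_approved_model(model_id: str) -> tuple[str, str]:
    """Return (repo_id, pinned_revision) for an approved model, else raise."""
    # Exact-match lookup only: no prefix matching, no local paths, no URLs.
    revision = APPROVED_MODELS.get(model_id)
    if revision is None:
        raise ModelNotApproved(f"model {model_id!r} is not on the allowlist")
    return model_id, revision
```

Pinning a revision alongside the repository ID means a later push to an approved repository does not silently change the code that gets loaded.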


CVE-2026-22773 — vLLM Multimodal Denial of Service via Crafted 1x1 Image (Idefics3)

Severity: High (CVSS 7.5)
Affected component: vLLM 0.6.4 to before 0.12.0, Idefics3 vision model path
CWE: CWE-770 (Allocation of Resources Without Limits or Throttling)

vLLM serving a multimodal model that uses the Idefics3 vision implementation can be crashed by sending a specially crafted 1x1 pixel image. The image-preprocessing path for that model does not bound resource use for the degenerate input, so a single small request takes down the engine — a clean, low-cost denial of service against any deployment that exposes the image input path.

Exploitation: Network-reachable and unauthenticated (AV:N/AC:L/PR:N/UI:N), with the only precondition being that the served model uses the Idefics3 implementation and the multimodal endpoint is reachable by the caller. The payload is trivial to generate and the request looks like ordinary multimodal traffic.

Patch status: Fixed in vLLM 0.12.0. Operators running affected versions with an Idefics3-based multimodal model should update; until then, gate the image input path behind authentication or input validation.
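
As an illustration of the input-validation stopgap, a gateway in front of the multimodal endpoint could reject degenerate images before they reach the engine. The sketch below reads only the PNG header using the standard library. The MIN_SIDE and MAX_SIDE thresholds are assumptions (the advisory as summarized here does not say which dimensions are safe), and a real gateway would also need to handle JPEG and other formats.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 8      # assumed lower bound in pixels; tune for the served model
MAX_SIDE = 8192   # assumed upper bound in pixels


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_acceptable_image(data: bytes) -> bool:
    """Reject non-PNG input and images outside the assumed size bounds."""
    try:
        width, height = png_dimensions(data)
    except ValueError:
        return False
    return MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE
```

This is a filter on declared dimensions only; it does not replace upgrading to the fixed release.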


CVE-2026-24158 — NVIDIA Triton Inference Server HTTP Denial of Service via Large Compressed Payload

Severity: High (CVSS 7.5)
Affected component: NVIDIA Triton Inference Server before 26.01, HTTP endpoint
CWE: CWE-789 (Memory Allocation with Excessive Size Value)

NVIDIA Triton Inference Server’s HTTP endpoint can be driven into a denial of service by an attacker supplying a large compressed payload. The server decompresses the request before bounding its expanded size, so a small compressed body can force an outsized allocation — a decompression-bomb pattern that exhausts memory and takes the inference endpoint offline.

Exploitation: Network-reachable and unauthenticated per the CVSS metrics (AV:N/AC:L/PR:N/UI:N). Any Triton deployment whose HTTP endpoint is reachable by untrusted clients is at risk; the request is a single compressed POST and requires no model-management privileges.

Patch status: Fixed in Triton 26.01, which bounds the decompressed payload size. NVIDIA tracks this in its security bulletin alongside related Triton HTTP-path fixes; upgrade to 26.01 or later.
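
The general defense against this pattern is to cap the expanded size while decompressing, rather than after. The sketch below shows that idea with Python's zlib module, for example in a proxy placed in front of an inference endpoint. It is not Triton's implementation; the function name, exception, and limit are illustrative assumptions.

```python
import zlib


class PayloadTooLarge(Exception):
    """Raised when decompressed output would exceed the configured limit."""


def bounded_decompress(body: bytes, limit: int) -> bytes:
    """Decompress a zlib or gzip body, refusing to expand past `limit` bytes."""
    # MAX_WBITS | 32 lets zlib auto-detect a zlib or gzip header.
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
    out = bytearray()
    data = body
    while data:
        # Ask for one byte more than the remaining budget, so an overrun is
        # detected without ever materializing the full expanded payload.
        out += decomp.decompress(data, limit - len(out) + 1)
        if len(out) > limit:
            raise PayloadTooLarge(f"decompressed size exceeds {limit} bytes")
        data = decomp.unconsumed_tail
    out += decomp.flush()
    if len(out) > limit:
        raise PayloadTooLarge(f"decompressed size exceeds {limit} bytes")
    return bytes(out)
```

Because the output is requested in budget-sized steps, memory use stays near the limit even when a small body expands to a very large payload.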


CVE-2026-28416 — Gradio SSRF via Malicious gr.load() Space

Severity: High (CVSS 8.6)
Affected component: Gradio before 6.6.0, gr.load() remote Space loading
CWE: CWE-918 (Server-Side Request Forgery)

Gradio allows an application to embed a remote Space via gr.load(). When a victim application loads an attacker-controlled Space, the malicious proxy_url from that Space’s config is trusted and added to the allowlist. The attacker can then make arbitrary HTTP requests from the victim’s server — reaching internal services, cloud metadata endpoints, and otherwise-unroutable private networks through the victim’s infrastructure.

Exploitation: Network-reachable and unauthenticated; the CVSS vector carries a scope change (AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:N/A:N) because the request originates inside the victim’s trust boundary. The precondition is that the victim application calls gr.load() against a Space the attacker controls or can influence.

Patch status: Fixed in Gradio 6.6.0, which stops trusting the loaded Space’s proxy_url for allowlist decisions. Upgrade to 6.6.0 or later, and call gr.load() only on Spaces you control.
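
Independently of the Gradio fix, any server that makes outbound requests to configurable URLs can apply a generic SSRF check of its own. The sketch below is one such check and is not Gradio's code: the function name and the idea of an operator-maintained host allowlist are assumptions for illustration. It inspects the URL string only and does not resolve DNS, so it does not by itself stop a hostname that resolves to an internal address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_outbound_url(url: str, allowed_hosts: set[str]) -> bool:
    """Allow only http(s) URLs whose host is on an operator-defined allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # already lowercased by urlsplit
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so treat it as a hostname
    # Refuse loopback, private, link-local (cloud metadata), and reserved IPs.
    if ip is not None and not ip.is_global:
        return False
    return host in allowed_hosts
```

For example, a request to http://169.254.169.254/ is refused because the address is link-local, and any hostname missing from allowed_hosts is refused regardless of where it points.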


CVE-2026-34070 — LangChain Prompt-Loading Path Traversal / Arbitrary File Read

Severity: High (CVSS 7.5)
Affected component: langchain-core before 1.2.22, langchain_core.prompts.loading
CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)

Multiple functions in langchain_core.prompts.loading read files from paths embedded in deserialized config dicts without validating against directory traversal or absolute-path injection. An application that loads prompt configurations from an untrusted or attacker-influenced source can be coerced into reading arbitrary files off the host filesystem — for example secrets, credentials, or other sensitive data outside the intended prompt directory.

Exploitation: The CVSS metrics rate it network-reachable with no privileges or user interaction (AV:N/AC:L/PR:N/UI:N), with confidentiality impact only. The attack surface is prompt-config provenance: any pipeline that deserializes prompt configs supplied by users, pulled from shared stores, or fetched over the network is exposed.

Patch status: Fixed in langchain-core 1.2.22, which validates loaded paths against traversal and absolute-path injection. Upgrade and load prompt configurations only from sources you trust.
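
The class of check involved here, rejecting traversal and absolute-path injection, can be sketched in a few lines with pathlib. This is a generic illustration, not the langchain-core patch; resolve_inside and its parameters are hypothetical names, and it assumes Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path


def resolve_inside(base_dir: str, candidate: str) -> Path:
    """Resolve `candidate` under `base_dir`, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute candidate discards `base`, and ".." segments
    # can climb out of it; resolving first makes both cases visible.
    target = (base / candidate).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path {candidate!r} escapes {str(base)!r}")
    return target
```

A caller would pass the path taken from the deserialized config through resolve_inside before opening it, so that "../secrets.env" or "/etc/passwd" raises instead of being read.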


CVE-2026-27966 — Langflow CSV Agent Remote Code Execution via Hardcoded allow_dangerous_code

Severity: Critical (CVSS 9.8)
Affected component: Langflow before 1.8.0, CSV Agent node
CWE: CWE-94 (Improper Control of Generation of Code)

Langflow (a tool for building and deploying AI-powered agents and workflows) hardcodes allow_dangerous_code=True in its CSV Agent node, which automatically exposes LangChain’s Python REPL tool (python_repl_ast). An attacker can drive that REPL through prompt injection to execute arbitrary Python and OS commands on the server, yielding full remote code execution. Because the dangerous flag is hardcoded, an operator cannot disable it through configuration.

Exploitation: Network-reachable and unauthenticated in the CVSS scoring (AV:N/AC:L/PR:N/UI:N). Any deployment exposing a flow that uses the CSV Agent node to untrusted input is reachable: the attacker supplies prompt content that the agent relays into the REPL tool.

Patch status: Fixed in Langflow 1.8.0, which removes the hardcoded allow_dangerous_code=True. Upgrade to 1.8.0 or later, and avoid building flows that wire untrusted input into code-executing agent tools.


Summary Table

CVE             Component                        Severity      Patched
CVE-2026-22807  vLLM auto_map RCE                Critical 9.8  Yes (0.14.0)
CVE-2026-22773  vLLM Idefics3 image DoS          High 7.5      Yes (0.12.0)
CVE-2026-24158  Triton HTTP payload DoS          High 7.5      Yes (26.01)
CVE-2026-28416  Gradio gr.load() SSRF            High 8.6      Yes (6.6.0)
CVE-2026-34070  LangChain prompt path traversal  High 7.5      Yes (1.2.22)
CVE-2026-27966  Langflow CSV Agent RCE           Critical 9.8  Yes (1.8.0)
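
For the Python packages in the table, a quick local check can compare installed versions against the fixed releases listed above. The sketch below is a rough aid under stated assumptions: the distribution names (vllm, gradio, langchain-core, langflow) are assumed to match how the packages were installed, the version parser is a simplification that ignores pre-release suffixes, and Triton is omitted because it ships as a container release rather than a pip package.

```python
import re
from importlib import metadata

# Fixed versions taken from the summary table; for vLLM the higher of the
# two fixes (0.14.0) covers both CVEs.
FIXED_VERSIONS = {
    "vllm": "0.14.0",
    "gradio": "6.6.0",
    "langchain-core": "1.2.22",
    "langflow": "1.8.0",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '0.14.0+cu121' -> (0, 14, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_patched(installed: str, fixed: str) -> bool:
    """True if `installed` is at or above `fixed` (pre-release tags ignored)."""
    a, b = parse_version(installed), parse_version(fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def audit() -> dict[str, bool]:
    """Map each installed package from the table to its patched status."""
    report = {}
    for dist, fixed in FIXED_VERSIONS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
        report[dist] = is_patched(installed, fixed)
    return report
```

Running audit() in each serving environment returns only the packages that are present, which makes it easy to spot an image that still carries an affected release.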

The Q1 2026 CVE picture reinforces a consistent pattern: ML infrastructure components are being adopted faster than their security models are being hardened. Every flaw in this roundup belongs to a well-understood vulnerability class: code execution that fires on untrusted model or prompt content (the vLLM auto_map and Langflow REPL findings), SSRF through trusted remote configuration (Gradio), unbounded resource allocation on small inputs (Triton and the vLLM Idefics3 path), and path traversal in deserialized prompt configs (LangChain). None of these should be appearing in infrastructure that organizations are deploying in production.

The mlcves.com tracker maintains a continuously updated database of ML-specific CVEs with component-level filtering for teams maintaining software bills of materials. To pull only the entries that touch the components in this roundup that you actually run, the interactive AI Stack Watch builds a shareable watchlist filtered to your stack and flags the actively exploited ones. Browse the full advisory beat by category in the AI security topics index.

Sources

  1. NVD CVE Database
  2. mlcves.com — ML CVE Tracker
  3. NVIDIA Security Bulletin