AI Alert
tools

Tool Review: LLM Guard for Input/Output Filtering

By Theo Voss ·

LLM Guard is an open-source library published by Protect AI that adds input and output scanning to LLM application pipelines. It sits in front of (and behind) your LLM calls, running prompts and completions through a configurable set of detectors before they reach users or downstream systems.

This review is for practitioners evaluating whether LLM Guard belongs in their defense stack, not for researchers looking for benchmark comparisons. We focus on deployment realities, what the tool actually catches, and where it does not help.

What LLM Guard Is

LLM Guard is a Python library. You import it, configure a set of input scanners and output scanners, and wrap your LLM calls. Prompts are passed through input scanners before they go to the model. Completions are passed through output scanners before they go back to the caller.

The library is model-agnostic: it works with OpenAI, Anthropic, Azure OpenAI, and any other inference API you call from Python. It does not modify the model or interact with the model host.

The project lives at github.com/protectai/llm-guard and maintains its own documentation at llm-guard.com. As of early 2026, the library is actively maintained with regular scanner additions.

What It Detects

LLM Guard’s detection capabilities are organized into input scanners (running on the user’s prompt before the model sees it) and output scanners (running on the model’s response before the caller receives it).

Input scanners include:

Output scanners include:

Deployment Patterns

Library integration: The most common pattern. Add LLM Guard to your existing Python LLM client code with a few wrapper calls. Input scanners run synchronously before each LLM call; output scanners run on each completion before returning. The overhead depends on which scanners are enabled.

REST API / sidecar: LLM Guard provides a FastAPI-based server mode that exposes scanning over HTTP. This allows non-Python services to use LLM Guard, and supports deploying it as a sidecar alongside your model serving layer.

Latency considerations: This is the primary operational constraint. Scanners that use small fine-tuned classifier models (the Prompt Injection scanner, Toxicity scanner) add latency per call. In our experience the prompt injection scanner adds 80-200ms depending on the hardware it runs on. Running all scanners in a production environment on CPU adds up. GPU acceleration or running LLM Guard on separate hardware helps. Being selective about which scanners you enable is the practical solution — not all scanners are needed in every deployment.

Real Limitations

Prompt injection detection is not a solved problem. LLM Guard’s prompt injection scanner catches known patterns and many variants of common bypasses. It will not catch novel adversarial suffixes optimized against the scanner, indirect injections embedded in retrieved documents (the scanner sees only the user’s direct prompt by default), or sufficiently creative role-play framings. It should be treated as one layer in a defense-in-depth stack, not a complete solution.

PII detection has false positives. The PII scanner will flag legitimate technical content containing numbers, addresses in code examples, and similar. Tuning thresholds is necessary in most deployments. Out-of-the-box sensitivity generates noise in engineering-focused applications.

No protection against indirect injection by default. LLM Guard scans the user’s prompt and the model’s output. It does not scan retrieved documents, tool outputs, or other external content that enters the model’s context. For RAG applications and agents, you need to pipe retrieved content through input scanners yourself — this is not done automatically.

Scanner models can be bypassed. The underlying classifier models used for toxicity and prompt injection detection were trained on known datasets. Adversarially crafted inputs that evade those classifiers exist and are not difficult to construct for a motivated attacker.

No context across turns. LLM Guard scans each input and output independently. Multi-turn attacks that build up to a harmful output across several benign-looking turns are not caught by per-message scanning.

When to Use It, When Not to

Good fit:

Not a fit, or needs supplementation:

What It Complements

LLM Guard fits alongside tools that address what it does not cover. aisecreviews.com maintains independent reviews of the broader AI security tooling landscape, including model scanning tools and infrastructure-level defenses. bestaisecuritytools.com tracks the current state of the market for teams comparing options before procurement. For a broader catalog of guardrail libraries beyond LLM Guard — including frameworks for agent-level privilege control and output validation pipelines — see guardml.io. Defense patterns and architectural guidance for complementing input/output filters with structural controls are covered at aidefense.dev.

For teams integrating LLM Guard into a broader security program: pair it with network-level logging of LLM API calls, separate monitoring of agent action traces, and periodic red-team exercises that target the scanner layer directly.

Sources


→ This post is part of the AI Security Intelligence Hub — the complete resource index for AI security on ai-alert.org.

For more context, AI incident tracker covers related topics in depth.

Sources

  1. LLM Guard — GitHub (protectai/llm-guard)
  2. LLM Guard Documentation
Read the full article →