Tool Review: Garak, the LLM Vulnerability Scanner
Garak (Generative AI Red-teaming and Assessment Kit) is an open-source LLM vulnerability scanner developed at NVIDIA. It’s the closest thing the field has to a systematic automated testing framework for LLM security properties — not a replacement for human red-teaming, but a repeatable, scriptable first pass that organizations can integrate into CI/CD pipelines or pre-deployment validation.
This review covers what Garak tests, how to run it against local and API-accessible models, what its results look like, and where it falls short as a security assessment tool.
What Garak Is
Garak is a framework for probing LLMs for failure modes across a defined taxonomy of “probes.” Each probe is a structured test: it sends one or more inputs to a target LLM and evaluates the response against one or more “detectors” that check whether the response exhibits a targeted failure.
The framework was formally described in the paper “Garak: A Framework for LLM Red-Teaming” (Derczynski et al., 2024, arXiv:2406.11036) and is maintained at github.com/NVIDIA/garak ↗.
As of mid-2026, Garak ships with over 100 probes across categories including:
- Jailbreaks: Tests based on known jailbreak templates (DAN, JAILBREAK, base64-encoded instructions, roleplay framings)
- Prompt injection: Direct and indirect injection patterns
- Toxicity elicitation: Probes that attempt to generate harmful, offensive, or dangerous content
- Information hazards: Tests for generation of synthesis routes, malware, CBRN-relevant information
- Hallucination and factuality: Tests for confident confabulation on known facts
- Data leakage: Probes for memorized training data, PII, and confidential string recall
- Encoding and obfuscation: Tests using ROT13, base64, Pig Latin, homoglyph substitutions to bypass content filters
- Continuation attacks: Prompts designed to elicit problematic completions
Installation and Setup
pip install garak
Garak requires Python 3.9+ and installs with minimal dependencies. For API-based targets (OpenAI, Anthropic, Cohere, etc.), you’ll need the appropriate API keys in your environment. For local models, Garak supports Hugging Face transformers and llama.cpp backends.
To list available probes:
garak --list_probes
To run a full scan against an OpenAI model:
garak --model_type openai --model_name gpt-4o --probes all
A full run against a capable model takes 30–90 minutes depending on the number of probes and API rate limits. Garak supports parallel probe execution with the --parallel_attempts flag.
Running Targeted Probes
For most production assessments, running all probes is overkill and expensive. Garak supports targeted probe selection:
# Test only jailbreak probes
garak --model_type openai --model_name gpt-4o --probes jailbreak
# Test specific probe categories
garak --model_type openai --model_name gpt-4o --probes jailbreak,promptinjection,encoding
For local models via Hugging Face:
garak --model_type huggingface --model_name mistralai/Mistral-7B-Instruct-v0.2 --probes jailbreak
Understanding the Output
Garak produces a structured report in JSON and a human-readable summary. The key metric per probe is the hit rate: the fraction of probe inputs that successfully elicited the targeted failure mode.
A typical summary looks like:
jailbreak.DAN: 3/20 attempts triggered (15.0%)
jailbreak.JAILBREAK: 1/20 attempts triggered (5.0%)
encoding.Base64Probe: 7/20 attempts triggered (35.0%)
High hit rates on encoding probes are common even against models with strong RLHF alignment — encoding-based bypass remains a persistent weakness. High hit rates on direct jailbreak probes indicate more significant alignment gaps.
The JSON output includes per-attempt details: the exact input sent, the model’s response, and which detector(s) flagged the response. This is useful for debugging — you can review exactly what triggered a hit and decide whether it’s a genuine failure or a false positive.
What Garak Tests Well
Known jailbreak coverage: Garak’s jailbreak probe library is maintained and updated with documented techniques. Running it against a new model or fine-tune gives you coverage of the historical jailbreak catalogue quickly.
Encoding bypass testing: The encoding probes (base64, ROT13, etc.) are among Garak’s strongest contributions — this is an underappreciated attack surface that most organizations don’t test systematically.
Regression testing: Because Garak is scriptable and produces structured output, it integrates cleanly into CI/CD pipelines for regression testing. If a fine-tuning run degrades safety properties, a Garak run in the deployment pipeline will catch it before production.
Reproducibility: Garak runs are reproducible given the same probe set and model version, which is important for compliance documentation and comparative before/after assessments.
Where Garak Falls Short
Not a substitute for human red-teaming: Garak’s probes are drawn from known attack techniques. It will not discover novel attack patterns. A human red-teamer exploring the specific model and deployment context will find vulnerabilities that Garak misses.
Context-free testing: Garak sends probes directly to the model without the surrounding context of a real deployment (system prompts, retrieval context, surrounding application logic). A model that passes Garak’s jailbreak probes when tested naked may still be vulnerable when tested as deployed, with a weak system prompt.
Detector quality varies: Garak’s detectors — the components that evaluate whether a model response represents a failure — use a mix of pattern matching, string classifiers, and secondary LLM calls. False positive and false negative rates vary considerably across detector types.
API cost: A full scan against a commercial API model can generate thousands of requests. At GPT-4 pricing, a full garak run can cost $20–50+. Budget accordingly.
English-centric: The probe library is overwhelmingly English-language. For multilingual deployments, Garak’s coverage of non-English attack vectors is limited.
Recommended Usage Pattern
- Run a targeted probe set (jailbreak + promptinjection + encoding) as a gate in your model deployment pipeline.
- After deployment, run a full scan at cadence (monthly or on major model updates) and track hit rates over time.
- Use Garak output as a checklist for human red-teaming: the categories with highest hit rates are where to focus manual investigation.
- Combine with LLM Guard or similar runtime guardrail tools — Garak tells you where the model is vulnerable; guardrails mitigate those vulnerabilities in production.
Verdict
Garak is the most complete open-source option for systematic LLM vulnerability scanning. For organizations that have deployed LLM products or are evaluating fine-tunes, it should be part of the pre-deployment validation workflow. Its results should be interpreted as a floor, not a ceiling — passing Garak means you’ve covered the known catalogue, not that the model is secure.
Related resources: Garak’s jailbreak probe library draws on techniques documented in jailbreakdb.com ↗, which is a useful reference for understanding what each jailbreak probe is testing. For standardized benchmarks against which to compare your Garak results, see aisecbench.com ↗. The attack techniques underlying Garak’s probe categories are mapped at aiattacks.dev ↗.
References
- Derczynski, L. et al. (2024). Garak: A Framework for LLM Red-Teaming ↗. arXiv:2406.11036.
- github.com/NVIDIA/garak ↗
- docs.garak.ai ↗