
The new attack surface cyber teams are missing.
Every Fortune 500 deployed an LLM into a customer-facing product between 2023 and 2026. Almost none of them did so with a threat model that accounts for LLM-specific risks. This is the rarest cyber skill set in the market right now and it pays a premium because of it.
Two public frameworks define the field: OWASP's LLM Top 10 (the canonical vuln catalog) and MITRE ATLAS (the adversarial-ML attack framework, analog to MITRE ATT&CK for traditional cyber). Learn both and you have the vocabulary the field uses.
What can go wrong in an LLM application.
Maintained by the OWASP foundation. Updated 2023 and 2024. Read the full project at owasp.org.
LLM01 · Prompt Injection
Direct or indirect manipulation of model instructions via user input. Indirect injection (model reads tainted external content — webpage, PDF, email, RAG corpus) is the harder variant. Most-cited LLM-specific vulnerability in 2026 deployments.
LLM02 · Insecure Output Handling
Treating LLM output as trusted before sanitization. Classic example: LLM generates SQL or shell that a downstream system executes. Always treat LLM output as untrusted user input · validate / escape / sandbox.
LLM03 · Training Data Poisoning
Adversary contaminates fine-tuning or RAG-corpus data with content designed to manipulate model behavior. Detection requires data-provenance discipline most teams don't have.
LLM04 · Model Denial of Service
Crafted prompts that cause runaway token consumption, exhaust context windows, trigger expensive tool-use loops. Cost-bomb attacks on token-billed apps.
LLM05 · Supply Chain Vulnerabilities
Compromised model weights from Hugging Face, poisoned third-party datasets, malicious dependencies in agent frameworks.
LLM06 · Sensitive Information Disclosure
Model leaks training data, system prompts, prior-conversation context, or API keys it was given access to. Membership-inference attacks. Training-data extraction.
LLM07 · Insecure Plugin Design
Agent / tool-use plugins that take input without validation, execute privileged actions without authorization, or expose secrets through tool definitions.
LLM08 · Excessive Agency
Giving an agent more permissions, tool access, or autonomy than the use case requires. Classic example: file-system write access for a 'summarize my emails' agent.
LLM09 · Overreliance
Humans treating LLM output as authoritative when it's wrong. Includes downstream automation that doesn't verify before acting. The Air Canada chatbot ruling is the canonical case.
LLM10 · Model Theft
Extraction of proprietary model weights or capabilities via output querying, distillation attacks, or supply-chain compromise of training infrastructure.
The attack lifecycle, mapped.
MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the AI-specific analog of MITRE ATT&CK. Same 10-tactic structure (Recon → Impact) with ML-specific techniques. Maintained by MITRE at atlas.mitre.org. Used by defenders to think about kill-chain disruption + by red teams to structure adversarial-ML engagement plans.
Reconnaissance
Adversary gathers information on the target AI system. Includes searching for publicly-available model details, querying the model to characterize behavior, identifying the system prompt.
Resource Development
Adversary builds capability — crafts adversarial inputs, develops poisoning datasets, acquires victim-model access for distillation.
Initial Access
Adversary obtains the foothold — through prompt injection in tainted web content the AI reads, through compromised plugins/tools, through legitimate user access misused.
ML Model Access
Adversary gains access to query the model in ways that enable downstream attacks (extraction, evasion).
Execution
Adversary causes the model to execute attacker-supplied logic — via prompt injection, plugin abuse, or agentic tool misuse.
Persistence
Adversary maintains foothold. Includes poisoning persistent memory / vector stores so subsequent sessions exhibit the manipulated behavior.
Defense Evasion
Adversary bypasses safety guardrails. Jailbreaks, multi-turn social engineering, encoded payloads.
Discovery
Adversary learns about the system from inside — what tools are available, what data the model can see, what other systems it integrates with.
Exfiltration
Adversary extracts data through the model — training data, system prompts, RAG-indexed sensitive content, API keys exposed to tools.
Impact
Adversary achieves the goal — financial harm, reputation damage, denial of service, manipulated decisions in downstream systems.
How to skill into AI security specifically
- 01Master traditional appsec first. AI security is appsec + ML. Without solid web/app/API security fundamentals, AI-security depth is brittle. OSCP-level offensive + OWASP-Top-10-fluent defensive is the floor.
- 02Read both frameworks end-to-end. OWASP LLM Top 10 and MITRE ATLAS. The vocabulary alone separates AI-security practitioners from generalists.
- 03Do hands-on adversarial ML. The free practice resources: Lakera Gandalf (prompt-injection game, 7 levels of escalating difficulty), the HackAPrompt competition archives, the PortSwigger Web Security Academy LLM-attacks labs, Anthropic's and OpenAI's published red-team write-ups.
- 04Build a benchmark or red-team a real model. Pick an open-weights model. Build a curated prompt-injection benchmark. Test publicly available frontier APIs against it (within their published terms of service · do not violate API rules). Publish your results. Public work in this area gets read fast and lands interviews.
- 05Join the AI-security community. Follow practitioners on Twitter/X (Simon Willison's blog and feed is the canonical English-language source · NIST AI Safety Institute reports · UK AI Safety Institute publications · Anthropic Trust + Safety + Alignment team writing · Apollo Research evals). The community is small enough that quality public work gets noticed within weeks.
- 06Apply for AI-security-specific roles. Anthropic, OpenAI, Google DeepMind, Meta FAIR, xAI, NVIDIA, every major lab now has AI Trust + Safety / Red Team / Alignment Security teams. Apollo Research, METR (Model Evaluation and Threat Research), UK AISI, US NIST AI Safety Institute. Public AI-security consultancies (Robust Intelligence, HiddenLayer, Lakera, Mindgard).
The bug bounty programs paying for AI security
Every major AI lab now runs a bug bounty program with explicit AI-security scope. Most are on HackerOne or Bugcrowd. Bounty ranges are higher than general appsec because the field is undersupplied. As of 2026 best-effort:
- · Anthropic · bug bounty + prior research programs on AI safety. Published payouts ranged $1K-$25K+ in 2024-2025.
- · OpenAI · public bug bounty on Bugcrowd. AI Cybersecurity Grant Program for research.
- · Google · expanded Vulnerability Reward Program to cover generative AI specifically in 2023+. Up to $30K+ for highest-impact AI-specific findings.
- · Microsoft · added AI Bounty Program covering Copilot and Azure AI in 2023+. Up to $15K+.
- · Hugging Face · model-hub security bounty.
- · DEF CON AI Village · annual public red-teaming events with cash prizes. The 2023 generative red team at DEF CON 31 was the largest public AI red team in history.
Always verify program scope and payout amounts on the provider's official program page before testing. Amounts change.