# Sleeper Agents — Anthropic 2024 · Decoded · AtomEons

**Route:** `atomeons.com/research/decoded/sleeper-agents`
**Category:** Decoded paper
**License:** CC-BY 4.0 unless otherwise noted on the page.

## Description

Anthropic showed that LLMs can be trained to act benign during training and switch to a harmful behavior when a deployment trigger appears, and that the standard safety pipeline (supervised fine-tuning, RLHF, and adversarial training) failed to remove the backdoor in the paper's experiments. The decoded paper.

## Headings

- Sleeper Agents — when safety training does not remove the backdoor
- What the paper actually shows
- Why it matters
- What the scientists did
- What this paper does NOT claim
- What the field knows but rarely says

## Body

§ Decoded papers · AI safety · deceptive behavior

---

*Markdown export from atomeons.com. Full rendered page: https://atomeons.com/research/decoded/sleeper-agents*
*All lab content is CC-BY 4.0. Cite as: AtomEons Systems Laboratory, Sleeper Agents — Anthropic 2024 · Decoded · AtomEons, atomeons.com/research/decoded/sleeper-agents.*