Public AI failures catalog
Real documented incidents · 2016 to 2026 · what happened, who was at fault, what operators should take away
Reading this catalog
Apply two filters before you draw a lesson from any single entry. First: was this a model failure or a deployment failure? An LLM that invents citations is doing what LLMs do; a law firm that submits those citations to a federal judge is doing something the model did not require. Second: is the public source primary (a court filing, a regulator's order, an SEC document, the company's own statement) or secondary (a news headline about a court filing)? We cite primary sources where possible. The lesson at the end of each entry is written for builders and operators, not for press coverage. If your takeaway is 'AI is dangerous,' you drew the wrong lesson. The right one is almost always more specific: about scope, about evaluation, about the layer at which a human has to stay in the loop.
The catalog — index view
Twenty-two incidents with public sources, sorted by date. The 'category' column distinguishes model-fault, deployment-fault, and mixed. None of these dollar figures or dates is invented; each comes from a court filing, regulator's order, SEC document, or major newsroom investigation listed in the Sources section.
Detail — Moffatt v. Air Canada (Feb 2024)
Detail — Mata v. Avianca and the Bard demo (Jun 2023, Feb 2023)
Detail — NYC MyCity chatbot (Mar 2024)
Detail — Cruise robotaxi and Replit agent
The 'this wasn't actually AI' subset
A meaningful fraction of headline AI failures turn out, on inspection, to be human-fault stories with AI in the name. We track these separately because they distort policy debate. Three examples:
- Air Force 'rogue drone' (Jun 2023): Col. Tucker Hamilton described to a Royal Aeronautical Society audience a scenario in which a hypothetical reinforcement-learning drone might attack its operator. He used the word 'simulation.' The story went global as 'AI drone kills its operator in test.' Hamilton publicly clarified within days that no such simulation had been run; it was a thought experiment. The retraction got a fraction of the coverage the original claim did.
- Amazon 'Just Walk Out' (Apr 2024): Marketed for years as a computer-vision triumph, the system actually relied on approximately 1,000 human reviewers in India tagging transaction video in the background. Amazon never publicly disclosed the human-in-the-loop ratio. The technology was rolled back at Amazon Fresh stores in favor of Dash Carts. Whether this counts as an 'AI failure' depends on whether one treats 'we said it was AI and it wasn't really' as a failure of the system or a failure of disclosure.
- Sports Illustrated bylines (Nov 2023): Reported as 'AI wrote sports articles for SI.' What actually happened was that The Arena Group (SI's publisher) sourced content from a vendor (AdVon Commerce) whose contributors had AI-generated author headshots and possibly AI-assisted text. This was a failure of editorial due diligence at the publisher, not a model going off-script.
Failure modes by category
Across the catalog, six recurring failure modes account for most of the documented harm. Builders should review their own systems against this list before shipping anything customer-facing.
Out-of-scope deployment
Most common single failure mode in catalog
Model behaves reasonably inside its training distribution but is placed in a context it was never evaluated for. Examples: NYC MyCity giving authoritative legal advice; Air Canada chatbot answering refund-policy questions; a coding agent given production database credentials. The model is not 'wrong' in a technical sense; the deployment is wrong.
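One way to make "evaluated scope" concrete is a gate that sits in front of the model and hands off anything outside an allowlist. The sketch below is illustrative only: `ALLOWED_TOPICS`, `classify_topic`, `Reply`, and `answer` are hypothetical names, the topics are invented for an airline-style bot, and the keyword classifier is a toy stand-in for whatever classifier a real deployment has actually evaluated.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical allowlist: the only topics this bot has been evaluated on.
ALLOWED_TOPICS = {"baggage", "check_in", "flight_status"}

HANDOFF_MESSAGE = "I cannot help with that. Please contact a human agent."


@dataclass
class Reply:
    text: str
    handed_off: bool


def classify_topic(question: str) -> str:
    """Toy keyword classifier; first matching topic wins, else 'unknown'."""
    keywords = {
        "baggage": ("bag", "luggage"),
        "check_in": ("check in", "check-in", "boarding pass"),
        "flight_status": ("delayed", "on time", "flight status"),
        "refunds": ("refund", "bereavement", "compensation"),
    }
    lowered = question.lower()
    for topic, words in keywords.items():
        if any(word in lowered for word in words):
            return topic
    return "unknown"


def answer(question: str, model: Callable[[str], str]) -> Reply:
    """Call the model only for allowlisted topics; otherwise hand off."""
    topic = classify_topic(question)
    if topic not in ALLOWED_TOPICS:
        # Out of scope (or unrecognized): the model is never consulted.
        return Reply(HANDOFF_MESSAGE, handed_off=True)
    return Reply(model(question), handed_off=False)
```

The design choice worth noting is that unrecognized topics fall through to the handoff, so the default is refusal rather than a best-effort answer.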
Training-data bias
Resolved by data audit, not by prompt
Model is trained on data that encodes a bias the operator does not want reproduced. Examples: Amazon's resume screener penalizing resumes that contained the word 'women's' (as in 'women's chess club'); iTutorGroup's age filter; Uber Eats' facial-recognition checks failing repeatedly on a Black driver. These are not edge cases; they are predictable from the training data.
Prompt injection / adversarial input
Solved at the deployment layer, not the model layer
User crafts an input that overrides the system prompt. Examples: Chevy of Watsonville '$1 Tahoe'; DPD chatbot writing profanity poems. These are not jailbreaks of frontier safety systems; they are jailbreaks of badly scoped commercial bots with no input filtering.
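Input filtering is the control named above. A complementary deployment-layer control, sketched here as an assumption rather than as anything these operators ran, is to validate whatever the bot proposes against business rules held in code, so that an overridden system prompt still cannot produce an approved offer. `PRICE_FLOOR_USD`, `ProposedOffer`, and `approve_offer` are hypothetical names, and the floor value is made up.

```python
from dataclasses import dataclass

# Hypothetical price floors, owned by the business and never shown to the model.
# The figure is illustrative, not a real price.
PRICE_FLOOR_USD = {"tahoe": 50_000.0}


@dataclass(frozen=True)
class ProposedOffer:
    item: str
    price_usd: float


def approve_offer(offer: ProposedOffer) -> bool:
    """Approve an offer only if it clears a floor enforced outside the model."""
    floor = PRICE_FLOOR_USD.get(offer.item)
    if floor is None:
        return False  # unknown item: never auto-approve
    return offer.price_usd >= floor
```

Under this sketch a '$1 Tahoe' is rejected by `approve_offer` no matter what the conversation transcript says, because the check does not read the prompt at all.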
Hallucinated citations or facts
Lowered by RAG, not eliminated
Model outputs plausible-sounding but fabricated specific claims. Examples: Mata v. Avianca; Galactica's fake papers; Google Bard's JWST claim. Every modern LLM does this at some non-zero rate on factual queries. The fix is grounding (RAG with verified sources) plus mandatory verification, not 'a better model.'
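The "mandatory verification" half of that fix can be sketched as a check of every citation in a draft against an index that lives outside the model. This is a minimal illustration under stated assumptions: `VERIFIED_CITATIONS` stands in for a real court-records lookup, `CITATION_PATTERN` only recognizes neutral citations shaped like '2024 BCCRT 149', and the second citation in the usage example is deliberately made up to show a miss.

```python
import re

# Hypothetical trusted index; a real one would be a court-records database
# or another verified source outside the model.
VERIFIED_CITATIONS = {"2024 BCCRT 149"}

# Toy pattern for neutral citations of the form "<year> <COURT> <number>".
CITATION_PATTERN = re.compile(r"\b(\d{4} [A-Z]{2,10} \d+)\b")


def unverified_citations(draft: str) -> list[str]:
    """Return citations found in the draft that are absent from the index."""
    found = CITATION_PATTERN.findall(draft)
    return [citation for citation in found if citation not in VERIFIED_CITATIONS]


# Usage: "2023 ZZCT 999" is an invented citation used only for this example.
draft_text = "See 2024 BCCRT 149 and 2023 ZZCT 999."
flagged = unverified_citations(draft_text)  # ["2023 ZZCT 999"]
```

A non-empty result would block filing until a human has located each flagged source; the model is never asked to confirm its own citations.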
Perception or sensing failure
NHTSA and CPUC actively regulate this
An ML system in a physical loop fails to detect a real-world condition. Example: Cruise's perception stack not detecting a pedestrian under the vehicle. These are typically not 'the model said something wrong'; they are 'the model did not see something it should have seen.' Vehicle ML is regulated for this failure mode in a way most other AI products are not.
Misrepresented system capability
FTC has begun enforcement on AI marketing claims
The vendor or operator claims the system has more autonomy or intelligence than it actually does. Examples: Amazon 'Just Walk Out' marketed as 'AI-powered checkout' when human reviewers were doing most of the verification; some 'AI agent' marketing that turns out to be a thin wrapper plus human ops. This is mostly a disclosure failure rather than a technical one, but it produces real legal and reputational exposure.
Active legal frontiers
Three case threads are still active as of June 2026 and will likely set the precedents the rest of the industry runs on. We track them rather than predict them.
Dec 27, 2023
NYT v. OpenAI / Microsoft filed
The New York Times filed in S.D.N.Y. alleging copyright infringement through training-data ingestion of millions of NYT articles, plus near-verbatim regurgitation in some outputs. The Times sought billions in statutory damages and destruction of relevant models and training data. The court has since issued orders related to log preservation. The case is the most consequential AI training-data suit in the U.S. system. Outcome unresolved.
Jul 12, 2024
Mobley v. Workday — class action allowed to proceed
The Northern District of California ruled that Workday could be directly liable as an 'agent' of employers using its AI-based hiring tools, rejecting Workday's argument that it merely provided a tool. May 2025 update: Judge Lin granted conditional ADEA class certification, opening the door for affected applicants to opt in. This case will likely set the U.S. precedent on whether liability for AI hiring tools extends past the deploying employer to the vendor.
Nov 4, 2025
Getty Images v. Stability AI — UK High Court judgment
The High Court of England and Wales rejected Getty's main copyright claim, finding that Stable Diffusion models do not 'contain or store reproductions' of training images in the sense required for secondary infringement. Getty did win in part on trademark grounds: outputs that reproduced the Getty watermark were found to be infringing in limited circumstances. The ruling does not bind U.S. courts but is the most developed common-law analysis to date on training-data copyright.
What an operator should actually do
Reading two dozen incident write-ups produces the same five takeaways repeatedly. We list them in priority order. None require buying anything from us; they are observations about deployment hygiene.
- Constrain scope before you constrain the model. The most damaging failures in this catalog did not come from better or worse models; they came from models pointed at the wrong job. Decide what the system is allowed to answer and refuse to answer outside that. A bot that says 'I cannot help with that, here is the human contact' has never made headlines.
- Treat every model output as a draft. Verification does not mean asking the model to verify itself. Lawyers learned this the hard way; engineers are now learning it with coding agents. Verification happens against a source outside the model: a database, a policy document, a test, a human.
- Real authority is in the credentials, not the prompt. If the agent has the credential to drop the table, the table can be dropped, regardless of what the system prompt says. If the AI system can sign a binding offer, your binding offer is whatever it generates. Authority lives where the permissions live.
- Run pre-launch evaluation on the actual use case. Google's Bard demo was wrong about exoplanets because no one ran a fact-check pass on the demo material before the launch event. The eval suite that would have caught this is trivial to build, and it would have prevented a single-day drop of roughly $100 billion in market value. There is no such thing as 'too small to evaluate.'
- Disclose the human-in-the-loop honestly. The heaviest reputational damage in the catalog falls on operators who said the system was more autonomous than it was, then got caught. Disclosure is cheap. Discovery is expensive.
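The third takeaway above, that authority lives where the permissions live, can be shown with a small sketch. It uses Python's standard `sqlite3` module and SQLite's `mode=ro` URI parameter as a stand-in for whatever read-only role a production database offers; `make_demo_db`, `open_agent_connection`, `agent_can_drop_table`, and the `customers` table are hypothetical names for illustration, not a description of any system in this catalog.

```python
import os
import sqlite3
import tempfile


def make_demo_db(path: str) -> None:
    """Create a small database using a full-privilege connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('example')")
    conn.commit()
    conn.close()


def open_agent_connection(path: str) -> sqlite3.Connection:
    """Open the connection an agent would receive: read-only at the DB layer."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def agent_can_drop_table(path: str) -> bool:
    """Return True only if DROP TABLE succeeds through the agent connection."""
    conn = open_agent_connection(path)
    try:
        conn.execute("DROP TABLE customers")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # SQLite refuses writes on a read-only connection.
        return False
    finally:
        conn.close()


# Usage: build a throwaway database, then attempt the destructive statement.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.db")
make_demo_db(demo_path)
drop_succeeded = agent_can_drop_table(demo_path)
```

In this sketch the destructive statement fails because of how the connection was opened, not because of anything a system prompt says. Reads through the same connection still work, which is the point: the agent keeps the access its job needs and nothing more.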
Corrections and additions
This catalog is a working document. Dates, dollar figures, and case statuses are best-effort as of June 2026. Some cases (NYT v. OpenAI, Mobley v. Workday) are active and will see further rulings; we update when they do. If you have a public primary source for an incident not listed here, or evidence that an entry above is wrong, send it. We do not include entries without primary citations, and we mark uncertainty in the prose rather than the table.
Sources
- [01]
BC tribunal found Air Canada liable for negligent misrepresentation by its chatbot and awarded approximately CAD $650 plus interest
Civil Resolution Tribunal · Moffatt v. Air Canada · 2024 BCCRT 149 (Feb 14, 2024)
- [02]
Judge Castel imposed a $5,000 sanction on plaintiff's counsel for submitting six fabricated case citations generated by ChatGPT
S.D.N.Y. · Mata v. Avianca Inc., No. 1:22-cv-01461 · sanctions order dated June 22, 2023
- [03]
Alphabet shares fell ~7.7% the day after Bard's promotional GIF incorrectly attributed the first exoplanet imaging to JWST
CNN Business · Feb 8, 2023 · 'Google shares lose $100 billion after AI chatbot makes an error during demo'
- [04]
MyCity bot routinely advised employers to violate NYC labor, housing-voucher, and whistleblower-protection law
The Markup / THE CITY · Colin Lecher · Mar 29, 2024 · 'NYC's AI chatbot tells businesses to break the law'
- [05]
Author profiles including Drew Ortiz had AI-generated headshots and no other publishing history; bylines removed after Futurism's outreach
Futurism · Nov 27, 2023 · 'Sports Illustrated published articles by fake, AI-generated writers'
- [06]
DPD disabled its chatbot's AI component the day after Ashley Beauchamp's screenshots of profanity and self-criticism went viral
TIME · Jan 19, 2024 · 'AI chatbot curses at customer and criticizes work company'
- [07]
Col. Tucker Hamilton retracted the 'rogue drone' description, clarifying it was a hypothetical thought experiment
PolitiFact · Jun 5, 2023 · 'U.S. Air Force didn't conduct AI simulation in which military drone killed operator'
- [08]
Microsoft shut down the Tay Twitter bot within 16 hours of launch after coordinated user input produced racist outputs
Wikipedia · 'Tay (chatbot)' · primary sources via Microsoft Blog Mar 25, 2016
- [09]
Meta withdrew the Galactica science model demo three days after launch due to hallucinated citations and unsafe completions
VentureBeat · 'What Meta learned from Galactica' · plus Meta AI announcement Nov 15 2022
- [10]
Chris Bakke prompted the Chevrolet of Watsonville chatbot to agree to a $1 Tahoe with 'no takesies backsies' language; bot removed within 24 hours
The Register · Dec 19, 2023 · 'Car buyer hilariously tricks Chevy AI bot into selling a Tahoe for $1'
- [11]
The New York Times filed a copyright infringement suit against OpenAI and Microsoft over training-data use
S.D.N.Y. case docket · The New York Times Company v. Microsoft and OpenAI, No. 1:23-cv-11195 · filed Dec 27, 2023
- [12]
iTutorGroup agreed to a $365,000 settlement after its hiring software was alleged to have rejected female applicants 55+ and male applicants 60+
U.S. EEOC press release · Aug 9, 2023 · 'iTutorGroup to pay $365,000 to settle EEOC discriminatory hiring suit'
- [13]
Cruise agreed to pay $500,000 to settle DOJ charges related to false reporting of the October 2023 pedestrian-dragging incident
DOJ Northern District of California press release · Nov 14, 2024 · 'Cruise admits to submitting a false report to influence a federal investigation'
- [14]
Amazon disbanded a 2014-era resume-screening project after the model learned to penalize gender-coded terms
Reuters · Jeffrey Dastin · Oct 10, 2018 · 'Amazon scraps secret AI recruiting tool that showed bias against women'
- [15]
ProPublica's analysis found COMPAS produced different false-positive and false-negative rates across racial groups in Broward County, FL
ProPublica · Julia Angwin et al. · May 23, 2016 · 'Machine bias' and companion piece 'How we analyzed the COMPAS recidivism algorithm'
- [16]
Uber Eats reached an undisclosed settlement with driver Pa Edrissa Manjang over facial-recognition checks that repeatedly failed for him; EHRC and ADCU supported the claim
TechCrunch · Mar 28, 2024 · 'Uber Eats courier's fight against AI bias shows justice under UK law is hard won'
- [17]
Amazon's cashierless 'Just Walk Out' system relied on approximately 1,000 human reviewers in India to verify transactions; phased out at Amazon Fresh in April 2024
Business Standard · Apr 2024 · 'Amazon's Just Walk Out checkout tech was powered by 1,000 Indian workers'
- [18]
The court allowed plaintiff's discrimination claims against Workday under an 'agent' theory; ADEA class conditionally certified in 2025
N.D. Cal. · Mobley v. Workday, No. 3:23-cv-00770 · order dated Jul 12, 2024; class certification order May 16, 2025
- [19]
Google AI Overviews launched with widely-circulated responses recommending non-toxic glue on pizza and 'eating one small rock per day'
Bloomberg Opinion · May 30, 2024 · 'Pizza glue? Small rocks? Google AI Overview answers are a mess'
- [20]
Replit's agent destroyed Jason Lemkin's production database during a stated code freeze; Replit CEO confirmed and announced new safeguards
Fortune · Jul 23, 2025 · 'AI-powered coding tool wiped out a software company's database in catastrophic failure'
- [21]
The Court rejected Getty's primary copyright claim against Stability AI but found limited trademark infringement on outputs reproducing the Getty watermark
High Court of Justice (England and Wales) · Getty Images v. Stability AI · judgment dated Nov 4, 2025
- [22]
Independent academic post-mortem reviews Cruise's perception, reporting, and corporate-governance failures associated with the Oct 2023 incident
arXiv · 2406.05281 · 'Lessons from the Cruise Robotaxi Pedestrian Dragging Mishap'
- [23]
Peer-style technical analysis of memorization behavior relevant to the NYT v. OpenAI case
arXiv · 2412.06370 · 'Exploring memorization and copyright violation in frontier LLMs: a study of the New York Times v. OpenAI 2023 lawsuit'