
AI buzzword decoder: what each term actually means in 2026
A senior-person field guide. Marketing meaning, technical reality, and when the term is being used dishonestly.
The three most-abused words: AGI, ASI, superintelligence
These three terms have no fixed technical definition and are used to mean very different things by different speakers. They are the most common source of pitch-deck inflation in the field.
AGI (artificial general intelligence)
No agreed definition
Marketing meaning: a system that can do any cognitive task a human can. Technical reality: there is no operational test for AGI that the field agrees on. OpenAI's stated mission references AGI, and their 2023 'Planning for AGI' essay defines it loosely as 'AI systems that are generally smarter than humans.' DeepMind's Morris et al. (2023) proposed a level-based framework (emerging, competent, expert, virtuoso, superhuman) precisely because there was no consensus. Dishonest use: claiming a system is 'approaching AGI' or 'AGI-level' to imply human-equivalent breadth when the system has only been benchmarked on narrow tasks.
ASI (artificial superintelligence)
Speculative term
Marketing meaning: AI smarter than humans across all domains, often used interchangeably with superintelligence. Technical reality: a Bostrom-era philosophy term (Superintelligence, 2014) for a hypothetical system surpassing the best human minds in every cognitive task. No system in 2026 meets this definition by any rigorous measure. Dishonest use: timelines for ASI within a stated number of years, presented as forecasts when they are actually opinions of company leadership without a defined yardstick.
Superintelligence
Often a synonym for ASI
Marketing meaning: same as ASI in most current usage. Technical reality: Bostrom distinguished speed superintelligence (human-level thinking, but much faster), collective superintelligence (many smaller intellects organized into one system), and quality superintelligence (qualitatively better cognition at comparable speed). Most 2026 usage collapses all three into one word. Dishonest use: when a company launches a 'superintelligence team' and the deliverable is a chatbot benchmark improvement. The word is doing PR work, not technical work.
The glossary, in one scrollable table
Quick scan for the most common terms. Marketing meaning is what you will hear in a pitch. Technical reality is what the system is actually doing. Dishonest tell is the specific pattern where the term is being misused.
Why the AGI debate is mostly definitional
The alignment vocabulary, decoded
Alignment is the most-used and least-specified word in the safety conversation. Here is the actual technique stack underneath the word, with the original references where they exist.
- RLHF (Reinforcement Learning from Human Feedback) — Christiano et al. 2017, Ouyang et al. 2022 (InstructGPT). Train a reward model on human preference pairs, then optimize the supervised fine-tuned model with PPO against that reward. The dominant alignment technique through 2023.
- RLAIF (Reinforcement Learning from AI Feedback) — Lee et al. 2023 (Google). Use an off-the-shelf LLM, often a stronger one, to generate preference labels instead of humans. Cheaper; label quality is debated.
- Constitutional AI — Bai et al. 2022 (Anthropic). Replace some or all human feedback with a written set of principles; the model critiques and revises its own outputs against the constitution.
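- DPO (Direct Preference Optimization) — Rafailov et al. 2023. Targets the same KL-constrained preference objective as RLHF, but with a classification-style loss on preference pairs that skips the explicit reward model and the RL loop. Equivalent in theory under the paper's assumptions; results can differ in practice. Now widely adopted.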
- Scalable oversight — research direction (debate, recursive reward modeling, weak-to-strong generalization). The bet is that humans can supervise models smarter than themselves with the right protocol. See Burns et al. 2023 (OpenAI), Irving et al. 2018.
- Interpretability — separate research direction. Sparse autoencoders (Anthropic, 2024), mechanistic interpretability (Olah et al.), circuits research. Goal is to understand what a model is doing internally, not just shape its outputs.
- Red teaming — adversarial testing. Now standard practice at major labs and increasingly written into policy (US Executive Order 14110 of 2023, rescinded in January 2025; EU AI Act adversarial-testing obligations for general-purpose models with systemic risk).
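To make the difference between the RLHF and DPO entries above concrete, here is a minimal sketch of the two per-pair losses in plain Python. The function names, the log-probabilities, and the value of `beta` are illustrative assumptions; real training computes these over batches inside a deep learning framework.

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss used to fit an RLHF reward model."""
    return -log_sigmoid(r_chosen - r_rejected)

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, computed from sequence log-probs.

    pi_*  : log-prob of the response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference model
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)

# Made-up log-probs: the policy starts identical to the reference model.
at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)   # equals log(2)
# After an update the policy favors the chosen response a little more.
after_step = dpo_loss(-9.0, -13.0, -10.0, -12.0)
print(at_start, after_step)
```

The reward-model loss needs a separate scoring network; the DPO loss only needs log-probabilities from the policy and a frozen copy of it, which is why no explicit reward model appears.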
Hallucination: what it is, what it isn't, why 'eliminated' is a lie
Hallucination is not a bug to be patched. It is a property of how decoder-only language models work: they produce each next token by sampling from a learned probability distribution, with no built-in ground-truth check. Even a model that scores 100 percent on a benchmark can confabulate when asked something outside the benchmark distribution.
The useful framing from Ji et al. 2023 distinguishes intrinsic hallucinations (output that contradicts the input or context) from extrinsic hallucinations (output that cannot be verified from the input, whether or not it happens to be true). RAG addresses some intrinsic cases. Tool use and citation requirements address some extrinsic cases. Neither eliminates them. The literature is consistent that the floor is non-zero for open-domain generation.
When a vendor pitches 'hallucination-free' AI, the honest interpretation is: bounded domain, narrow query distribution, post-hoc verification, or all three. Ask which. If the answer is fuzzy, the claim is marketing. Also worth knowing: studies from 2023-2025 report that RAG can introduce new hallucinations when misleading passages are retrieved, so 'we use RAG' is not by itself a safety story.
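The point that generation has no built-in ground-truth check can be seen in a toy decoding step. The candidate tokens and their scores below are invented for illustration; a real model produces scores over a vocabulary of tens of thousands of tokens.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token candidates for "The capital of Australia is ..."
candidates = ["Canberra", "Sydney", "Melbourne"]
logits = [3.0, 2.0, 0.5]   # made-up scores; "Sydney" is wrong but plausible

probs = softmax(logits)
p_wrong = 1.0 - probs[0]   # probability mass on incorrect answers

# Sampling step: nothing here consults a source of truth.
rng = random.Random(0)
samples = rng.choices(candidates, weights=probs, k=1000)
wrong_rate = sum(s != "Canberra" for s in samples) / len(samples)

print(f"P(wrong token) = {p_wrong:.2f}, observed wrong rate = {wrong_rate:.2f}")
```

As long as a wrong continuation carries any probability mass, it will be emitted some fraction of the time. Mitigations change the scores or check the output afterwards; they do not add a truth check to the sampling step itself.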
Tool use, function calling, MCP: a short timeline of how they differ
These three terms get conflated regularly. They are sequential layers of standardization, not synonyms.
2023 — Toolformer
Tool use as a research concept
Schick et al. (Meta AI) showed a language model could be trained to call APIs (calculator, search, translation) and use the results. Foundational paper for the modern category.
2023 — Function calling
OpenAI standardizes the schema
OpenAI launched 'function calling' for GPT-3.5 Turbo and GPT-4 in June 2023: the developer defines a JSON schema, the model emits structured calls matching it, and the developer's runtime executes them. This became the industry pattern; Anthropic, Google, and Mistral followed with their own implementations.
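The define, emit, execute loop can be sketched without any vendor SDK. Everything here is illustrative: the `get_weather` tool, its schema, and the hard-coded `model_output` string stand in for what a real API and model would produce, and the exact wire format differs between vendors.

```python
import json

# 1. The developer declares the tool with a JSON Schema for its arguments.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def get_weather(city: str) -> dict:
    """Stub implementation; a real one would call a weather service."""
    return {"city": city, "temp_c": 21}

IMPLEMENTATIONS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Validate a structured call emitted by the model, then run it."""
    call = json.loads(model_output)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"model requested unknown tool: {name}")
    required = TOOLS[name]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return IMPLEMENTATIONS[name](**args)

# 2. The model emits a call matching the schema (hard-coded here).
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'

# 3. The developer's runtime executes it; the result goes back to the model.
result = dispatch(model_output)
print(result)
```

Note that the model never runs anything itself. It only produces text shaped like a call, which is why validation in the runtime matters.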
2024 — MCP launched
Anthropic ships an open protocol
Model Context Protocol (modelcontextprotocol.io) is an open spec for how AI assistants connect to data sources and tools via standardized servers. Reduces the M×N integration problem (M assistants, N data sources) to M+N.
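The arithmetic behind the M×N claim, with made-up counts. The function names are hypothetical, and the model is deliberately simple: one bespoke connector per pair versus one protocol client per assistant plus one server per data source.

```python
def point_to_point(m: int, n: int) -> int:
    """One bespoke connector per (assistant, data source) pair."""
    return m * n

def via_shared_protocol(m: int, n: int) -> int:
    """One client per assistant plus one server per data source."""
    return m + n

for m, n in [(2, 2), (5, 20), (10, 100)]:
    print(f"{m} assistants, {n} sources: "
          f"{point_to_point(m, n)} connectors vs {via_shared_protocol(m, n)}")
```

At two assistants and two sources the counts are equal, so the saving only shows up as the ecosystem grows. That is a useful check when a small integration is pitched as benefiting from the protocol.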
2025-2026 — Ambient agents emerge
Event-driven agent loops
Tool use plus background triggers (email, webhook, calendar event, schedule). LangChain, LangGraph, and others published patterns; multiple vendors now ship 'ambient' or 'background' agent products. The category is real but the marketing is ahead of the reliability data.
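A minimal sketch of what 'tool use plus background triggers' means structurally. The `Event` type, the handler names, and the sample events are all hypothetical, and the handlers are stubs where a real product would call a model and tools.

```python
import queue
from dataclasses import dataclass

@dataclass
class Event:
    source: str    # where the trigger came from, e.g. "email" or "schedule"
    payload: str

def on_email(event: Event) -> str:
    # A real agent would call a model and tools here; this stub labels the work.
    return f"draft reply for: {event.payload}"

def on_schedule(event: Event) -> str:
    return f"run scheduled job: {event.payload}"

HANDLERS = {"email": on_email, "schedule": on_schedule}

def drain(events: queue.Queue) -> list:
    """Process queued events in order; record unknown sources instead of crashing."""
    log = []
    while not events.empty():
        event = events.get()
        handler = HANDLERS.get(event.source)
        if handler is None:
            log.append(f"no handler for source: {event.source}")
        else:
            log.append(handler(event))
    return log

inbox = queue.Queue()
inbox.put(Event("email", "invoice question"))
inbox.put(Event("webhook", "build failed"))
inbox.put(Event("schedule", "weekly report"))

actions = drain(inbox)
for line in actions:
    print(line)
```

The unhandled `webhook` event is the part worth asking vendors about: what the agent does with triggers it does not recognize, and how often that happens, is exactly the reliability data the marketing tends to skip.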
Three rules of thumb when reading any AI claim
An internal rubric for parsing marketing material from press releases, vendor blogs, and pitch decks. No jargon; it works for non-technical readers too.
Name the benchmark or shut up
Rule 1
If a claim does not name a specific public benchmark with a specific score, treat it as marketing until proven otherwise. Real benchmarks: MMLU, HumanEval, GPQA, SWE-bench, ARC-AGI, BIG-Bench. Vague 'state-of-the-art' or 'best-in-class' phrases are not benchmark results. Held-out evaluation matters more than headline numbers (see Sainz et al. 2023 on benchmark contamination).
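One simple way contamination gets checked is word n-gram overlap between a test item and training text. The sketch below is a simplified heuristic, not any particular paper's method; the strings are invented, and `n=4` is used only because the example is short (published checks typically use longer n-grams).

```python
def ngrams(text: str, n: int) -> set:
    """Set of word n-grams, lowercased and split on whitespace."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_fraction(test_item: str, corpus_doc: str, n: int = 8) -> float:
    """Share of the test item's n-grams that also appear in the corpus document."""
    item_grams = ngrams(test_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)

# Made-up benchmark question and two made-up training documents.
question = "what is the boiling point of water at sea level in celsius"
scraped_page = ("quiz answers what is the boiling point of water "
                "at sea level in celsius answer 100 degrees")
unrelated = "the model was trained on a mixture of web text and code"

leaked = overlap_fraction(question, scraped_page, n=4)
clean = overlap_fraction(question, unrelated, n=4)
print(f"overlap with scraped page: {leaked:.2f}, with unrelated text: {clean:.2f}")
```

A high overlap means the model may have seen the question during training, in which case the benchmark score measures recall rather than capability. Exact matching like this misses paraphrased leaks, so a clean result is weak evidence.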
Ask 'compared to what'
Rule 2
Any percentage improvement is meaningless without a named baseline and confidence interval. '90 percent accurate' tells you nothing if you don't know what the previous accuracy was, what humans score, what random would score, and how the test set was constructed. The Schaeffer et al. 2023 critique of emergent abilities is a clean example of how baseline choice changes the story entirely.
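A worked example of why the baseline and the interval matter. The counts are hypothetical: a previous system and a new one evaluated on the same 200-item test set. The interval used is the Wilson score interval, one common choice for a proportion.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical results on the same 200-item test set.
baseline = wilson_interval(172, 200)   # previous system: 86 percent
new = wilson_interval(180, 200)        # new system: "90 percent accurate"

print(f"baseline: {baseline[0]:.3f} to {baseline[1]:.3f}")
print(f"new:      {new[0]:.3f} to {new[1]:.3f}")
print("intervals overlap:", new[0] < baseline[1])
```

With these numbers the two intervals overlap, so a 200-item test set cannot clearly separate 86 percent from 90 percent. Overlap is a rough screen rather than a formal significance test, but it is enough to justify asking for the test set size.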
Demand failure modes
Rule 3
Any honest AI vendor will name the failure modes of their system, the rate at which they occur, and how the runtime handles them. If the answer is 'we don't fail' or 'edge cases are rare,' that is the dishonest tell. Anthropic, OpenAI, and Google all publish model cards documenting known failures. The presence or absence of a model card is itself a signal.
Terms we left out, and why
Sources
- [01]
Vaswani et al. 2017 'Attention Is All You Need' introduced the transformer architecture that underlies almost all modern LLMs.
arxiv.org/abs/1706.03762
- [02]
Brown et al. 2020 'Language Models are Few-Shot Learners' (GPT-3 paper) established the scaling-via-prompting paradigm.
arxiv.org/abs/2005.14165
- [03]
Kaplan et al. 2020 established empirical scaling laws for neural language models.
arxiv.org/abs/2001.08361
- [04]
Hoffmann et al. 2022 (Chinchilla) showed that prior LLMs were undertrained relative to compute-optimal scaling, revising Kaplan's coefficients.
arxiv.org/abs/2203.15556
- [05]
Wei et al. 2022 'Emergent Abilities of Large Language Models' is the foundational paper on emergence claims.
arxiv.org/abs/2206.07682
- [06]
Schaeffer et al. 2023 'Are Emergent Abilities of Large Language Models a Mirage?' (NeurIPS 2023 outstanding paper award) showed many emergence claims were metric artifacts.
arxiv.org/abs/2304.15004
- [07]
Wei et al. 2022 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models' introduced the canonical CoT prompting technique.
arxiv.org/abs/2201.11903
- [08]
Yao et al. 2022 'ReAct: Synergizing Reasoning and Acting in Language Models' established the standard reasoning-and-action agent pattern.
arxiv.org/abs/2210.03629
- [09]
Lewis et al. 2020 introduced retrieval-augmented generation (RAG) for knowledge-intensive NLP tasks.
arxiv.org/abs/2005.11401
- [10]
Hu et al. 2021 'LoRA: Low-Rank Adaptation of Large Language Models' is the dominant parameter-efficient fine-tuning method.
arxiv.org/abs/2106.09685
- [11]
Ouyang et al. 2022 (InstructGPT) is the canonical reference for RLHF as deployed in production language models.
arxiv.org/abs/2203.02155
- [12]
Christiano et al. 2017 introduced the foundational RLHF method using human preference comparisons.
arxiv.org/abs/1706.03741
- [13]
Bai et al. 2022 'Constitutional AI: Harmlessness from AI Feedback' (Anthropic) introduced the CAI training approach.
arxiv.org/abs/2212.08073
- [14]
Rafailov et al. 2023 'Direct Preference Optimization' provides a reward-model-free reformulation of RLHF that is now widely adopted.
arxiv.org/abs/2305.18290
- [15]
Schick et al. 2023 'Toolformer' was the early formulation of language models learning to call external APIs.
arxiv.org/abs/2302.04761
- [16]
Ji et al. 2023 'Survey of Hallucination in Natural Language Generation' is the canonical hallucination taxonomy reference.
arxiv.org/abs/2202.03629
- [17]
Turpin et al. 2023 showed that chain-of-thought explanations can be post-hoc rationalizations not reflecting the model's actual reasoning.
arxiv.org/abs/2305.04388
- [18]
Sainz et al. 2023 documented benchmark contamination in LLM evaluations, motivating held-out testing.
arxiv.org/abs/2310.16787
- [19]
Morris et al. 2023 (DeepMind) 'Levels of AGI' proposed a generality-by-performance framework to operationalize AGI discussions.
arxiv.org/abs/2311.02462
- [20]
Bommasani et al. 2021 (Stanford CRFM) introduced the term 'foundation model' for large pretrained models adaptable to many tasks.
arxiv.org/abs/2108.07258
- [21]
Model Context Protocol is Anthropic's open specification for connecting AI assistants to tools and data sources via standardized servers, launched November 2024.
modelcontextprotocol.io
- [22]
OpenAI's 2023 'Planning for AGI and beyond' essay defines AGI loosely as AI systems generally smarter than humans, without operational test criteria.
openai.com/index/planning-for-agi-and-beyond
- [23]
Anthropic's mechanistic interpretability publications including sparse autoencoder work on Claude 3 Sonnet are the primary public reference for circuit-level interpretability research.
transformer-circuits.pub
- [24]
US Executive Order 14110 made red-teaming and safety reporting requirements for frontier AI developers a regulatory expectation in the US until it was rescinded in January 2025.
whitehouse.gov · Executive Order 14110 (October 2023)
- [25]
Bostrom's 2014 book formalized the distinction between speed, collective, and quality superintelligence used in subsequent ASI discussions.
Nick Bostrom · Superintelligence (Oxford University Press, 2014)