AI buzzword decoder: what each term actually means in 2026

A senior-person field guide. Marketing meaning, technical reality, and when the term is being used dishonestly.

Most AI vocabulary in 2026 is contested. Vendors use the same word to mean three different things in three different paragraphs of the same landing page. Researchers use it to mean a fourth thing. Press picks the most exciting one. By the time the term reaches a procurement committee or a board deck, it has lost almost all definitional weight, and the only honest answer to "what does this mean?" is "depends who is selling it."

This page is a working glossary. For each term we give the plain-language marketing meaning (what you will hear in a pitch), the technical reality (what the paper or system actually does), and the dishonest pattern (when the word is being used to inflate, dodge, or distract). Where a real benchmark, paper, or official spec exists, we cite it. Where the field genuinely disagrees, we say so. Where we don't know something with certainty as of mid-2026, we mark it best-effort.

The voice we are aiming for is the voice of a tired senior engineer at the back of the room. Nothing here is anti-AI. It is anti-bullshit. The technology is real, the progress is real, the wins are real, and so are the rugs, the redefinitions, the goalpost shifts, and the press-release physics that make everything sound bigger than it is.

If you only read one section, read the one on AGI, ASI, and superintelligence. Those three words do more work to confuse otherwise rational discussions than the rest of the glossary combined. After that, the table is the workhorse: scan for the term you need, get the technical reality in one line, and move on. Everything else here is for the moments when somebody on a call says a word with a lot of confidence and you need to know whether to push back.

The three most-abused words: AGI, ASI, superintelligence

These three terms have no fixed technical definition and are used to mean very different things by different speakers. They are the most common source of pitch-deck inflation in the field.

AGI (artificial general intelligence)

No agreed definition

Marketing meaning: a system that can do any cognitive task a human can.
Technical reality: there is no operational test for AGI that the field agrees on. OpenAI's stated mission references AGI, and their 2023 'Planning for AGI' essay defines it loosely as 'AI systems that are generally smarter than humans.' DeepMind's Morris et al. (2023) proposed a level-based framework (emerging, competent, expert, virtuoso, superhuman) precisely because there was no consensus.
Dishonest use: claiming a system is 'approaching AGI' or 'AGI-level' to imply human-equivalent breadth when the system has only been benchmarked on narrow tasks.

ASI (artificial superintelligence)

Speculative term

Marketing meaning: AI smarter than humans across all domains, often used interchangeably with superintelligence.
Technical reality: a Bostrom-era philosophy term (Superintelligence, 2014) for a hypothetical system surpassing the best human minds in every cognitive task. No system in 2026 meets this definition by any rigorous measure.
Dishonest use: timelines for ASI within a stated number of years, presented as forecasts when they are actually opinions of company leadership without a defined yardstick.

Superintelligence

Often a synonym for ASI

Marketing meaning: same as ASI in most current usage.
Technical reality: Bostrom distinguished speed superintelligence (think faster), collective superintelligence (more minds in parallel), and quality superintelligence (better cognition per unit). Most 2026 usage collapses all three into one word.
Dishonest use: when a company launches a 'superintelligence team' and the deliverable is a chatbot benchmark improvement. The word is doing PR work, not technical work.

The glossary, in one scrollable table

Quick scan for the most common terms. Marketing meaning is what you will hear in a pitch. Technical reality is what the system is actually doing. Dishonest tell is the specific pattern where the term is being misused.

Term	Marketing meaning	Technical reality	Dishonest tell
Alignment	Making AI 'safe' and 'good'	Specific techniques: RLHF, constitutional AI, RLAIF, debate, scalable oversight. Each has known limits. See Bai et al. 2022, Christiano et al. 2017.	Used as a vibe word with no mention of which technique, which failure modes, or which benchmark.
Agent	An AI that 'does things for you'	A loop that takes a goal, picks tools or actions, observes outcomes, and iterates. ReAct (Yao et al. 2022) is the canonical pattern.	Single prompt-to-completion calls relabeled as 'agents' to ride the category.
Agentic	Adjective form of agent	Describes systems with multi-step planning and tool use. Anthropic's Claude SDK and OpenAI's Assistants API document specific agent loops.	Anything that 'has agency' even when it's a static workflow with no decision points.
Autonomous	Runs without a human	Operates in a defined loop with bounded actions and rollback. Real autonomy in production usually has guardrails, kill switches, and human escalation.	'Fully autonomous' when there is a human in the loop on every meaningful action.
Foundation model	A 'big base model'	Term from Stanford CRFM 2021 (Bommasani et al.) for large models trained on broad data, adaptable to many tasks via fine-tuning or prompting.	Used to mean any LLM, blurring the distinction between a 7B local model and a frontier-scale system.
Frontier model	The newest, biggest models	UK AI Safety Institute and Anthropic/OpenAI/Google policy docs use this for the most capable general-purpose models at the research frontier. No fixed parameter count.	Marketing slides calling a small fine-tuned model 'frontier' because it scores well on one benchmark.
Multi-modal	Handles text, images, audio, video	A model with input or output across multiple data modalities. GPT-4V, Gemini, and the Claude 3 family are all multi-modal in different ways.	Counting text + a single image upload as 'fully multi-modal' when the system can't reason across video or audio.
Multi-agent	Many AIs working together	Architectures where multiple model instances or specialized agents coordinate. Microsoft AutoGen, LangGraph, CrewAI are public frameworks.	Calling a single model with multiple prompts 'a multi-agent system' to inflate complexity.
Emergent	Capabilities that appear at scale	Original Wei et al. 2022 'Emergent Abilities of LLMs' paper. Later contested by Schaeffer et al. 2023 (a NeurIPS 2023 outstanding paper), which showed many emergent claims were metric artifacts.	Citing emergence as a magic property without acknowledging the Schaeffer rebuttal.
Scaling laws	Bigger = better, predictably	Specific power-law relationships from Kaplan et al. 2020 and Hoffmann et al. 2022 (Chinchilla). Loss decreases predictably with compute, data, parameters.	Claiming scaling 'guarantees' AGI. Scaling laws are about loss curves, not capability ceilings.
RAG	Lets AI use your data	Retrieval-augmented generation. Lewis et al. 2020. Fetch relevant docs, stuff into context, generate. Real systems add reranking, chunking, hybrid search.	'AI trained on your data' when the system is just doing vanilla RAG at query time, no training involved.
Fine-tune	Custom model for your use case	Continued training on a smaller, task-specific dataset. LoRA (Hu et al. 2021) is the dominant parameter-efficient approach.	Calling a system prompt 'a fine-tune' to imply deeper customization than exists.
Vector database	Database for AI	Storage and ANN search over embeddings. Pinecone, Weaviate, pgvector, Qdrant, Milvus, Chroma. HNSW and IVF are common index types.	Positioning a vector DB as 'the AI layer' when it is one component of a retrieval pipeline.
Semantic search	Search by meaning, not keywords	Search over dense vector embeddings using cosine or dot-product similarity. Usually paired with BM25 in hybrid setups for real-world quality.	Pure-vector demos that fall apart on exact-match queries (SKUs, names, IDs) which BM25 would handle.
Embedding	AI's representation of text	A fixed-length dense vector from a learned encoder (e.g., OpenAI text-embedding-3, Cohere, BGE, E5). Captures semantic similarity in vector space.	Treating embeddings as interpretable features. They aren't, beyond similarity.
Transformer	The neural net behind modern AI	Architecture from Vaswani et al. 2017 'Attention Is All You Need.' Self-attention + feedforward layers. Decoder-only is now dominant for LLMs.	Saying 'transformer-based' as a quality signal. Almost everything is. It's table stakes.
GPT	OpenAI's models	Generative Pre-trained Transformer. A specific OpenAI lineage (GPT-2, 3, 3.5, 4, etc.) but also used generically for decoder-only LMs.	Calling a Llama or Mistral fine-tune 'a GPT.' Wrong family, wrong vendor.
Hallucination	AI making things up	Confident but incorrect or unsupported output. Survey: Ji et al. 2023 'Survey of Hallucination in NLG.' Cannot be eliminated, only reduced.	Vendors claiming 'hallucination-free' systems. No such thing in 2026 for open-domain generation.
Ground truth	The correct answer	Reference labels in a dataset used for evaluation. Quality depends entirely on the labeling process.	Treating LLM-generated labels as ground truth without human spot-checks.
Reasoning	The AI thinks	Multi-step output that decomposes a problem. OpenAI o1/o3 and similar models trained with RL to produce long chains. No agreed mechanistic definition of 'reasoning.'	Any chain-of-thought output relabeled as 'reasoning' regardless of whether the steps are valid.
Chain-of-thought	Step-by-step thinking	Prompting technique from Wei et al. 2022. Asking the model to show intermediate steps before final answer. Improves performance on math and logic.	Claiming CoT 'proves' the model reasons. The steps may be post-hoc rationalizations (Turpin et al. 2023).
Multi-step	More than one action	A workflow with sequential stages, often with tool calls between them. Operationally distinct from single-shot completion.	Used interchangeably with 'agentic' and 'reasoning' to suggest sophistication.
Deep research	AI does the research for you	Specific product category: long-horizon search + read + synthesize loops. OpenAI Deep Research, Google Gemini Deep Research, Perplexity Pro. Real benchmarks scarce.	Vendors using the phrase without specifying source coverage, citation accuracy, or how hallucinations are handled.
Tool use	AI calls APIs	Model emits structured calls to external functions; runtime executes and returns results. Toolformer (Schick et al. 2023) was an early formulation.	Demo videos that hide failure rates. Real tool-use systems have non-trivial error and retry needs.
Function calling	Specific form of tool use	Structured output (typically JSON) matching a defined schema, executed by the runtime. OpenAI introduced it as 'function calling' in 2023; now industry standard.	Treating function calling as a security boundary. It is not; the runtime decides what to execute.
MCP	Model Context Protocol	Anthropic's open spec (modelcontextprotocol.io, 2024) for connecting AI assistants to data sources and tools via standardized servers.	Calling any AI integration 'MCP-based' when it's a custom API. MCP is a specific protocol.
Ambient AI	AI that's just there	Term used for always-on, context-aware assistants. No standard spec; usage varies by vendor.	Marketing word for 'we put the assistant in more places.' Usually means more telemetry, not more capability.
Ambient agents	Agents that run in the background	LangChain and others have published patterns: agents triggered by events (email, calendar) rather than user prompts. Documented in LangChain blog and OSS examples.	Promising 'background work' that turns out to be scheduled cron jobs with an LLM call.
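
The 'Agent' row describes a loop, and a loop is easier to recognize in code than in a pitch. What follows is a minimal sketch, not any vendor's SDK: `pick_action` is a hypothetical stand-in for the model call, and the single `lookup` tool is a toy invented for illustration.

```python
from typing import Callable

# Hypothetical tool table: the name and behavior are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}


def pick_action(goal: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model call: choose the next action from what has been seen."""
    if not observations:
        return ("lookup", goal)          # nothing known yet: call a tool
    return ("finish", observations[-1])  # an observation exists: stop and answer


def run_agent(goal: str, max_steps: int = 5) -> tuple[str, int]:
    """Goal -> pick action -> observe -> iterate, bounded by max_steps."""
    observations: list[str] = []
    for step in range(1, max_steps + 1):
        action, arg = pick_action(goal, observations)
        if action == "finish":
            return arg, step
        observations.append(TOOLS[action](arg))  # feed the result back into the loop
    return "gave up", max_steps


answer, steps = run_agent("capital of France")
```

A single prompt-to-completion call has no loop and no observation step. If a product described as an agent cannot point to the equivalent of `observations` and `max_steps`, the word is probably being used loosely.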

Why the AGI debate is mostly definitional

When two people argue about whether GPT-4 or Claude or Gemini is 'AGI,' they are almost always using different definitions. The Turing test, the coffee test (Wozniak), the employment test (Nilsson), the robot college student test (Goertzel), and the OpenAI charter language all measure different things. Morris et al. (2023) at DeepMind tried to fix this with a matrix: performance levels (emerging through superhuman) crossed with generality (narrow vs. general). It is one of the cleaner frameworks but is not universally adopted.

The practical advice for non-researchers is to refuse the term and ask three replacement questions. First: on which specific benchmarks does the system match or exceed human baselines, and what are the failure modes outside the benchmark distribution? Second: what is the system's reliability on novel tasks it was not trained on, and how was that measured? Third: what is the cost of a wrong answer in the deployment context, and how is that bounded? These three questions sidestep the entire definitional fight and get to what actually matters for any real decision.

A related move is to ask about generalization vs. memorization. A system that scores 95 percent on a benchmark whose data is in the pretraining corpus is doing something very different from a system that scores 70 percent on a held-out benchmark released after the training cutoff. The benchmark contamination literature (Sainz et al. 2023, Magar and Schwartz 2022) is the technical backstop for this question. If a vendor cannot or will not show held-out evaluation, that is informative on its own.
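
To make the memorization question concrete, here is a toy contamination check: measure how many of a benchmark item's word n-grams appear verbatim in a training document. This is a sketch under loud assumptions. The function names, the whitespace tokenization, and the example strings are all illustrative, and it is not the method of any paper cited here; real audits work at tokenizer level over indexed corpora.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All word-level n-grams of a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item: str, corpus_doc: str, n: int = 5) -> float:
    """Fraction of the item's n-grams that also appear in the corpus document."""
    item = ngrams(benchmark_item, n)
    if not item:
        return 0.0  # item shorter than n words: nothing to compare
    return len(item & ngrams(corpus_doc, n)) / len(item)


# Illustrative data only.
doc = "forum post: what is the boiling point of water at sea level in celsius answer 100"
leaked = "What is the boiling point of water at sea level in celsius"
fresh = "Which gas makes up most of the atmosphere of the planet Venus"

leaked_score = overlap_fraction(leaked, doc)  # every 5-gram appears in doc
fresh_score = overlap_fraction(fresh, doc)    # no 5-gram appears in doc
```

A high overlap does not prove the model memorized the item, and a low one does not prove it did not (paraphrases slip through). The point is that the question is checkable, so a vendor can be asked whether they checked.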

The alignment vocabulary, decoded

Alignment is the most-used and least-specified word in the safety conversation. Here is the actual technique stack underneath the word, with the original references where they exist.

RLHF (Reinforcement Learning from Human Feedback): Christiano et al. 2017, Ouyang et al. 2022 (InstructGPT). Train a reward model on human preference pairs, then optimize the base model against that reward with PPO. The dominant alignment technique through 2023.
RLAIF (Reinforcement Learning from AI Feedback): Lee et al. 2023 (Google). Use an off-the-shelf LLM to generate preference labels instead of humans. Cheaper; quality is debated.
Constitutional AI: Bai et al. 2022 (Anthropic). Replace some or all human feedback with a written set of principles; the model critiques and revises its own outputs against the constitution.
DPO (Direct Preference Optimization): Rafailov et al. 2023. Optimizes the same preference objective as RLHF with a single classification-style loss, skipping the explicit reward model and the RL loop. Now widely adopted.
Scalable oversight: research direction (debate, recursive reward modeling, weak-to-strong generalization). The bet is that humans can supervise models smarter than themselves with the right protocol. See Burns et al. 2023 (OpenAI), Irving et al. 2018.
Interpretability: separate research direction. Sparse autoencoders (Anthropic, 2024), mechanistic interpretability (Olah et al.), circuits research. Goal is to understand what a model is doing internally, not just shape its outputs.
Red teaming: adversarial testing. Standard practice at major labs and written into policy at various points (US Executive Order 14110 of 2023, rescinded in January 2025; EU AI Act adversarial-testing obligations for general-purpose models with systemic risk).
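
Of the techniques above, DPO is the easiest to show concretely, because the method reduces to one loss over a preference pair. The sketch below computes that loss for a single pair from four summed log-probabilities. It follows the loss as published in Rafailov et al. 2023, but the function name, argument names, and the `beta=0.1` default are illustrative choices, not taken from any library.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair, given sequence log-probabilities."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) written as log(1 + exp(-logits)).
    # Fine for a sketch; a training implementation would use a numerically stable form.
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: no preference learned yet.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

When the policy matches the reference the loss is log 2, and it falls as the policy favors the chosen response more than the reference does. Note what is absent: no reward model and no sampling loop, which is the whole selling point.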

Hallucination: what it is, what it isn't, why 'eliminated' is a lie

Hallucination is not a bug to be patched. It is a property of how decoder-only language models work: they sample each next token from a learned probability distribution, with no built-in ground-truth check. Even a model that scores 100 percent on a benchmark can confabulate when asked something outside the benchmark distribution.

The useful framing from Ji et al. 2023 distinguishes intrinsic hallucinations (output that contradicts the input or context) from extrinsic hallucinations (output that cannot be verified against the input at all, whether or not it happens to be true). RAG, tool use, and citation requirements each address some of these cases. None eliminates them. The literature is consistent that the floor is non-zero for open-domain generation.

When a vendor pitches 'hallucination-free' AI, the honest interpretation is: bounded domain, narrow query distribution, post-hoc verification, or all three. Ask which. If the answer is fuzzy, the claim is marketing. Also worth knowing: studies from 2023-2025 report that RAG can introduce new hallucinations when misleading passages are retrieved, so 'we use RAG' is not by itself a safety story.
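
'Post-hoc verification' sounds heavier than it has to be. The sketch below flags answer sentences whose content words are mostly missing from every retrieved passage. It is deliberately naive and every name in it is hypothetical: lexical overlap misses paraphrase and negation, and the 0.6 threshold is an arbitrary illustration, not a recommended value.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric words longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def unsupported_sentences(
    answer: str, passages: list[str], min_coverage: float = 0.6
) -> list[str]:
    """Return answer sentences that no single passage lexically covers."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        # Best coverage of this sentence's words by any one passage.
        best = max(
            (len(words & content_words(p)) / len(words) for p in passages),
            default=0.0,
        )
        if best < min_coverage:
            flagged.append(sentence)
    return flagged


# Illustrative data only.
passages = ["The Eiffel Tower was completed in 1889 for the World's Fair in Paris."]
answer = "The Eiffel Tower was completed in 1889. It was designed by Leonardo da Vinci."
flagged = unsupported_sentences(answer, passages)
```

Here the second sentence is flagged because nothing in the retrieved passage supports it. A check like this reduces the rate of unsupported claims reaching the user; it does not bring the rate to zero, which is the distinction the 'hallucination-free' pitch skips.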

Tool use, function calling, MCP: a short timeline of how they differ

These three terms get conflated regularly. They are sequential layers of standardization, not synonyms.

Early 2023: Toolformer
Tool use as a research concept
Schick et al. (Meta AI) showed a language model could be trained to call APIs (calculator, search, translation) and use the results. Foundational paper for the modern category.
Mid 2023: Function calling
OpenAI standardizes the schema
OpenAI launched 'function calling' for GPT-3.5/4: the developer defines a JSON schema, the model emits structured calls matching it, the developer's runtime executes. Became the industry pattern; Anthropic, Google, Mistral followed with their own implementations.
2024: MCP launched
Anthropic ships an open protocol
Model Context Protocol (modelcontextprotocol.io) is an open spec for how AI assistants connect to data sources and tools via standardized servers. Reduces the M×N integration problem (M assistants, N data sources) to M+N.
2025-2026: Ambient agents emerge
Event-driven agent loops
Tool use plus background triggers (email, webhook, calendar event, schedule). LangChain, LangGraph, and others published patterns; multiple vendors now ship 'ambient' or 'background' agent products. The category is real but the marketing is ahead of the reliability data.
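
The function-calling step in the timeline is worth seeing from the runtime's side, since that is where the real decisions happen. This is a minimal sketch with a simplified wire format: the `name`/`arguments` shape, the `REGISTRY` table, and the `get_weather` tool are assumptions made for illustration and do not reproduce any vendor's actual schema.

```python
import json


# Hypothetical tool: name, parameter, and return value are illustrative.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Allowlist of callable functions with their expected argument types.
REGISTRY = {
    "get_weather": {"fn": get_weather, "params": {"city": str}},
}


def execute_call(raw: str) -> str:
    """Validate a model-emitted call. The runtime, not the model, decides what runs."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "error: not valid JSON"
    if not isinstance(call, dict):
        return "error: call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in REGISTRY:
        return "error: unknown function"
    spec = REGISTRY[name]
    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != set(spec["params"]):
        return "error: arguments do not match schema"
    for key, expected in spec["params"].items():
        if not isinstance(args[key], expected):
            return f"error: {key} has wrong type"
    return spec["fn"](**args)


ok = execute_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
blocked = execute_call('{"name": "delete_files", "arguments": {}}')
```

Every check in `execute_call` is the developer's code, not the model's. That is why function calling is an interface convention and not a security boundary: a model can emit any string it likes, and safety depends on what the runtime refuses to execute.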

Three rules of thumb when reading any AI claim

Our internal rubric for parsing marketing material from press releases, vendor blogs, and pitch decks. No jargon required; it works for non-technical readers too.

Name the benchmark or shut up

Rule 1

If a claim does not name a specific public benchmark with a specific score, treat it as marketing until proven otherwise. Real benchmarks: MMLU, HumanEval, GPQA, SWE-bench, ARC-AGI, BIG-Bench. Vague 'state-of-the-art' or 'best-in-class' phrases are not benchmark results. Held-out evaluation matters more than headline numbers (see benchmark contamination literature).

Ask 'compared to what'

Rule 2

Any percentage improvement is meaningless without a named baseline and confidence interval. '90 percent accurate' tells you nothing if you don't know what the previous accuracy was, what humans score, what random would score, and how the test set was constructed. The Schaeffer et al. 2023 critique of emergent abilities is a clean example of how baseline choice changes the story entirely.
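
The confidence-interval half of this rule is a few lines of arithmetic. The sketch below uses the Wilson score interval, one standard choice for a proportion such as accuracy; the function name and the 95 percent default (`z = 1.96`) are our choices for illustration.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an accuracy of correct/total (z=1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


small_lo, small_hi = wilson_interval(90, 100)    # "90 percent" on 100 test items
large_lo, large_hi = wilson_interval(900, 1000)  # "90 percent" on 1,000 test items
```

On 100 test items, '90 percent accurate' is compatible with anything from about 0.83 to 0.94. On 1,000 items the range narrows to about 0.88 to 0.92. A claimed improvement from 88 to 90 percent on a 100-item test set is therefore well inside the noise, which is exactly the kind of thing a named baseline and test-set size let you notice.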

Demand failure modes

Rule 3

Any honest AI vendor will name the failure modes of their system, the rate at which they occur, and how the runtime handles them. If the answer is 'we don't fail' or 'edge cases are rare,' that is the dishonest tell. Anthropic, OpenAI, and Google all publish model cards documenting known failures. The presence or absence of a model card is itself a signal.

Terms we left out, and why

This page focused on terms that show up in pitches, board decks, and procurement. We deliberately skipped several adjacent categories. Inference-time compute (mostly used inside research, not marketing) is covered well by the OpenAI o1 system card and follow-up papers. Mixture-of-experts (MoE) is an architecture detail rather than a marketing term, though Mistral's Mixtral and Google's Switch Transformer work are public references. World models (a term closely associated with Yann LeCun, distinct from generative video models) and JEPA (Joint Embedding Predictive Architecture, also LeCun) are research positions, not products you can buy today.

We also left out the safety vocabulary (catastrophic risk, x-risk, AI safety, dangerous capabilities) because that is its own glossary and would dilute the practical focus here. The UK AI Security Institute (formerly the AI Safety Institute), the US Center for AI Standards and Innovation at NIST (formerly the US AI Safety Institute), and the EU AI Act use overlapping but non-identical definitions; if you are working in a regulated context, read those primary sources rather than any vendor summary.

When in doubt, the single best move is to read the original paper. Almost every term on this page traces to a specific arXiv paper or a specific company blog post. The originals are usually clearer than the press coverage. If a term has no original source, that itself is informative.

Sources

[01]
Vaswani et al. 2017 'Attention Is All You Need' introduced the transformer architecture that underlies almost all modern LLMs.
arxiv.org/abs/1706.03762
[02]
Brown et al. 2020 'Language Models are Few-Shot Learners' (GPT-3 paper) established the scaling-via-prompting paradigm.
arxiv.org/abs/2005.14165
[03]
Kaplan et al. 2020 established empirical scaling laws for neural language models.
arxiv.org/abs/2001.08361
[04]
Hoffmann et al. 2022 (Chinchilla) showed that prior LLMs were undertrained relative to compute-optimal scaling, revising Kaplan's coefficients.
arxiv.org/abs/2203.15556
[05]
Wei et al. 2022 'Emergent Abilities of Large Language Models' is the foundational paper on emergence claims.
arxiv.org/abs/2206.07682
[06]
Schaeffer et al. 2023 'Are Emergent Abilities of Large Language Models a Mirage?' (NeurIPS 2023 outstanding paper) showed many emergence claims were metric artifacts.
arxiv.org/abs/2304.15004
[07]
Wei et al. 2022 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models' introduced the canonical CoT prompting technique.
arxiv.org/abs/2201.11903
[08]
Yao et al. 2022 'ReAct: Synergizing Reasoning and Acting in Language Models' established the standard reasoning-and-action agent pattern.
arxiv.org/abs/2210.03629
[09]
Lewis et al. 2020 introduced retrieval-augmented generation (RAG) for knowledge-intensive NLP tasks.
arxiv.org/abs/2005.11401
[10]
Hu et al. 2021 'LoRA: Low-Rank Adaptation of Large Language Models' is the dominant parameter-efficient fine-tuning method.
arxiv.org/abs/2106.09685
[11]
Ouyang et al. 2022 (InstructGPT) is the canonical reference for RLHF as deployed in production language models.
arxiv.org/abs/2203.02155
[12]
Christiano et al. 2017 introduced the foundational RLHF method using human preference comparisons.
arxiv.org/abs/1706.03741
[13]
Bai et al. 2022 'Constitutional AI: Harmlessness from AI Feedback' (Anthropic) introduced the CAI training approach.
arxiv.org/abs/2212.08073
[14]
Rafailov et al. 2023 'Direct Preference Optimization' provides a reward-model-free reformulation of RLHF that is now widely adopted.
arxiv.org/abs/2305.18290
[15]
Schick et al. 2023 'Toolformer' was the early formulation of language models learning to call external APIs.
arxiv.org/abs/2302.04761
[16]
Ji et al. 2023 'Survey of Hallucination in Natural Language Generation' is the canonical hallucination taxonomy reference.
arxiv.org/abs/2202.03629
[17]
Turpin et al. 2023 showed that chain-of-thought explanations can be post-hoc rationalizations not reflecting the model's actual reasoning.
arxiv.org/abs/2305.04388
[18]
Sainz et al. 2023 documented benchmark contamination in LLM evaluations, motivating held-out testing.
arxiv.org/abs/2310.18018
[19]
Morris et al. 2023 (DeepMind) 'Levels of AGI' proposed a generality-by-performance framework to operationalize AGI discussions.
arxiv.org/abs/2311.02462
[20]
Bommasani et al. 2021 (Stanford CRFM) introduced the term 'foundation model' for large pretrained models adaptable to many tasks.
arxiv.org/abs/2108.07258
[21]
Model Context Protocol is Anthropic's open specification for connecting AI assistants to tools and data sources via standardized servers, launched November 2024.
modelcontextprotocol.io
[22]
OpenAI's 2023 'Planning for AGI and beyond' essay defines AGI loosely as AI systems generally smarter than humans, without operational test criteria.
openai.com/index/planning-for-agi-and-beyond
[23]
Anthropic's mechanistic interpretability publications including sparse autoencoder work on Claude 3 Sonnet are the primary public reference for circuit-level interpretability research.
transformer-circuits.pub
[24]
US Executive Order 14110 made red-teaming and safety reporting requirements for frontier AI developers a regulatory expectation in the US until it was rescinded in January 2025.
whitehouse.gov · Executive Order 14110 (October 2023)
[25]
Bostrom's 2014 book formalized the distinction between speed, collective, and quality superintelligence used in subsequent ASI discussions.
Nick Bostrom · Superintelligence (Oxford University Press, 2014)
