
AI acronyms decoded
200+ terms with plain-language definitions and when you'd actually use them
How to read this page
- Use Ctrl-F. The categories are stable, the acronyms are sorted within each table, and the language is plain enough that search works.
- Categories are: Core AI · Training & alignment · Parameter-efficient · Inference & reasoning · Retrieval & memory · Architectures · Vision & multimodal · Evaluation & benchmarks · Loss & metrics · Agents & protocols · Compliance & regulation · Infrastructure & deployment.
- When you see 'industry term — no canonical paper,' that means the acronym is widely used but has no single foundational reference; it lives in vendor docs and practitioner usage, not in one citable paper.
- Dollar amounts and dates labeled 'as of June 2026 best-effort' are time-stamped on purpose — pricing and model lineups move fast, so verify against the provider's live docs before quoting them anywhere that matters.
- Pairs that confuse people (NLU vs NLG, AUC vs AUROC, MSE vs RMSE, RLHF vs DPO) are deliberately defined adjacent to each other in the tables so the distinction is visible at a glance.
- If a term you need isn't here, the omission is honest: we'd rather not include it than guess. Email a.mccree@gmail.com and we'll add it with a real citation.
Core AI — the foundational vocabulary
These are the terms that show up in nearly every AI paper, product page, and earnings call. If you only learn 20 acronyms, learn these.
Training and alignment — how models learn what we want
These acronyms describe the methods used to take a base model and shape it into something useful and safe. The space moved fast between 2022 and 2025; RLHF was the dominant approach, then DPO simplified it, then several variants competed for the title of 'best preference-tuning method.'
Parameter-efficient fine-tuning
When you can't afford to fine-tune all the weights of a 70B-parameter model, these are the methods that let you train a small fraction of parameters and still get most of the benefit. As of June 2026 best-effort, LoRA and QLoRA are the workhorses for indie and research use; full fine-tuning is reserved for labs with serious GPU budgets.
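To make "a small fraction of parameters" concrete, here is a back-of-the-envelope sketch for a single weight matrix. It assumes the standard LoRA formulation from Hu et al. 2021: the original d_out x d_in matrix is frozen, and a trainable rank-r update B·A is learned alongside it. The layer size and rank are illustrative values, not taken from any particular model.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one weight matrix.

    LoRA freezes the original d_out x d_in matrix W and learns
    B (d_out x rank) and A (rank x d_in); the weight update is B @ A.
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning the whole matrix."""
    return d_in * d_out


# Illustrative (hypothetical) sizes: an 8192 x 8192 projection at rank 16.
d_in, d_out, rank = 8192, 8192, 16
lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
fraction = lora / full
print(f"LoRA: {lora:,} trainable vs {full:,} full ({fraction:.2%})")
```

With these numbers the adapter trains 262,144 parameters against 67,108,864 in the full matrix, or roughly 0.4%. Raising the rank increases the trainable share linearly.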
Inference, reasoning, and prompting
What models do at the moment you ask them a question. This corner of the field exploded in late 2024 and 2025 around 'test-time compute' — the idea that spending more inference cycles on a hard problem can substitute for a larger model.
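One simple form of test-time compute is to sample several answers to the same question and keep the most common one. The sketch below simulates that idea. `sample_answer` is a hypothetical stand-in for a model call, and the 60% single-sample accuracy is an assumption chosen for illustration, not a measured figure.

```python
import random
from collections import Counter


def sample_answer(rng: random.Random, p_correct: float) -> str:
    """Hypothetical stand-in for one sampled model answer.

    Returns the right answer "42" with probability p_correct,
    otherwise one of three wrong answers chosen at random.
    """
    if rng.random() < p_correct:
        return "42"
    return rng.choice(["41", "43", "44"])


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, p_correct: float = 0.6) -> float:
    """Fraction of trials where voting over n_samples answers yields "42"."""
    rng = random.Random(0)  # fixed seed so repeated runs match
    hits = 0
    for _ in range(trials):
        answers = [sample_answer(rng, p_correct) for _ in range(n_samples)]
        hits += majority_vote(answers) == "42"
    return hits / trials


for n in (1, 5, 25):
    print(f"{n:>2} samples per question -> accuracy {accuracy(n):.2f}")
```

In this toy setup, accuracy rises as the number of samples grows because the wrong answers split their votes. Real models do not always behave this way: if errors are correlated, extra samples help less.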
Retrieval, memory, and grounding
How models access information they don't have in their weights — documents, databases, prior turns. Critical for any product where current facts, proprietary data, or citation requirements matter.
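A minimal sketch of the retrieval step, assuming the documents and the query have already been turned into embedding vectors. The three-dimensional vectors and document names below are made up for illustration; a real system would get embeddings from a model and usually store them in a vector index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query, docs[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
docs = {
    "pricing_page": [0.9, 0.1, 0.0],
    "refund_policy": [0.1, 0.9, 0.1],
    "api_reference": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # pretend this encodes "how much does it cost?"

retrieved = top_k(query, docs, k=2)
print(retrieved)
```

The retrieved documents would then be placed in the model's prompt, which is what lets the answer cite current or proprietary material that is not in the model's weights.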
Architectures, vision, and multimodal
The architectural building blocks. Transformers dominate, but the older shapes (CNN, RNN, LSTM) still ship in production and you'll see them referenced in legacy systems and edge deployments.
Evaluation and benchmarks
The yardsticks. Benchmarks are imperfect — they get gamed, leaked into training data, or stop discriminating between top models. Always check the leaderboard's contamination notes and the model's training-cutoff date before drawing conclusions. As of June 2026 best-effort, the live benchmarks worth watching are LiveBench, SWE-Bench Verified, and GPQA Diamond; older saturated ones (MMLU, HumanEval base) still appear in marketing but tell you less.
MMLU
Hendrycks et al. 2020 · arxiv.org/abs/2009.03300
Massive multitask language understanding — 57 academic subjects, multiple choice. Saturated near the top; useful as a sanity floor.
MMLU-Pro
Wang et al. 2024 · arxiv.org/abs/2406.01574
Harder MMLU successor with ten answer choices instead of four and more reasoning-heavy questions. Less saturated as of 2026.
GPQA
Rein et al. 2023 · arxiv.org/abs/2311.12022
Graduate-level Google-proof Q&A in biology, physics, chemistry. The 'Diamond' subset is the hard one.
HumanEval
Chen et al. 2021 (Codex) · arxiv.org/abs/2107.03374
164 Python programming problems with unit tests. The original code-completion benchmark; now saturated.
MBPP
Austin et al. 2021 · arxiv.org/abs/2108.07732
Mostly Basic Python Problems: 974 entry-level coding tasks, usually reported alongside HumanEval.
SWE-Bench
Jimenez et al. 2023 · arxiv.org/abs/2310.06770 · swebench.com
Real GitHub issues from popular Python repos. The 'Verified' subset is human-validated for correctness.
LiveBench
White et al. 2024 · arxiv.org/abs/2406.19314 · livebench.ai
Continuously refreshed benchmark to resist contamination. Updated monthly.
BIG-Bench
Srivastava et al. 2022 · arxiv.org/abs/2206.04615
Collaborative benchmark of 200+ tasks, led by Google with contributors from many institutions. BIG-Bench Hard (BBH) is the still-useful subset.
ARC-AGI
Chollet 2019 · arxiv.org/abs/1911.01547 · arcprize.org
François Chollet's abstract-reasoning benchmark designed to resist memorization. ARC Prize is the live competition.
HELM
Liang et al. 2022 · arxiv.org/abs/2211.09110 · crfm.stanford.edu/helm
Holistic evaluation of language models — Stanford CRFM's broad multi-axis benchmark suite.
Chatbot Arena
Chiang et al. 2024 · arxiv.org/abs/2403.04132 · lmarena.ai
Elo-style human preference leaderboard from LMSYS. Now LMArena; reflects user preference, not capability per se.
Pass@k
Defined in Codex paper · arxiv.org/abs/2107.03374
Metric for code generation: probability that at least one of k sampled completions passes all tests.
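In practice pass@k is estimated from a larger pool of samples rather than by drawing exactly k. The sketch below follows the estimator described in the Codex paper: generate n samples per problem, count the c that pass, and compute 1 - C(n-c, k) / C(n, k). The sample counts in the example are invented for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: (samples generated, samples that passed) per problem.
results = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {mean_pass_at_k(results, k=1):.3f}")
print(f"pass@5 = {mean_pass_at_k(results, k=5):.3f}")
```

For a single problem with 3 passing samples out of 10, this gives pass@1 = 0.3 and pass@5 of about 0.917, which shows why a pass@k number means little unless k is reported with it.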
Loss functions, metrics, and statistics
The math symbols hiding behind the acronyms. Most papers will use at least three of these; product-side readers can skim, ML readers need them at instinct level.
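As a quick illustration of three of these acronyms, the sketch below computes MSE, RMSE, and F1 from their standard definitions. The numbers are toy values chosen so the arithmetic is easy to check by hand.

```python
import math


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error: the average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error: sqrt of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts.

    Assumes at least one true positive, so neither ratio divides by zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy regression data: errors are 1, 0, -2, -1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)

# Toy classification counts: 8 true positives, 2 false positives, 4 false negatives.
f1_value = f1_score(tp=8, fp=2, fn=4)

print(f"MSE={mse_value:.3f} RMSE={rmse_value:.3f} F1={f1_value:.3f}")
```

MSE here is 1.5 in squared units, while RMSE is about 1.22 in the original units, which is the practical reason the two get confused. F1 comes out to about 0.727, sitting between the precision of 0.8 and the recall of about 0.667.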
Agents, protocols, and tool use
The 2024-2026 wave of acronyms around models that take actions. MCP is the one solo developers run into first; A2A and similar agent-to-agent protocols, plus orchestration frameworks, come next.
Compliance, regulation, and legal — the alphabet soup that gates B2B deals
These are the acronyms that appear in vendor questionnaires and procurement reviews. Most of them are not AI-specific; all of them apply to AI systems. As a solo founder or small team, you will discover them the first time an enterprise buyer sends a 200-row spreadsheet. Verify current requirements against the official primary source, because regulations are updated more often than this page.
A warning about acronyms that look stable but aren't
Three honest cautions before you trust any acronym list. First, expansions drift: 'AI agent' meant something different in 2022 than it does in 2026, and it will likely mean something different again by 2028. Second, ownership matters: MCP originated at Anthropic, A2A at Google, and the political economy of which protocol wins is not separable from which company is pushing it. Third, benchmark acronyms decay fastest of all: by the time MMLU was a household term among AI buyers, it was already saturated for frontier models, and anyone still selling on MMLU scores in 2026 is selling you something else. The acronyms on this page with the longest shelf life are the ones tied to regulations (GDPR, HIPAA, SOC 2) and to fundamental math (MSE, KL, F1). The ones tied to specific architectures, methods, and benchmarks rotate on a roughly two-year cycle. Use this page as a decoder, not as a stable reference, and when you cite a number, cite the live source, not the acronym.
Sources
- [01]
Vaswani et al. 2017 'Attention Is All You Need' introduces the transformer architecture that underlies modern LLMs.
arxiv.org/abs/1706.03762
- [02]
Christiano et al. 2017 established the foundational RLHF technique of training reward models from human preference comparisons.
arxiv.org/abs/1706.03741
- [03]
Ouyang et al. 2022 (InstructGPT) applied RLHF to instruction-following LLMs and became the template for production fine-tuning.
arxiv.org/abs/2203.02155
- [04]
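Rafailov et al. 2023 introduced Direct Preference Optimization (DPO), a simpler alternative to RLHF that trains directly on preference pairs with a classification-style loss, without a separate reward model or RL loop.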
arxiv.org/abs/2305.18290
- [05]
Ethayarajh et al. 2024 introduced KTO using prospect-theory-style utility for preference tuning.
arxiv.org/abs/2402.01306
- [06]
Hong et al. 2024 introduced ORPO, combining SFT and preference optimization in a single step.
arxiv.org/abs/2403.07691
- [07]
Shao et al. 2024 (DeepSeekMath) introduced GRPO, the group-relative policy optimization method later used by DeepSeek reasoning models.
arxiv.org/abs/2402.03300
- [08]
Schulman et al. 2017 introduced PPO, the policy gradient method widely used in RLHF.
arxiv.org/abs/1707.06347
- [09]
Bai et al. 2022 introduced Anthropic's Constitutional AI method using AI feedback against a written constitution.
arxiv.org/abs/2212.08073
- [10]
Hu et al. 2021 introduced LoRA, the low-rank adaptation method that became the standard for efficient fine-tuning.
arxiv.org/abs/2106.09685
- [11]
Dettmers et al. 2023 introduced QLoRA, combining 4-bit quantization with LoRA.
arxiv.org/abs/2305.14314
- [12]
Wei et al. 2022 introduced chain-of-thought prompting, demonstrating that reasoning steps improve LLM performance on complex tasks.
arxiv.org/abs/2201.11903
- [13]
Brown et al. 2020 (GPT-3 paper) introduced in-context learning as an emergent capability of large language models.
arxiv.org/abs/2005.14165
- [14]
Snell et al. 2024 formalized the test-time compute scaling regime and its tradeoffs against model size.
arxiv.org/abs/2408.03314
- [15]
Shazeer et al. 2017 introduced the sparsely-gated mixture-of-experts layer that underpins modern MoE architectures.
arxiv.org/abs/1701.06538
- [16]
Lewis et al. 2020 introduced retrieval-augmented generation (RAG) combining parametric and non-parametric memory.
arxiv.org/abs/2005.11401
- [17]
Radford et al. 2021 introduced CLIP, the contrastive image-text model that powers most modern multimodal systems.
arxiv.org/abs/2103.00020
- [18]
Dosovitskiy et al. 2020 introduced the Vision Transformer (ViT) applying transformers to image patches.
arxiv.org/abs/2010.11929
- [19]
Goodfellow et al. 2014 introduced generative adversarial networks (GANs).
arxiv.org/abs/1406.2661
- [20]
Hendrycks et al. 2020 introduced MMLU, the multitask language understanding benchmark covering 57 academic subjects.
arxiv.org/abs/2009.03300
- [21]
Jimenez et al. 2023 introduced SWE-Bench, evaluating LLMs on real GitHub issues from popular Python repos.
arxiv.org/abs/2310.06770
- [22]
Model Context Protocol (MCP) is Anthropic's open standard for connecting AI models to data sources and tools, announced November 2024.
modelcontextprotocol.io
- [23]
Regulation (EU) 2016/679 is the official text of the General Data Protection Regulation (GDPR).
eur-lex.europa.eu/eli/reg/2016/679/oj
- [24]
The US Department of Health and Human Services is the authoritative source for HIPAA regulations and the Business Associate Agreement requirements.
hhs.gov/hipaa
- [25]
NIST published the AI Risk Management Framework 1.0 in January 2023 as a voluntary US framework for AI risk.
nist.gov/itl/ai-risk-management-framework