
2020 · arXiv:2001.08361 · Kaplan, McCandlish, Henighan, Brown, Chess, Child, Gray, Radford, Wu, Amodei

Bigger AI keeps getting smarter.

In one sentence: OpenAI ran the experiment of making language models progressively larger and discovered that performance improves in a predictable, mathematical way — and that nothing in the data suggested where it stopped.

01 · Why this matters to your life

This is the paper that justified the past five years of AI investment, which runs to hundreds of billions of dollars. Before this paper, building bigger AI was a guess. After it, building bigger AI was a forecastable engineering problem: you could plot the curve and estimate how much better a model would get for a given budget.

That forecast is why Microsoft put $13B into OpenAI. Why Google built TPU farms the size of small cities. Why Nvidia is one of the most valuable companies on earth. Why the entire AI industry spent ~$500B in five years on more GPUs. This 2020 paper made the bet calculable. The bet keeps paying.

02 · What scientists actually did

They trained language models at many different sizes, from tiny (under a thousand parameters, not counting embeddings) to gigantic (1.5 billion parameters, large for 2020). For each size, they measured how well the model predicted the next token in text it had not seen during training. This test loss is the standard measure of language-modeling quality.

When they plotted the results, the improvement followed a power law — a specific mathematical shape that says “every doubling of size produces the same percentage improvement.” The curve was smooth, predictable, and showed no sign of bending toward an upper limit within the range they could test.
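
To make the "same percentage per doubling" property concrete, here is a minimal sketch of the power-law form L(N) = (N_c / N)^alpha. The two constants approximate the fit the paper reports for non-embedding parameter count and should be treated as illustrative assumptions; the function names are hypothetical.

```python
# Illustrative power law for test loss as a function of model size.
# ASSUMPTION: ALPHA_N and N_C approximate the paper's reported fit.
ALPHA_N = 0.076   # power-law exponent for model size
N_C = 8.8e13      # scale constant, in non-embedding parameters


def loss(n_params: float) -> float:
    """Predicted test loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N


def improvement_per_doubling(n_params: float) -> float:
    """Fractional loss reduction obtained by doubling the model size."""
    return 1.0 - loss(2 * n_params) / loss(n_params)


if __name__ == "__main__":
    for n in (1e6, 1e8, 1.5e9):
        gain = improvement_per_doubling(n)
        print(f"N={n:.1e}  loss={loss(n):.3f}  gain per doubling={gain:.2%}")
```

With these constants, each doubling trims the predicted loss by roughly 5%, whether the model has a million parameters or a billion. That constant ratio is what makes the curve a straight line on a log-log plot.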

They also examined which input matters most. There are three things you can scale: model size (more parameters), data size (more training text), and compute (more GPU-hours). The paper found that all three matter together, and that for a fixed compute budget most of the extra compute should go into a larger model rather than more data. The 2022 follow-up paper from DeepMind (Chinchilla) revised that ratio, finding that model size and training data should grow in roughly equal proportion, but the core finding of smooth power-law improvement held.
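
The trade-off between the three inputs can be sketched with the standard approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D). The 20-tokens-per-parameter ratio below is a widely quoted rule of thumb associated with the Chinchilla results, not a figure from this article, and `compute_optimal_split` is a hypothetical helper.

```python
import math

FLOPS_PER_PARAM_PER_TOKEN = 6   # standard estimate: C ≈ 6 * N * D
TOKENS_PER_PARAM = 20           # ASSUMPTION: Chinchilla-style rule of thumb


def compute_optimal_split(compute_flops: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Return (parameters, training tokens) that spend compute_flops
    at a fixed tokens-per-parameter ratio."""
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = math.sqrt(
        compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * tokens_per_param)
    )
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23):
        n, d = compute_optimal_split(budget)
        print(f"C={budget:.0e} FLOPs -> N={n:.2e} params, D={d:.2e} tokens")
```

Under this assumed ratio, 100 times more compute buys a model 10 times larger trained on 10 times more text. A rule that favors model size more heavily would shift that split toward parameters.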

03 · What scientists know but rarely say

The power law is empirical, not theoretical. Nobody knows why the curve has this shape. Nobody knows where it bends. Nobody can prove from physics or information theory that the trend continues. We have extrapolated a graph for five years and the graph has not broken. That is all.
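
What "empirical" means in practice can be sketched as fitting a straight line in log-log space and extending it past the measured points. The data below are synthetic, generated from an assumed exponent purely for illustration; they are not measurements from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space.
    Returns (a, alpha)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# SYNTHETIC data: an exact power law with assumed constants (11.6, 0.076).
sizes = [1e5, 1e6, 1e7, 1e8, 1e9]
losses = [11.6 * s ** -0.076 for s in sizes]

a, alpha = fit_power_law(sizes, losses)

# Extrapolate two orders of magnitude beyond the largest "measured" size.
predicted = a * 1e11 ** -alpha
print(f"fitted alpha={alpha:.3f}, extrapolated loss at N=1e11: {predicted:.3f}")
```

The fit recovers the exponent from the points it was given, but nothing in the procedure checks whether the real curve keeps following the line beyond the last measurement. That gap between fit and guarantee is the point of this section.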

The most consequential unstated implication: if this trend continues for another decade, the resulting AI systems are likely to be qualitatively different from anything we have today. The lead authors of this paper believed that in 2020 and based subsequent careers on it. Sam Altman has been forecasting from this paper for five years. Dario Amodei, one of the paper's authors, left OpenAI and co-founded Anthropic. The valuations of frontier AI labs are, in large part, extrapolations of this single curve.

The other unstated truth: scaling laws say nothing about safety, alignment, or what the model will choose to do. They predict improvements in test loss. They do not predict good behavior. This is why every frontier lab now has a safety team: performance scales with capital, behavior does not.

04 · What the paper does NOT claim

The paper does not claim the trend will continue forever. It does not claim infinite intelligence is reachable by infinite spend. It does not claim AGI. It claims that within the range tested (from around a thousand up to 1.5 billion parameters), the improvement is a clean power law, and that the extrapolation is “suggestive”, not proven.

The 2024-2026 industry consensus is that the simple scaling-only era is reaching practical limits — training a model 100× larger than GPT-4 would cost more than any company has, and the marginal returns may not justify the spend. So the field has pivoted to scaling inference-time reasoning (o1, o3, Claude Extended Thinking) and post-training quality (RLHF, Constitutional AI) instead of just scaling the base model. The original paper does not anticipate this — it is the next chapter of the story.

05 · Read the original

· arxiv.org/abs/2001.08361: the original. ~30 pages, but the figures tell the whole story.
· Hoffmann et al. 2022 (Chinchilla): DeepMind's follow-up that revised the optimal balance between model size and training data for a given compute budget. arXiv:2203.15556.
· Henighan et al. 2020: the same OpenAI group's follow-up showing that scaling laws also hold in other domains, including images, video, and math problems. arXiv:2010.14701.
· Then read chain-of-thought for what happened when scaling alone stopped being enough.
