
2022 · arXiv:2112.10752 · Rombach, Blattmann, Lorenz, Esser, Ommer · LMU Munich + Heidelberg + Runway

AI art on a laptop GPU.

In one sentence: German researchers figured out how to move image generation into a compressed latent space with 64× fewer spatial positions (about 48× fewer numbers), making AI image generation tractable on consumer hardware and launching the open-source generative art era.

01 · Why this matters to your life

Much of the AI imagery you have seen on the internet since mid-2022 descends from this paper. Stable Diffusion (released August 2022) was its open-weights implementation. Flux, one of the strongest open-weights models today, keeps the latent-space design and replaces the diffusion objective with flow matching. DALL-E 3 also generates in a latent space. Not every big name is a direct descendant, though: Google's original Imagen ran diffusion directly on pixels, and Google has not published the architecture behind Nano Banana Pro.

The atomeons.com hero photography was generated by Nano Banana Pro, a closed model from the generation of image systems this paper set in motion. A big reason image generation is cheap enough to do at scale, at a few cents per image instead of dollars, is the trick this paper introduced.

02 · What scientists actually did

Diffusion models work by starting with random noise and gradually denoising it into an image. A neural network runs the denoising. It has been trained to predict how much noise to remove at each step. By 2021 this worked beautifully (Ho, Jain and Abbeel's 2020 DDPM paper was the foundational version), but it was expensive. Every denoising step ran the network at full image resolution. For a 512×512 RGB image, that means 262,144 pixels (786,432 numbers) processed at every step, for hundreds to a thousand steps.
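Below is a minimal NumPy sketch of the DDPM loop described above. It assumes DDPM's linear noise schedule, with β running from 1e-4 to 0.02 over T = 1000 steps. `predict_noise` is a hypothetical stand-in for the trained network. The point is that it gets called once per step, on an array the full size of the image.

```python
import numpy as np

T = 1000                             # number of diffusion steps (DDPM's setting)
betas = np.linspace(1e-4, 0.02, T)   # DDPM's linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative signal retention up to step t

def add_noise(x0, t, eps):
    """Forward process: noise a clean image x0 directly to step t (closed form)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def denoise_step(xt, t, eps_hat, rng):
    """One reverse step x_t -> x_{t-1}, given the network's noise prediction eps_hat."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no fresh noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

def sample(predict_noise, shape, rng):
    """Full sampling loop: the network runs once per step, at full resolution."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        x = denoise_step(x, t, predict_noise(x, t), rng)
    return x
```

For a 512×512 image, `shape` would be `(512, 512, 3)`, so each of the 1000 network calls sees all 786,432 values.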

The Rombach team's insight: don't run the diffusion in pixel space. First, train a separate autoencoder (a VAE-style network; VAE stands for Variational Autoencoder) that compresses an image into a much smaller latent representation. For a 512×512 image, that latent is a 64×64 grid with 4 channels. Then train the diffusion model entirely in that latent space. At generation time, denoising starts from random latent noise and all the heavy work happens in the small space. The VAE's decoder turns the final latent back into a full-resolution image once, at the end.
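This toy sketch makes the shape of the pipeline concrete. The `encode`/`decode` pair is a hypothetical stand-in, a fixed random linear map per 8×8 patch, not a trained VAE. `toy_denoiser` likewise replaces the trained denoising network. Only the shapes and the order of operations mirror the description above.

```python
import numpy as np

F = 8           # spatial downsampling factor: one latent position per 8x8 pixel patch
LATENT_CH = 4   # channels per latent position
rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained VAE: one fixed linear map per 8x8x3 patch.
W_enc = rng.standard_normal((F * F * 3, LATENT_CH)) / np.sqrt(F * F * 3)
W_dec = np.linalg.pinv(W_enc)  # least-squares "decoder" for the toy encoder

def encode(img):
    """(H, W, 3) image -> (H/8, W/8, 4) latent."""
    h, w, _ = img.shape
    patches = (img.reshape(h // F, F, w // F, F, 3)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(h // F, w // F, F * F * 3))
    return patches @ W_enc

def decode(z):
    """(h, w, 4) latent -> (8h, 8w, 3) image."""
    h, w, _ = z.shape
    patches = z @ W_dec
    return (patches.reshape(h, w, F, F, 3)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(h * F, w * F, 3))

def toy_denoiser(z, t):
    """Placeholder for the trained denoising network."""
    return 0.9 * z

def generate(height=512, width=512, steps=50):
    z = rng.standard_normal((height // F, width // F, LATENT_CH))  # start from latent noise
    for t in reversed(range(steps)):
        z = toy_denoiser(z, t)  # every step runs on the 64x64x4 latent
    return decode(z)            # decode to pixels exactly once, at the end
```

In this setup, `encode` is only needed during training, to turn training images into latents. Generation needs only the denoiser and the decoder.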

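Counting every number, the compression is about 48×: 786,432 pixel values shrink to 16,384 latent values. Counted by spatial position, it is 64×. Each denoising step gets correspondingly cheaper, although the exact saving depends on the network's layers. The paper reports much lower training and sampling compute than pixel-space diffusion at comparable quality. In practice, that meant generating images in seconds on a consumer GPU instead of minutes on a server. The economics flipped from research-lab-only to anyone-can-run-it.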
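Both ratios come out of the same arithmetic. Here is a quick sketch, assuming a downsampling factor f = 8 and 4 latent channels, the configuration Stable Diffusion v1 uses:

```python
def latent_savings(height=512, width=512, channels=3, f=8, latent_channels=4):
    """Compare pixel-space and latent-space sizes for one image."""
    pixel_positions = height * width
    latent_positions = (height // f) * (width // f)
    pixel_values = pixel_positions * channels
    latent_values = latent_positions * latent_channels
    return {
        "spatial_ratio": pixel_positions / latent_positions,  # fewer grid positions
        "value_ratio": pixel_values / latent_values,          # fewer numbers overall
    }

print(latent_savings())  # {'spatial_ratio': 64.0, 'value_ratio': 48.0}
```

The real per-step compute saving is not exactly either number. Some layers, attention in particular, scale faster than linearly with the number of positions.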

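The other contribution was a general conditioning mechanism. Cross-attention layers inside the denoising network let it attend to the output of a text encoder, or to other inputs such as semantic maps. The paper trained its own transformer text encoder. Stable Diffusion later swapped in a frozen, pre-trained CLIP text encoder. The result was a model that could take a free-form English description and generate a matching image. That conditioning is what made the technology consumer-facing.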
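Here is a minimal single-head sketch of cross-attention conditioning in NumPy. The weight matrices are random placeholders for learned projections. The sizes are illustrative: 77 tokens of width 768 happens to match the CLIP text encoder that Stable Diffusion v1 uses. The real network uses multiple heads and many such layers. The sketch shows only the core idea: queries come from the image latent, while keys and values come from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_feats, text_tokens, Wq, Wk, Wv):
    """latent_feats: (N, d_latent), one row per latent position.
    text_tokens: (M, d_text), one row per token from the text encoder."""
    Q = latent_feats @ Wq  # queries come from the image latent
    K = text_tokens @ Wk   # keys and values come from the prompt
    V = text_tokens @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (N, M): each position's mix over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
N, M = 64 * 64, 77                  # latent positions, prompt tokens (illustrative)
d_latent, d_text, d_attn = 32, 768, 64
latent = rng.standard_normal((N, d_latent))
tokens = rng.standard_normal((M, d_text))
Wq = rng.standard_normal((d_latent, d_attn)) / np.sqrt(d_latent)
Wk = rng.standard_normal((d_text, d_attn)) / np.sqrt(d_text)
Wv = rng.standard_normal((d_text, d_attn)) / np.sqrt(d_text)
out, weights = cross_attention(latent, tokens, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (4096, 64) (4096, 77)
```

Every latent position gets its own weighted blend of the prompt tokens. This is how a word like "red" can influence some regions of the image more than others.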

03 · What scientists know but rarely say

The Stability AI release of Stable Diffusion in August 2022, about eight months after this paper's December 2021 preprint, was the moment generative art became a real cultural and economic force. Because the weights were open, anyone could download and run the model. Artists immediately protested that their work had been used as training data without consent. The Getty Images vs. Stability AI lawsuit (filed January 2023, ongoing in 2026) is the most prominent legal fallout.

The technical truth that the paper soft-pedals: its diffusion objective is not the cleanest mathematical formulation. Subsequent work has produced simpler training objectives, particularly flow matching (Lipman et al. 2022, adopted in rectified-flow form by Stable Diffusion 3 and Flux). Those models still run in a VAE latent space. What changed was the objective, not the latent idea. Latent diffusion was good enough to start the era, and most production systems still use variants of the original recipe.
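The sketch below contrasts the two training targets. It works on latents `z` because both approaches run in the VAE's latent space. The flow-matching half assumes the straight-line convention from Stable Diffusion 3's rectified-flow formulation, with data at t = 0 and noise at t = 1. Lipman et al. present the general framework, and some papers orient time the other way. `predict_velocity` is a hypothetical stand-in for the trained network.

```python
import numpy as np

def ddpm_training_pair(z0, eps, alpha_bar):
    """DDPM-style objective: a noisy input, with the noise itself as the target."""
    zt = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
    return zt, eps

def flow_matching_training_pair(z0, eps, t):
    """Straight-line (rectified-flow) path from data at t=0 to noise at t=1.
    The target is the constant velocity along that line."""
    zt = (1.0 - t) * z0 + t * eps
    return zt, eps - z0

def euler_sample(predict_velocity, shape, steps, rng):
    """Integrate from noise (t=1) back to data (t=0) with plain Euler steps."""
    z = rng.standard_normal(shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        z = z + (t_next - t_cur) * predict_velocity(z, t_cur)  # t_next < t_cur
    return z
```

The appeal of the straight path is that its velocity target is constant along the line. A perfect velocity predictor would let even a few Euler steps land exactly on the data.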

The thing nobody in image generation likes to say: the persistent failure modes appear to be rooted in how diffusion models learn images, not in any single model's training. These include hands with six fingers, faces with uncanny-valley asymmetry, and lettering that looks like text but says nothing. Each successor model patches some of these, but rarely well enough to survive a careful viewer's inspection. The same architecture produces stunning images and inscrutable failures.

04 · What the paper does NOT claim

The paper does not claim its model is the best image generator. It claims that latent diffusion is dramatically more efficient than pixel-space diffusion at comparable quality. Its results were state of the art on some benchmarks at the time, notably image inpainting, and competitive on others. Later models have since substantially exceeded them. The paper's contribution is the architecture, not the absolute output quality.

Beyond a brief societal-impact note on general risks such as manipulated media and dataset bias, the paper does not address copyright. Nor does it address artists' consent to having their work used for training, or the cultural impact of releasing image generation as an open-weights product. Those debates emerged downstream of the Stability AI release. The 2022 paper is a technical contribution; the 2022-2026 cultural reckoning is its consequence.

05 · Read the original

· arxiv.org/abs/2112.10752 — the original Latent Diffusion paper.
· Ho, Jain, Abbeel 2020 (DDPM) — the foundational diffusion-in-pixel-space paper this work builds on. arxiv:2006.11239.
· Lipman et al. 2022 (Flow Matching) — the cleaner mathematical successor used in Stable Diffusion 3 and Flux. arxiv:2210.02747.
· Stable Diffusion release notes (August 22, 2022) — the Stability AI launch of the public model.
· Then see our /learn/atlas/diffusion page for the full 2026 diffusion ecosystem.
