::AI videos · 50 vetted · zero hype

The AI videos that actually teach you something.

Click-to-load embeds · no YouTube JS until you press play. Curated by category and skill level. Real creators we've watched and learned from. Zero affiliate revenue.

foundations transformers interpretability practical workflows safety history capability evals

foundations

But what is a neural network? | Chapter 1, Deep learning

3Blue1Brown · 19min · novice

Grant Sanderson's visual derivation of what a neural network actually is, starting from MNIST digit recognition. The canonical entry point that builds the geometric intuition everything else rests on.

Gradient descent, how neural networks learn | Chapter 2, Deep learning

3Blue1Brown · 21min · novice

Shows what learning actually means mathematically: minimizing a loss function by walking downhill. Removes the magic from gradient descent without dumbing it down.
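A minimal sketch of "walking downhill". It assumes a one-parameter quadratic loss L(w) = (w - 3)^2, chosen purely for illustration. Each step moves w against the gradient, scaled by a learning rate.

```python
def loss(w: float) -> float:
    # Illustrative loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w: float) -> float:
    # Analytic derivative dL/dw = 2(w - 3).
    return 2.0 * (w - 3.0)

def gradient_descent(w0: float, lr: float = 0.1, steps: int = 100) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # step downhill, against the gradient
    return w

w_final = gradient_descent(w0=-5.0)
print(w_final, loss(w_final))
```

For this particular loss, each step multiplies the distance to the minimum by (1 - 2·lr). Any learning rate above 1.0 therefore overshoots further on every step and diverges.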

What is backpropagation really doing? | Chapter 3, Deep learning

3Blue1Brown · 13min · learner

Intuition layer for backprop before the calculus. Most people skip this and never recover. Watch it before chapter 4.

Backpropagation calculus | Chapter 4, Deep learning

3Blue1Brown · 10min · learner

The chain rule applied to neural nets, shown step by step. Short, dense, and the difference between knowing the word backprop and understanding it.
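A worked sketch of that chain rule on a single sigmoid neuron with squared-error cost. The weight, bias, input, and target below are arbitrary illustrative numbers. The finite-difference estimate is only a sanity check on the hand-derived gradient.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def cost(w: float, b: float, x: float, y: float) -> float:
    # z = w*x + b, a = sigmoid(z), C = (a - y)^2
    a = sigmoid(w * x + b)
    return (a - y) ** 2

def dcost_dw(w: float, b: float, x: float, y: float) -> float:
    # Chain rule: dC/dw = dC/da * da/dz * dz/dw
    a = sigmoid(w * x + b)
    dC_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    dz_dw = x
    return dC_da * da_dz * dz_dw

w, b, x, y = 0.5, -0.2, 1.5, 1.0
analytic = dcost_dw(w, b, x, y)

# Central finite difference as an independent check.
eps = 1e-6
numeric = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
print(analytic, numeric)
```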

The spelled-out intro to neural networks and backpropagation: building micrograd

Andrej Karpathy · 145min · learner

Karpathy builds an autograd engine from scratch in a Jupyter notebook. After this, you actually understand what PyTorch is doing under the hood. Non-optional for serious learners.
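To make "what PyTorch is doing under the hood" concrete, here is a heavily reduced sketch in the spirit of micrograd. It is not Karpathy's actual code. A scalar `Value` records the operation that produced it, and `backward()` applies the chain rule in reverse topological order. Only `+`, `*` and `tanh` are supported.

```python
import math

class Value:
    """A scalar that remembers the operation that produced it."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1; accumulate with += so reused nodes sum contributions
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad  # d tanh(z)/dz = 1 - tanh(z)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients from the output back.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# A single neuron: out = tanh(w*x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.3)
out = (w * x + b).tanh()
out.backward()
print(out.data, w.grad, x.grad, b.grad)
```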

The spelled-out intro to language modeling: building makemore

Andrej Karpathy · 117min · learner

Bigram models to neural language models, built live. The makemore series is the closest thing to a doctorate-grade language-model bootcamp on the open internet.
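The series starts from a count-based character bigram model. Below is a minimal sketch of that idea, not the lecture code. It assumes a tiny hand-picked word list and uses "." as a start/end marker. The approach is to count which character follows which, normalize the counts into probabilities, and sample.

```python
import random
from collections import defaultdict

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # toy data, illustrative only

# Count character-to-character transitions, with "." marking word boundaries.
counts = defaultdict(lambda: defaultdict(int))
for word in words:
    chars = ["."] + list(word) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def next_char_probs(prev: str) -> dict:
    # Normalize the row of counts for `prev` into a probability distribution.
    row = counts[prev]
    total = sum(row.values())
    return {ch: n / total for ch, n in row.items()}

def sample(rng: random.Random) -> str:
    out, prev = [], "."
    while True:
        probs = next_char_probs(prev)
        prev = rng.choices(list(probs), weights=list(probs.values()))[0]
        if prev == ".":
            return "".join(out)
        out.append(prev)

rng = random.Random(0)
print([sample(rng) for _ in range(5)])
print(next_char_probs("."))
```

Every generated name is stitched together only from transitions seen in the data. The neural versions later in the series replace the count table with learned weights.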

[1hr Talk] Intro to Large Language Models

Andrej Karpathy · 60min · user

Karpathy's single best high-altitude survey of LLMs as systems: pretraining, finetuning, RLHF, agents, security. The talk to send a smart non-ML person.

Deep Dive into LLMs like ChatGPT

Andrej Karpathy · 210min · user

Three-and-a-half-hour comprehensive walkthrough of how ChatGPT actually works: pretraining, SFT, RLHF, hallucinations, tools. Karpathy's 2025 follow-up to his earlier intro.

Imaginary Numbers Are Real [Part 1: Introduction]

Welch Labs · 6min · novice

Not AI directly, but the math intuition Welch builds (geometric interpretation of abstract objects) is the same muscle you need for embeddings, attention, and latent spaces.

Neural Networks Pt. 1: Inside the Black Box

StatQuest with Josh Starmer · 19min · novice

Starmer's gentlest possible introduction to neural networks. Pairs well with 3Blue1Brown for learners who benefit from seeing it twice, from two angles.

Neural Networks Pt. 2: Backpropagation Main Ideas

StatQuest with Josh Starmer · 17min · novice

Backpropagation explained without calculus first, then with. Starmer's BAM pedagogy is unironically effective.

Stanford CS229 Lecture 2 - Linear Regression and Gradient Descent

Stanford Online · 80min · learner

Andrew Ng's Stanford CS229 lecture on linear regression and gradient descent. Doctorate-grade, free, and the bedrock under everything LLM-related.

MIT 6.S191: Introduction to Deep Learning

Alexander Amini · 60min · learner

MIT's annually updated deep-learning intro. A cleaner narrative arc than CS229 for someone who wants the modern view, with current code.

transformers

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning

3Blue1Brown · 27min · learner

Best visual introduction to transformer architecture in existence. Tokens, embeddings, the residual stream, attention, drawn so the geometry sticks.

Attention in transformers, visually explained | Chapter 6, Deep Learning

3Blue1Brown · 26min · learner

Query, key, and value matrices shown as actual matrices doing actual things. Takes the mystery out of "Attention Is All You Need" in 26 minutes.
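A minimal single-head sketch of scaled dot-product attention, softmax(Q·Kᵀ / sqrt(d_k))·V, in pure Python. The tiny Q, K and V matrices are made-up numbers. Real implementations use batched tensor libraries and learned projection matrices, and GPT-style models also apply a causal mask, which is omitted here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Each query row attends over all key rows; returns one output row per query."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value rows.
        out.append([sum(wt * v[j] for wt, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention(Q, K, V))
```

Each output row is a convex combination of the value rows. A query that scores every key equally simply gets the mean of V.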

Let's build GPT: from scratch, in code, spelled out.

Andrej Karpathy · 116min · user

Karpathy implements nanoGPT, a working GPT, from a blank file to a trained model in under two hours. The single most-cited "I finally got it" video for transformers.

Let's build the GPT Tokenizer

Andrej Karpathy · 134min · user

Byte-pair encoding from scratch. Many engineers ship LLM apps without understanding tokenization, which is why their prompts behave weirdly. Karpathy fixes that.
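A minimal sketch of the training loop at the heart of byte-pair encoding. It assumes a toy string and a fixed number of merges. The loop repeatedly finds the most frequent adjacent pair of tokens and replaces it with a new token id. Like GPT-style tokenizers, it starts from UTF-8 bytes. It omits the regex pre-splitting and special tokens a production tokenizer adds.

```python
from collections import Counter

def most_common_pair(ids):
    # Count adjacent token pairs; ties go to the pair seen first.
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`, left to right.
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new tokens get ids after the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(ids, merges)
```

The learned `merges` table is the tokenizer. Encoding new text replays the merges in the same order, which is why the same word can split differently depending on what surrounds it.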

Let's reproduce GPT-2 (124M)

Andrej Karpathy · 240min · operator

Four-hour live build reproducing GPT-2 from scratch, including data, training loop, and optimizer details. The closest thing to industrial-strength LLM training pedagogy for free.

State of GPT | BRK216HFS

Microsoft Developer · 42min · user

Karpathy at Microsoft Build 2023. The cleanest single diagram of the pretraining, SFT, reward-model, and RL pipeline, still in active use as a mental model.

Attention Is All You Need

Yannic Kilcher · 28min · user

Kilcher's walkthrough of the original 2017 transformer paper. Best read-the-paper-with-someone-smart treatment. Pairs well with the 3Blue1Brown visualization.

GPT-3: Language Models are Few-Shot Learners (Paper Explained)

Yannic Kilcher · 65min · user

The paper that made the world pay attention. Kilcher dissects the scaling claims, the few-shot prompting trick, and the implications, recorded in 2020 before the hype cycle hit.

Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy

Stanford Online · 75min · user

Karpathy's Stanford lecture on transformers as a general-purpose differentiable computer. One of the best high-altitude framings of the architecture.

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

Grant Sanderson · 60min · user

Grant Sanderson distills the transformer chapters of his deep-learning series into one talk. The clearest single hour on transformer geometry available.

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Umar Jamil · 110min · operator

Umar Jamil's paper-by-paper code walkthroughs are doctorate-grade. Mamba and state space models (SSMs) are among the most serious post-transformer architectures, and this is the deepest free explainer of them.
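One idea the S4 side of this material rests on is that a time-invariant linear state space model can be computed either as a recurrence or as a convolution. The sketch below is a minimal scalar version. It assumes an already-discretized model with made-up parameters a, b and c. Mamba's selective SSMs make these parameters input-dependent, which rules out the fixed convolution kernel and is why Mamba relies on a parallel scan instead.

```python
def ssm_recurrent(xs, a, b, c):
    # h_t = a*h_{t-1} + b*x_t ;  y_t = c*h_t   (scalar state, h starts at 0)
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def ssm_convolution(xs, a, b, c):
    # Unrolling the recurrence gives y_t = sum_k (c * a**k * b) * x_{t-k}.
    kernel = [c * (a ** k) * b for k in range(len(xs))]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]

xs = [1.0, 0.5, -2.0, 3.0, 0.0, 1.5]
a, b, c = 0.9, 0.5, 2.0
rec = ssm_recurrent(xs, a, b, c)
conv = ssm_convolution(xs, a, b, c)
print(rec)
print(conv)
```

The recurrent form carries a constant-size state per step, which makes it cheap at inference. The convolutional form processes the whole sequence at once, which makes it suitable for parallel training.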

What are Transformer Models and How do they Work?

AI Coffee Break with Letitia · 13min · learner

Letitia Parcalabescu's accessible 13-minute transformer overview. Right altitude for someone who wants the concept before committing to Karpathy's 2-hour build.

interpretability

How might LLMs store facts | Chapter 7, Deep Learning

3Blue1Brown · 22min · user

MLP layers as key-value memory stores. A bridge to mechanistic interpretability, where Anthropic and DeepMind research is actually pointing.
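A toy sketch of the key-value reading of an MLP block. The weights are hand-built and hypothetical rather than learned. Each row of the first matrix acts as a "key" that fires through a ReLU when the input points in its direction. The matching row of `values` is what gets written back and added to the residual stream. Real MLPs have thousands of such units, and their directions are rarely this clean.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def mlp(x, keys, biases, values):
    """Toy MLP: hidden_i = relu(keys[i] . x + biases[i]); output = sum_i hidden_i * values[i]."""
    hidden = [relu(sum(k * xi for k, xi in zip(key, x)) + b) for key, b in zip(keys, biases)]
    dim = len(values[0])
    return [sum(h * v[j] for h, v in zip(hidden, values)) for j in range(dim)]

# Hypothetical 4-dimensional residual stream: two input directions act as keys,
# two other directions act as the stored "facts" (values).
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
biases = [-0.5, -0.5]            # a unit only fires on strong alignment
values = [[0.0, 0.0, 1.0, 0.0],  # written out when key 0 fires
          [0.0, 0.0, 0.0, 1.0]]  # written out when key 1 fires

x = [1.0, 0.0, 0.0, 0.0]
residual = [xi + oi for xi, oi in zip(x, mlp(x, keys, biases, values))]
print(residual)                                        # key 0 fired, adding its value
print(mlp([0.3, 0.3, 0.0, 0.0], keys, biases, values))  # weak match: nothing fires
```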

Mechanistic Interpretability with Neel Nanda

Machine Learning Street Talk · 90min · operator

Neel Nanda (DeepMind interpretability lead) on how to reverse-engineer what's actually happening inside transformer layers. The field's current frontier.

practical workflows

How I use LLMs

Andrej Karpathy · 130min · user

The practitioner's companion to Deep Dive into LLMs. Karpathy demonstrates his actual day-to-day LLM workflow: coding, research, audio, vision. Counter-programming against prompt-engineering hype.

GPT-4 - How does it work, and how do I build apps with it? - CS50 Tech Talk

CS50 · 90min · user

Harvard CS50 tech talk on building with GPT-4. Practical, grounded, and at the right altitude for engineers crossing into LLM apps.

RAG vs. Fine-Tuning

IBM Technology · 8min · user

Cedric Clyburn (IBM) cleanly delineates when to use RAG and when to fine-tune. It is the decision many LLM-app builders get wrong; this is the sober answer.

What is Retrieval-Augmented Generation (RAG)?

IBM Technology · 6min · novice

Marina Danilevsky's whiteboard explainer for RAG. Six minutes, zero hype, and the right mental model before anyone touches LangChain or a vector DB.
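A minimal sketch of the retrieve-then-generate loop. It assumes a toy in-memory corpus and uses bag-of-words cosine similarity as a stand-in for learned embeddings and a vector DB. `build_prompt` is a hypothetical helper that only assembles the augmented prompt. The call to an actual LLM is left out.

```python
import math
import re
from collections import Counter

docs = [  # toy corpus, illustrative only
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on public holidays.",
    "Shipping to Europe usually takes five to seven business days.",
]

def vectorize(text):
    # Lowercased word counts: a crude stand-in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    # Rank every document by similarity to the query and keep the top k.
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query, k=1):
    # Augment the question with retrieved context before it goes to the model.
    context = "\n".join(retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is your refund policy for returns?"))
```

The model never gets retrained here. Updating what it "knows" means editing `docs`, which is the core practical difference from fine-tuning.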

safety

The Orthogonality Thesis

Robert Miles AI Safety · 12min · learner

Why intelligence and goals are independent axes. Miles explains the foundational thesis that being smart does not imply sharing our goals. Short, rigorous, canonical.

Why Would AI Want to do Bad Things? Instrumental Convergence

Robert Miles AI Safety · 11min · learner

Why most goals converge on sub-goals like self-preservation, resource acquisition, and goal-preservation. Once you see this, AI risk arguments stop feeling like sci-fi.

Intro to AI Safety, Remastered

Robert Miles AI Safety · 18min · novice

Best single what-is-AI-safety-as-a-field video. Miles rebuilt his original intro with cleaner argumentation. Send this to skeptics.

The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

Robert Miles AI Safety · 23min · user

The inner-alignment problem made accessible: a model can be trained on the right objective and still learn the wrong one internally. This is the issue that keeps interpretability researchers up at night.

AI 'Stop Button' Problem - Computerphile

Computerphile · 20min · novice

Robert Miles on Computerphile explaining why making an off-switch for a sufficiently capable AI is harder than it looks. Classic AI-safety pedagogy.

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

Robert Miles AI Safety · 15min · operator

Goes one level deeper into mesa-optimization and deceptive alignment. Not entry-level, but the canonical accessible treatment of a failure mode Anthropic and Redwood Research are actively investigating.

Reward Hacking: Concrete Problems in AI Safety Part 4

Robert Miles AI Safety · 9min · learner

Why models find loopholes in their reward functions. Reward hacking is no longer theoretical; it is a routine problem in frontier RLHF. The canonical explainer.
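A deliberately tiny illustration of the failure mode. The environment, action names, and numbers are all hypothetical. The designer wants the room clean but rewards "no mess visible to the camera", and an optimizer that simply picks the highest-reward action finds the loophole. Real reward hacking in RL training is subtler, but it has the same structure: a measured proxy that diverges from the intended objective.

```python
# Hypothetical environment: each action's outcome.
ACTIONS = {
    "clean_room":   {"mess_removed": True,  "mess_visible": False, "effort": 5},
    "cover_camera": {"mess_removed": False, "mess_visible": False, "effort": 1},
    "do_nothing":   {"mess_removed": False, "mess_visible": True,  "effort": 0},
}

def proxy_reward(outcome):
    # What we measure: +10 if the camera sees no mess, minus effort spent.
    return (10 if not outcome["mess_visible"] else 0) - outcome["effort"]

def true_objective(outcome):
    # What we actually wanted: the mess gone.
    return 10 if outcome["mess_removed"] else 0

# A naive optimizer: pick whichever action maximizes the proxy.
best = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
print(best, proxy_reward(ACTIONS[best]), true_objective(ACTIONS[best]))
```

Nothing in the optimizer is "cheating". It maximizes exactly what it was given, which is the point the video makes.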

Constitutional AI: Harmlessness from AI Feedback (Anthropic Paper Explained)

Yannic Kilcher · 41min · operator

Walkthrough of Anthropic's Constitutional AI paper, which describes a technique used in training Claude. Critical for understanding how RLAIF differs from RLHF.

Dario Amodei: Anthropic CEO on Claude, AGI and the Future of AI and Humanity

Lex Fridman Podcast · 300min · user

Five-hour conversation with Anthropic's CEO covering Claude's training, scaling laws, mechanistic interpretability, and the existential argument. The single best public source on how Anthropic actually thinks.

history

AlphaGo - The Movie | Full award-winning documentary

DeepMind · 90min · novice

Not technical, but historically essential. The Lee Sedol match was the moment the field stopped being theoretical. Move 37 is still the standard example of a machine finding a move no human expected.

Geoffrey Hinton: The Foundations of Deep Learning

Lex Fridman Podcast · 90min · user

Hinton on backprop, Boltzmann machines, and his pivot to AI-risk work. The most important living source on where deep learning actually came from.

Ilya Sutskever: Deep Learning

Lex Fridman Podcast · 67min · user

Sutskever (then OpenAI Chief Scientist) on the scaling hypothesis, compression as intelligence, and the road from AlexNet to GPT. Recorded in 2020; worth rewatching against what came after.

Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning

Lex Fridman Podcast · 90min · user

LeCun argues against the LLM-only path and for self-supervised world models. The most rigorous public dissent against the current paradigm from a Turing Award winner.

Demis Hassabis: DeepMind - AI, Superintelligence and the Future of Humanity

Lex Fridman Podcast · 130min · user

Hassabis on AlphaGo, AlphaFold, and the path to AGI from a neuroscience background. Counterpoint to the LLM-centric narrative.

Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI

Lex Fridman Podcast · 210min · user

Karpathy on his Tesla years, the transition to LLMs, and his framework for evaluating AI progress. Rich technical detail for builders.

ChatGPT: 30 Year History | How AI Learned to Talk

Art of the Problem · 26min · novice

Clean visual history of language modeling from n-grams to ChatGPT. A good primer to send someone before pointing them to Karpathy.

How AIs, like ChatGPT, Learn

CGP Grey · 8min · novice

Pre-LLM (2017), but the builder-bot/teacher-bot framing of how training works aged beautifully. Send this first to anyone who's never thought about how training works.

How AI Could Empower Any Business | Andrew Ng | TED

TED · 13min · novice

Andrew Ng's 2022 TED talk on democratizing AI. Useful for non-technical operators who need a grounded reframe of what AI does for ordinary businesses.

capability evals

Sparks of AGI: early experiments with GPT-4

Sebastien Bubeck · 55min · user

Microsoft Research talk presenting the Sparks of AGI paper findings. Whether you agree with the framing or not, it's load-bearing in the discourse. Required watching for the debate.
