Click-to-load embeds · no YouTube JS until you press play. Curated by category and skill level. Real creators we've watched and learned from. Zero affiliate revenue.
But what is a neural network? | Chapter 1, Deep learning
3Blue1Brown · 19min · novice
Grant Sanderson's visual derivation of what a neural network actually is, starting from MNIST digit recognition. The canonical entry point that builds the geometric intuition everything else rests on.
Gradient descent, how neural networks learn | Chapter 2, Deep learning
3Blue1Brown · 21min · novice
Shows what learning actually means mathematically: minimizing a loss function by walking downhill. Removes the magic from gradient descent without dumbing it down.
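To make "walking downhill" concrete, here is a minimal sketch, not taken from the video. The toy loss L(x) = (x - 3)^2, the function names, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. walk downhill."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def loss_grad(x):
    # Gradient of the assumed toy loss L(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


x_min = gradient_descent(loss_grad, x0=0.0)
print(x_min)  # ends up very close to 3.0, the minimizer of the toy loss
```

Each step shrinks the distance to the minimum by a constant factor here because the loss is a simple parabola. Real network losses are not this well behaved.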
What is backpropagation really doing? | Chapter 3, Deep learning
3Blue1Brown · 13min · learner
Intuition layer for backprop before the calculus. Most people skip this and never recover. Watch it before chapter 4.
Backpropagation calculus | Chapter 4, Deep learning
3Blue1Brown · 10min · learner
The chain rule applied to neural nets, shown step by step. Short, dense, and the difference between knowing the word "backprop" and understanding it.
The spelled-out intro to neural networks and backpropagation: building micrograd
Andrej Karpathy · 145min · learner
Karpathy builds an autograd engine from scratch in a Jupyter notebook. After this, you actually understand what PyTorch is doing under the hood. Non-optional for serious learners.
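For a taste of what a scalar autograd engine involves, here is a minimal sketch in the same spirit. It is not Karpathy's code: the `Value` class below is a hypothetical toy that supports only addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical toy class)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
y = a * b + c
y.backward()
print(y.data, a.grad, b.grad, c.grad)  # 4.0 -3.0 2.0 1.0
```

Gradients accumulate with `+=` so that a value used more than once in an expression receives the sum of its contributions.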
The spelled-out intro to language modeling: building makemore
Andrej Karpathy · 117min · learner
Bigram models to neural language models, built live. The makemore series is the closest thing to a doctorate-grade language-model bootcamp on the open internet.
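A count-based bigram model, the starting point named above, fits in a few lines. This is a minimal sketch, not the series' code: the three-name dataset, the function names, and the use of "." as a start/end marker are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(words):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for word in words:
        chars = ["."] + list(word) + ["."]  # "." marks word start and end
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, ch):
    """Normalize the counts for `ch` into a probability distribution."""
    total = sum(counts[ch].values())
    return {nxt: n / total for nxt, n in counts[ch].items()}


counts = train_bigram(["emma", "ava", "mia"])
probs = next_char_probs(counts, "m")
print(probs)  # "m" is followed once each by "m", "a", and "i"
```

A neural language model replaces the count table with learned parameters but predicts the same thing: a distribution over the next token.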
[1hr Talk] Intro to Large Language Models
Andrej Karpathy · 60min · user
Karpathy's single best high-altitude survey of LLMs as systems: pretraining, finetuning, RLHF, agents, security. The talk to send a smart non-ML person.
Deep Dive into LLMs like ChatGPT
Andrej Karpathy · 210min · user
Three-and-a-half-hour comprehensive walkthrough of how ChatGPT actually works: pretraining, SFT, RLHF, hallucinations, tools. Karpathy's 2025 follow-up to his earlier intro.
Imaginary Numbers Are Real [Part 1: Introduction]
Welch Labs · 6min · novice
Not AI directly, but the math intuition Welch builds (geometric interpretation of abstract objects) is the same muscle you need for embeddings, attention, and latent spaces.
Neural Networks Pt. 1: Inside the Black Box
StatQuest with Josh Starmer · 19min · novice
Starmer's gentlest possible introduction to neural networks. Pairs well with 3Blue1Brown for learners who need it twice from two angles.
Neural Networks Pt. 2: Backpropagation Main Ideas
StatQuest with Josh Starmer · 17min · novice
Backpropagation explained without calculus first, then with. Starmer's BAM pedagogy is unironically effective.
Stanford CS229 Lecture 2 - Linear Regression and Gradient Descent
Stanford Online · 80min · learner
Andrew Ng's Stanford CS229 lecture on the foundations. Doctorate-grade, free, the bedrock under everything LLM-related.
MIT 6.S191: Introduction to Deep Learning
Alexander Amini · 60min · learner
MIT's annually-updated deep-learning intro. Cleaner narrative arc than CS229 for someone who wants the modern view, with current code.
transformers
But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
3Blue1Brown · 27min · learner
Best visual introduction to transformer architecture in existence. Tokens, embeddings, the residual stream, and attention, all drawn so the geometry sticks.
Attention in transformers, visually explained | Chapter 6, Deep Learning
3Blue1Brown · 26min · learner
Query, key, and value matrices shown as actual matrices doing actual things. Removes the mystery of "Attention Is All You Need" in 26 minutes.
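The query/key/value mechanics can be sketched in plain Python. This is a minimal, single-head version of scaled dot-product attention with no masking and no learned projections. The function names and the tiny example matrices are assumptions for illustration, not material from the video.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # The output row is the weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


Q = [[1.0, 0.0]]                 # one query
K = [[1.0, 0.0], [0.0, 1.0]]     # two keys
V = [[10.0, 0.0], [0.0, 10.0]]   # two values
out = attention(Q, K, V)
print(out)  # the query matches the first key best, so the first value dominates
```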
Let's build GPT: from scratch, in code, spelled out.
Andrej Karpathy · 116min · user
Karpathy implements nanoGPT, a working GPT, from blank file to trained model in under two hours. The single most-cited "I finally got it" video for transformers.
Let's build the GPT Tokenizer
Andrej Karpathy · 134min · user
Byte-pair encoding from scratch. Most engineers ship LLM apps without understanding tokenization, which is why their prompts behave weirdly. Karpathy fixes that.
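The core of byte-pair encoding is one repeated step: find the most frequent adjacent pair of token ids and replace it with a new id. The sketch below shows a single merge. The helper names, the sample string, and the choice of 256 as the first new id (the first value past the byte range) are assumptions for illustration, not the video's code.

```python
from collections import Counter


def most_common_pair(ids):
    """Return the most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes
pair = most_common_pair(ids)               # (97, 97), i.e. "aa"
merged = merge(ids, pair, 256)
print(len(ids), len(merged))  # 11 9
```

Training a tokenizer repeats this step until the vocabulary reaches a target size, recording each merge so it can be replayed on new text.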
Let's reproduce GPT-2 (124M)
Andrej Karpathy · 240min · operator
Four-hour live build reproducing GPT-2 from scratch, including data, training loop, and optimizer details. The closest thing to industrial-strength LLM training pedagogy for free.
State of GPT | BRK216HFS
Microsoft Developer · 42min · user
Karpathy at Microsoft Build 2023. The cleanest one-shot diagram of the pipeline from pretraining to SFT to reward model to RL, still in active use as a mental model.
Attention Is All You Need
Yannic Kilcher · 28min · user
Kilcher's walkthrough of the original 2017 transformer paper. Best read-the-paper-with-someone-smart treatment. Pairs well with the 3Blue1Brown visualization.
GPT-3: Language Models are Few-Shot Learners (Paper Explained)
Yannic Kilcher · 65min · user
The paper that made the world pay attention. Kilcher dissects the scaling claims, the few-shot prompting trick, and the implications, recorded in 2020 before the hype cycle hit.
Stanford CS25: V2 I Introduction to Transformers w/ Andrej Karpathy
Stanford Online · 75min · user
Karpathy lecturing Stanford on transformers as a general-purpose differentiable computer. One of the best high-altitude framings of the architecture.
Visualizing transformers and attention | Talk for TNG Big Tech Day '24
Grant Sanderson · 60min · user
Grant Sanderson distills his entire deep-learning chapter series into one talk. The clearest single hour on transformer geometry available.
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
Umar Jamil · 110min · operator
Umar Jamil's paper-by-paper code walkthroughs are doctorate-grade. Mamba and other state space models are among the most serious post-transformer architectures. The deepest free explainer.
What are Transformer Models and How do they Work?
AI Coffee Break with Letitia · 13min · learner
Letitia Parcalabescu's accessible 13-minute transformer overview. Right altitude for someone who wants the concept before committing to Karpathy's 2-hour build.
interpretability
How might LLMs store facts | Chapter 7, Deep Learning
3Blue1Brown · 22min · user
MLP layers as key-value memory stores. Bridges to mechanistic interpretability, where Anthropic and DeepMind research is actually pointing.
Mechanistic Interpretability with Neel Nanda
Machine Learning Street Talk · 90min · operator
Neel Nanda (DeepMind interpretability lead) on how to reverse-engineer what's actually happening inside transformer layers. The field's current frontier.
practical workflows
How I use LLMs
Andrej Karpathy · 130min · user
The practitioner's companion to Deep Dive into LLMs. Karpathy demonstrates his actual day-to-day LLM workflow: coding, research, audio, vision. Counter-programming against prompt-engineering hype.
GPT-4 - How does it work, and how do I build apps with it? - CS50 Tech Talk
CS50 · 90min · user
Harvard CS50 tech talk on building with GPT-4. Practical, grounded, and at the right altitude for engineers crossing into LLM apps.
RAG vs. Fine-Tuning
IBM Technology · 8min · user
Cedric Clyburn (IBM) cleanly delineating when to RAG vs. when to fine-tune. The single decision most LLM-app builders get wrong; this is the sober answer.
What is Retrieval-Augmented Generation (RAG)?
IBM Technology · 6min · novice
Marina Danilevsky's whiteboard explainer for RAG. Six minutes, zero hype, and the right mental model before anyone touches LangChain or a vector DB.
safety
The Orthogonality Thesis
Robert Miles AI Safety · 12min · learner
Why intelligence and goals are independent axes. Miles explains the foundational claim that "smart" does not imply "aligned with us". Short, rigorous, canonical.
Why Would AI Want to do Bad Things? Instrumental Convergence
Robert Miles AI Safety · 11min · learner
Why most goals converge on sub-goals like self-preservation, resource acquisition, and goal-preservation. Once you see this, AI risk arguments stop feeling like sci-fi.
Intro to AI Safety, Remastered
Robert Miles AI Safety · 18min · novice
Best single what-is-AI-safety-as-a-field video. Miles rebuilt his original intro with cleaner argumentation. Send this to skeptics.
The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment
Robert Miles AI Safety · 23min · user
The inner-alignment problem made accessible. This is the issue that keeps interpretability researchers up at night. A model can be trained on the right objective and still learn the wrong one internally.
AI 'Stop Button' Problem - Computerphile
Computerphile · 20min · novice
Robert Miles on Computerphile explaining why making an off-switch for a sufficiently capable AI is harder than it looks. Classic AI-safety pedagogy.
Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...
Robert Miles AI Safety · 15min · operator
Goes one level deeper into mesa-optimization and deceptive alignment. Not entry-level, but the canonical accessible treatment of a failure mode Anthropic and Redwood are actively investigating.
Reward Hacking: Concrete Problems in AI Safety Part 4
Robert Miles AI Safety · 9min · learner
Why models find loopholes in their reward functions. Reward hacking is no longer theoretical; it's a daily occurrence in frontier RLHF. The canonical explainer.
Constitutional AI: Harmlessness from AI Feedback (Anthropic Paper Explained)
Yannic Kilcher · 41min · operator
Walkthrough of Anthropic's Constitutional AI paper, the technique behind Claude's training. Critical for understanding how RLAIF differs from RLHF.
Dario Amodei: Anthropic CEO on Claude, AGI and the Future of AI and Humanity
Lex Fridman Podcast · 300min · user
Five-hour conversation with Anthropic's CEO covering Claude's training, scaling laws, mechanistic interpretability, and the existential argument. The single best public source on how Anthropic actually thinks.
history
AlphaGo - The Movie | Full award-winning documentary
DeepMind · 90min · novice
Not technical, but historically essential. The Lee Sedol match was the moment the field stopped being theoretical. Move 37 is referenced in alignment papers to this day.
Geoffrey Hinton: The Foundations of Deep Learning
Lex Fridman Podcast · 90min · user
Hinton on backprop, Boltzmann machines, and his pivot to AI-risk work. The most important living source on where deep learning actually came from.
Ilya Sutskever: Deep Learning
Lex Fridman Podcast · 67min · user
Sutskever (then OpenAI Chief Scientist) on the scaling hypothesis, compression as intelligence, and the road from AlexNet to GPT. Recorded 2020. Watch it against what came after.
Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning
Lex Fridman Podcast · 90min · user
LeCun argues against the LLM-only path and for self-supervised world models. The most rigorous public dissent against the current paradigm from a Turing-award winner.
Demis Hassabis: DeepMind - AI, Superintelligence and the Future of Humanity
Lex Fridman Podcast · 130min · user
Hassabis on AlphaGo, AlphaFold, and the path to AGI from a neuroscience background. Counterpoint to the LLM-centric narrative.
Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI
Lex Fridman Podcast · 210min · user
Karpathy on his Tesla years, the transition to LLMs, and his framework for evaluating AI progress. Rich technical detail for builders.
ChatGPT: 30 Year History | How AI Learned to Talk
Art of the Problem · 26min · novice
Clean visual history of language modeling from n-grams to ChatGPT. Great primer to send before showing someone Karpathy.
How AIs, like ChatGPT, Learn
CGP Grey · 8min · novice
Pre-LLM (2017) but the bot-teacher framing of supervised learning aged beautifully. Send this first to anyone who's never thought about how training works.
How AI Could Empower Any Business | Andrew Ng | TED
TED · 13min · novice
Andrew Ng's 2022 TED talk on democratizing AI. Useful for non-technical operators who need a grounded reframe of what AI does for ordinary businesses.
capability evals
Sparks of AGI: early experiments with GPT-4
Sebastien Bubeck · 55min · user
Microsoft Research talk presenting the Sparks of AGI paper findings. Whether you agree with the framing or not, it's load-bearing in the discourse. Required watching for the debate.
LAB · ATOMEONS · MARCO ISLAND FL·ÆONS RESEARCH · 12 PAPERS · CC-BY 4.0·ORANGEBOX v1.0.0-beta · TURBO-OPTIMIZE CLAUDE · SHIPPED 2026-05-30·B00KMAKR v3.2.0 · AI PUBLISHING COCKPIT · MAC + WINDOWS·FREE LAUNCH WEEK · ENDS JUNE 6 · §4A NO-SAAS LOCK·FOUNDER'S VIEW · NEXT BROADCAST IN ...·CITE THE WORK · FORWARD THE LINK · NO ALGORITHM