::deep-dive

Foundational Machine Learning

Supervised, unsupervised, and reinforcement learning fundamentals — the canon you need before deep learning

Before transformers and before deep learning, there is classical machine learning. Skipping it produces a particular kind of brittle practitioner: one who can fine-tune Llama but cannot explain why a random forest beats a neural network on tabular data, cannot debug an overfitting model because they never internalized the bias-variance tradeoff, and cannot judge a paper's baselines because they have no taste for what 'good' looks like outside deep learning.

A doctorate-grade researcher needs the full canon: linear and logistic regression; regularization (L1/L2 and their geometric interpretations); kernel methods; decision trees and ensembles; naive Bayes; k-means and Gaussian mixtures; EM; PCA; the bias-variance decomposition; cross-validation; the bootstrap; the curse of dimensionality; the no-free-lunch theorems; and the statistical learning theory underneath them (VC dimension, PAC learning, and Rademacher complexity at a conceptual level).

The two anchor textbooks, The Elements of Statistical Learning (Hastie, Tibshirani, Friedman) and the more accessible An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani), are the canonical references; every serious ML researcher has read one or both. ESL is the harder text and the one that builds taste. ISLR is the gentler entry point and comes with worked lab code in R or Python. Pair them with the fast.ai course as a hands-on counterweight: Jeremy Howard's top-down pedagogy is the antidote to purely theoretical, textbook-only learning.

Reinforcement learning sits adjacent to all of this. Sutton and Barto's textbook is the canonical entry, and you need at least Q-learning, policy gradients, and the actor-critic formulation before approaching RLHF.

By the end of this path, you should be able to pick the right model for a given dataset, diagnose underfitting vs. overfitting from learning curves alone, and explain why deep learning is not always the answer.
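Diagnosing underfitting vs. overfitting from learning curves can be sketched in a few lines. This is a minimal sketch assuming scikit-learn and NumPy; the sine-plus-noise data, the two polynomial degrees, and the `diagnose` helper with its `target_err` threshold are hypothetical illustrations, not a standard API. The rule it encodes: high training error signals bias; a large gap between validation and training error signals variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data: y = sin(2*pi*x) + Gaussian noise (noise variance 0.01).
rng = np.random.default_rng(0)
n = 60
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=n)


def diagnose(train_err, val_err, target_err=0.05):
    """Hypothetical rule of thumb; target_err is an assumed acceptable MSE."""
    if train_err > target_err:
        return "high bias (underfitting)"
    if val_err - train_err > target_err:
        return "high variance (overfitting)"
    return "acceptable"


results = {}
for degree in (1, 15):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False), LinearRegression()
    )
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.3, 1.0, 5),
        cv=KFold(n_splits=5, shuffle=True, random_state=0),
        scoring="neg_mean_squared_error",
    )
    # sklearn maximizes scores, so MSE comes back negated: flip the sign
    # and average across the 5 folds.
    train_mse = -train_scores.mean(axis=1)
    val_mse = -val_scores.mean(axis=1)
    results[degree] = {
        "sizes": sizes,
        "train": train_mse,
        "val": val_mse,
        "diagnosis": diagnose(train_mse[-1], val_mse[-1]),
    }
    for m, tr, va in zip(sizes, train_mse, val_mse):
        print(f"degree {degree:2d}  n_train={int(m):3d}  train MSE={tr:.3f}  val MSE={va:.3g}")
    print(f"degree {degree:2d}: {results[degree]['diagnosis']}")
```

On data like this, one would expect the degree-1 curves to meet at a shared plateau well above the noise floor (more data will not help), while the degree-15 curves keep a gap between training and validation error.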

::reading path · in order

::01 · textbook
~80h
An Introduction to Statistical Learning — James, Witten, Hastie, Tibshirani (free PDF; 2nd edition with R labs, or the 2023 Python edition, which adds Jonathan Taylor as co-author)
The gentle on-ramp. Read this first if classical ML is unfamiliar. Every chapter has worked code and exercises.
::02 · textbook
~150h
The Elements of Statistical Learning — Hastie, Tibshirani, Friedman (free PDF, 2nd edition)
The canonical reference. Read after ISLR. Chapters 3 (linear methods), 7 (model assessment), 10 (boosting), and 14 (unsupervised) are mandatory.
::03 · course
~40h
fast.ai — Practical Deep Learning for Coders (course by Jeremy Howard and Sylvain Gugger)
Top-down, hands-on counterweight to ESL's bottom-up theory. Builds real intuition by getting models running fast.
::04 · textbook
~120h
Pattern Recognition and Machine Learning — Christopher Bishop
The Bayesian-flavored alternative to ESL. Chapters 8 (graphical models), 9 (mixture models and EM), and 10 (approximate inference, including variational methods) are the standard treatments.
::05 · textbook
~90h
Reinforcement Learning: An Introduction — Sutton and Barto (2nd edition, free PDF)
The RL bible. Chapters 1-13 are the canonical curriculum. Required before any modern RLHF paper.
::06 · course
~60h
Andrew Ng's Machine Learning Specialization (Coursera)
The classic gentle introduction. Better as a refresher or first exposure than as a doctorate-path text, but covers the basics cleanly.
::07 · code
~20h
scikit-learn user guide and API documentation
Treat this as a textbook. The sklearn docs are pedagogically excellent and every example doubles as a working implementation.
::08 · code
~30h
Kaggle — playground and getting-started competitions (Titanic, House Prices, MNIST)
Cheap reps. Forces you to actually feature-engineer, validate, and submit predictions on real data.

::exercises · build · derive · reproduce

01 Implement L1- and L2-regularized linear regression (LASSO and ridge) from scratch, without sklearn. Reproduce the LASSO solution path.
02 Code a decision tree classifier from scratch, then a random forest wrapper around it. Compare both against sklearn on a benchmark dataset.
03 Implement k-means and EM for Gaussian mixtures. Visualize convergence on a toy dataset.
04 Run a complete bias-variance decomposition experiment on polynomial regression of increasing degree.
05 Implement tabular Q-learning on FrozenLake from Gymnasium (or the older Gym). Plot learning curves.
06 Take a Kaggle tabular competition and beat a gradient-boosted tree baseline using only classical methods (no neural networks).
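For exercise 01, one common way to trace the LASSO path without sklearn is cyclic coordinate descent with soft-thresholding and warm starts (LARS is an alternative). This is a sketch under stated assumptions, not a reference solution: the solver uses NumPy only, the data are hypothetical synthetic Gaussians with three nonzero true coefficients, and the objective is the one scikit-learn's `Lasso` documents, (1/(2n))||y - Xw||^2 + alpha*||w||_1. scikit-learn is imported solely to cross-check one point on the path.

```python
import numpy as np
from sklearn.linear_model import Lasso  # used only for the cross-check


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward zero by t, clipping at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, w0=None, max_sweeps=1000, tol=1e-8):
    """Minimize (1/(2n))||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes X and y are already centered, so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p) if w0 is None else w0.copy()
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n)||X_j||^2 for each column
    r = y - X @ w                       # current residual
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = w[j]
            # (1/n) X_j^T (partial residual with feature j's contribution added back)
            rho = X[:, j] @ r / n + col_sq[j] * old
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            if w[j] != old:
                r -= X[:, j] * (w[j] - old)  # keep the residual in sync
                max_change = max(max_change, abs(w[j] - old))
        if max_change < tol:
            break
    return w


# Hypothetical synthetic data: only the first three features matter.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
true_w = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# For alpha >= alpha_max the solution is w = 0, so the path starts there.
alpha_max = np.max(np.abs(Xc.T @ yc)) / n
alphas = alpha_max * np.logspace(0, -3, 30)

path, w = [], np.zeros(p)
for a in alphas:
    w = lasso_cd(Xc, yc, a, w0=w)  # warm start from the previous alpha
    path.append(w.copy())
path = np.array(path)
print("nonzero coefficients along the path:", [int(np.count_nonzero(c)) for c in path])

# Cross-check one point: sklearn centers X and y itself when fit_intercept=True.
sk_coef = Lasso(alpha=alphas[10], tol=1e-10, max_iter=100_000).fit(X, y).coef_
print("max |difference| vs sklearn:", np.abs(path[10] - sk_coef).max())
```

Plotting each column of `path` against `np.log(alphas)` gives the familiar solution-path figure: coefficients enter one by one as alpha shrinks.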

::milestones · observable

▲ You can explain why XGBoost still wins on tabular data.
▲ You can diagnose bias vs. variance from a learning curve in under thirty seconds.
▲ You can derive logistic regression from a maximum likelihood argument.
▲ You can implement tabular RL and explain why function approximation is hard.
▲ You can pick an appropriate baseline for any new ML problem without thinking.
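For the logistic-regression milestone, the maximum likelihood argument runs roughly as follows. It assumes binary labels y_i in {0, 1}, feature vectors x_i with the intercept folded in, design matrix X with rows x_i, and the sigmoid σ(z) = 1/(1 + e^(-z)); p and S are evaluated at the current w.

```latex
% Model: the log-odds are linear in x, so p_i = P(y_i = 1 | x_i) = \sigma(w^\top x_i)
\log\frac{p_i}{1 - p_i} = w^\top x_i
\quad\Longrightarrow\quad
p_i = \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}}

% Bernoulli likelihood of n independent labels, and its logarithm
L(w) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i},
\qquad
\ell(w) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\bigr]

% Using \sigma'(z) = \sigma(z)(1 - \sigma(z)), the gradient collapses to
\nabla_w \ell(w) = \sum_{i=1}^{n} (y_i - p_i)\, x_i = X^\top (y - p)

% The Hessian is negative semidefinite, so \ell is concave:
% any stationary point is a global maximum
\nabla_w^2 \ell(w) = -\sum_{i=1}^{n} p_i (1 - p_i)\, x_i x_i^\top = -X^\top S X,
\qquad S = \operatorname{diag}\bigl(p_i (1 - p_i)\bigr)

% No closed form: solve \nabla_w \ell = 0 iteratively, e.g. Newton's method (IRLS)
w^{(t+1)} = w^{(t)} + \bigl(X^\top S X\bigr)^{-1} X^\top (y - p)
```

Maximizing ℓ(w) is the same as minimizing the binary cross-entropy -ℓ(w). If the classes are linearly separable, the maximum is not attained and ||w|| grows without bound, which is one reason an L2 penalty is usually added.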