AI competency skill tree

Six trunks. Forty-ish skills. Most jobs need 30% of one trunk and 10% of two others.

Most "learn AI" maps fail the same way. They sprawl. They imply that to ship anything useful you must first inhale a calculus textbook, then a deep-learning textbook, then a six-month transformers tutorial, then "AI ethics," and only then, exhausted, produce something. The map gets confused with the territory. Real practitioners look nothing like that.

This page is the opposite. It is a competency skill tree with six trunks: foundations, machine learning fundamentals, deep learning, practical AI engineering, AI product and strategy, and AI safety and interpretability. Each trunk holds six or seven named skills. For each skill we say what mastery actually looks like in observable behavior, point to one free resource we have personally used and can defend, and give a short test you can run on yourself to check whether the skill is real or theatrical.

The honest claim is this: a working AI engineer or AI-literate operator usually needs roughly thirty percent of one trunk and ten percent of two others. A safety researcher needs more of trunk six and trunk three. A startup founder applying AI needs more of trunk five and trunk four. Almost nobody needs full mastery of all six. Specialization wins. Generalist sprawl burns out.

We picked free resources deliberately. Karpathy's Zero to Hero on YouTube, fast.ai, 3Blue1Brown's neural networks playlist, the Hugging Face course, the Anthropic interpretability tutorials, and Berkeley and Stanford open coursework exist because their authors wanted them to exist. They are not marketing funnels. Paid courses can be excellent, but they should be earned by first establishing whether you actually want to live in this discipline, which is what the free path is for.

The voice across the tree is plain-language. No "unlock," "master," or "supercharge." If something is uncertain as of June 2026 (best effort), we say so. If a fact depends on provider pricing or model availability, we say to check the docs.

The point is to give you a map you can act on this week, not a credential to display.

How to read the tree

Six trunks, each treated as a coherent domain rather than a checklist. Inside every trunk, individual skills get three things: a definition of what mastery looks like (a behavior, not a vibe), a free resource that teaches it (we name actual courses and papers, not platforms), and a self-test (something you can attempt this weekend that will either work or fail honestly).

The trunks are roughly ordered by abstraction depth, not by importance. Foundations sits first because most of the others depend on it, but you do not need to finish trunk one before touching trunk four. In practice, the best learners we have watched bounce between trunks: they build a tiny RAG system (trunk four), hit a retrieval-quality wall, learn embedding fundamentals (trunk three), realize they cannot evaluate properly (trunk two), and come back stronger. The tree is a map of dependencies, not a forced sequence.

A realistic budget for getting useful-at-one-trunk plus literate-at-two-others is three to nine months of focused part-time effort, depending on prior math and programming background. Full senior mastery in any single trunk is a multi-year commitment. Be suspicious of any timeline shorter than that; it usually comes from someone selling something.

Trunk one — foundations

The mathematical and computational substrate. You do not need to be a mathematician, but you do need enough to read a paper without bouncing off the notation. Skim-level fluency in linear algebra, calculus, probability, and Python is usually enough to start.

Linear algebra

Free: 3Blue1Brown — Essence of Linear Algebra

Mastery: you can read a matrix-multiplication expression and predict the output shape without running it. You understand why dot products measure similarity and what a basis change actually does to a vector. Resource: Grant Sanderson's Essence of Linear Algebra YouTube series — fifteen short visual episodes that build intuition before notation. Test: implement matrix multiplication from scratch in NumPy without copying, then explain in plain English what each loop is doing.
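
A minimal sketch of what the self-test is asking for, written in plain Python lists so each loop is visible. The function name `matmul` and the sample matrices are ours, not from any course; once it works, comparing the result against NumPy's `a @ b` is a reasonable check.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    n, k = len(a), len(a[0])      # a has shape (n, k)
    k2, m = len(b), len(b[0])     # b has shape (k2, m)
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    out = [[0.0] * m for _ in range(n)]   # result has shape (n, m)
    for i in range(n):            # pick a row of a
        for j in range(m):        # pick a column of b
            for p in range(k):    # dot product along the shared dimension
                out[i][j] += a[i][p] * b[p][j]
    return out


a = [[1, 2, 3], [4, 5, 6]]         # shape (2, 3)
b = [[7, 8], [9, 10], [11, 12]]    # shape (3, 2)
c = matmul(a, b)                   # predicted shape before running: (2, 2)
print(c)                           # [[58.0, 64.0], [139.0, 154.0]]
```

The shape rule to internalize is (n, k) times (k, m) gives (n, m); the two inner numbers must match and then disappear.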

Calculus and gradients

Free: 3Blue1Brown — Essence of Calculus

Mastery: you understand backpropagation as the chain rule applied recursively, not as magic. You can compute a simple derivative by hand and recognize why neural net training is gradient descent on a loss surface. Resource: Essence of Calculus, then Karpathy's micrograd lecture which derives backprop from scratch. Test: hand-compute the gradient of a two-layer network on paper for a single example, then verify with PyTorch autograd.
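
One way to rehearse this self-test without installing anything: derive the gradients by hand with the chain rule, then check them against finite differences. This is a sketch under our own assumptions: a deliberately tiny two-layer network with one scalar weight per layer, a tanh hidden unit, and squared-error loss. The finite-difference check stands in for PyTorch autograd here; it is a different verification method, not the one the test names.

```python
import math


def forward(w1, w2, x, t):
    h = math.tanh(w1 * x)    # hidden activation
    y = w2 * h               # network output
    loss = (y - t) ** 2      # squared error against target t
    return h, y, loss


def analytic_grads(w1, w2, x, t):
    """Chain rule, applied from the loss backwards."""
    h, y, _ = forward(w1, w2, x, t)
    dL_dy = 2.0 * (y - t)
    dL_dw2 = dL_dy * h                    # y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1.0 - h * h) * x    # d tanh(u)/du = 1 - tanh(u)^2
    return dL_dw1, dL_dw2


def numeric_grads(w1, w2, x, t, eps=1e-6):
    """Central finite differences on each weight."""
    g1 = (forward(w1 + eps, w2, x, t)[2] - forward(w1 - eps, w2, x, t)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, t)[2] - forward(w1, w2 - eps, x, t)[2]) / (2 * eps)
    return g1, g2


params = dict(w1=0.5, w2=-1.5, x=2.0, t=1.0)
print(analytic_grads(**params))
print(numeric_grads(**params))   # should agree to several decimal places
```

If the two printed pairs disagree, the bug is almost always a missing factor in the chain, which is exactly the mistake the paper exercise is meant to surface.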

Probability and statistics

Free: Seeing Theory (Brown University)

Mastery: you can read a confusion matrix without panicking. You know the difference between correlation and causation in a working sense, not just a slogan. You understand why a model with ninety-nine percent accuracy can still be useless on imbalanced data. Resource: Seeing Theory at seeing-theory.brown.edu (interactive, free, built at Brown University). Test: given a test with known sensitivity and specificity and a one-percent base rate, compute the probability that a positive result is a true positive.
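
The base-rate calculation is Bayes' rule with three inputs. The sketch below assumes a sensitivity of 0.99 and a specificity of 0.95; those two numbers are our illustrative choices, since the answer cannot be computed from the base rate alone.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = sensitivity * base_rate                    # P(positive and condition)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(base_rate=0.01, sensitivity=0.99, specificity=0.95)
print(f"{ppv:.3f}")   # 0.167
```

With these assumed numbers, about five out of six positives are false alarms, even though the test looks accurate on paper. That gap is the working intuition the skill is about.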

Python and scientific stack

Free: Python.org tutorial + NumPy quickstart

Mastery: you read NumPy or PyTorch code and follow shapes without printing. You know what broadcasting does and where it bites you. You can vectorize a loop without thinking about it. Resource: the official Python tutorial at docs.python.org, then NumPy's quickstart at numpy.org/doc. Test: rewrite a triple-nested Python loop as a single vectorized NumPy expression and time the speedup.

Working with data

Free: Pandas official user guide

Mastery: you can load a messy CSV, profile it, find the column with eighteen percent missing values, and decide what to do about it. You understand groupby, merge, and pivot from muscle memory. Resource: the pandas user guide at pandas.pydata.org/docs. Test: take any public dataset (the UCI repository works), load it, and produce three honest observations and one question you cannot yet answer.
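
A small sketch of the profiling step, using only the standard library so the logic is explicit. The sample data, the `missing_fraction` name, and the list of tokens treated as missing are our assumptions. In pandas, `df.isna().mean()` gives the same per-column fraction once the file is loaded.

```python
import csv
import io


def missing_fraction(csv_text, missing_tokens=("", "NA", "NaN", "null")):
    """Return {column: fraction of rows where the value is missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in counts:
            value = row[name]   # None when the row is shorter than the header
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return {name: counts[name] / rows for name in counts} if rows else {}


# Hypothetical messy file: blanks and an "NA" marker mixed in.
sample = "age,city,income\n34,Oslo,52000\n,Bergen,NA\n29,,61000\n41,Oslo,\n"
profile = missing_fraction(sample)
print(profile)   # {'age': 0.25, 'city': 0.25, 'income': 0.5}
```

Knowing the fraction is the easy half. Deciding whether to drop, impute, or flag the column depends on why the values are missing, which the code cannot tell you.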

Reading academic papers

Free: Karpathy — How to read papers (talks and threads)

Mastery: you can read an arxiv paper's abstract, skim figures, and decide in under ten minutes whether to read it fully. You know that most papers do not survive replication and you read accordingly. Resource: Andrej Karpathy has discussed his paper-reading process in multiple talks and tweets — search for 'Karpathy reading papers' on YouTube. Test: read the Attention Is All You Need paper (arxiv 1706.03762) and write a one-page plain-English summary.

Trunk two — ML fundamentals

Classical machine learning. Most production AI systems still run logistic regression somewhere. Skipping this trunk because 'deep learning won' is a common and expensive mistake.

Regression and classification

Free: scikit-learn documentation + Andrew Ng's CS229

Mastery: you can pick logistic regression, decision trees, or gradient boosting for a tabular problem and justify the choice in one sentence. You understand bias-variance tradeoffs in a working sense. Resource: scikit-learn.org documentation, then Stanford CS229 lecture notes (free at cs229.stanford.edu). Test: on a tabular dataset, beat a logistic regression baseline by at least three percent F1, then explain why your model won.

Evaluation and metrics

Free: Google ML Crash Course

Mastery: you know that accuracy is usually the wrong metric. You can pick precision, recall, F1, AUC, or calibration based on what the model is for, not what is easiest. Resource: Google's Machine Learning Crash Course at developers.google.com/machine-learning/crash-course. Test: build a binary classifier where accuracy is high but recall is unacceptable, and explain the business consequence.
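
To make the accuracy trap concrete, here is a sketch that computes the usual metrics from scratch on a made-up imbalanced dataset: 10 positives in 1,000 examples, scored against a "model" that always predicts negative. The data and function name are ours.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1] * 10 + [0] * 990   # one percent positive class
y_pred = [0] * 1000             # always predict the majority class
metrics = binary_metrics(y_true, y_pred)
print(metrics)   # accuracy 0.99, recall 0.0
```

Ninety-nine percent accuracy, zero positives found. If the positive class is fraud or a disease, the business consequence is that the model catches nothing it was built to catch.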

Cross-validation and leakage

Free: scikit-learn cross-validation guide

Mastery: you can spot data leakage in someone else's pipeline. You know why time-series splits are not random splits and why mixing patient IDs across splits invalidates a medical model. Resource: scikit-learn's model_selection documentation. Test: deliberately introduce a leakage bug into a notebook, measure the inflated score, fix it, and document the gap.
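
The patient-ID case can be sketched as a split that assigns whole groups, not individual rows, to train or test. The record layout and the `group_split` helper are hypothetical; scikit-learn ships `GroupKFold` and `GroupShuffleSplit` in `sklearn.model_selection` for the same purpose.

```python
import random


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears on both sides."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)   # seeded for a reproducible split
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Hypothetical data: 10 patients with 3 scans each.
records = [{"patient_id": i // 3, "scan": i} for i in range(30)]
train, test = group_split(records, "patient_id")

train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)   # 24 6 set()
```

A plain random split over rows would put scans from the same patient on both sides, and the model would be rewarded for recognizing patients instead of conditions.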

Feature engineering

Free: Kaggle Learn — Feature Engineering

Mastery: you can take raw data and produce features that materially improve a model — not just transform-everything spam. You know when to scale, encode, target-encode, or leave alone. Resource: kaggle.com/learn/feature-engineering — short, practical, free. Test: take a public tabular contest dataset, build three feature engineering versions, and document which actually helped on held-out data.

Tree-based models in practice

Free: XGBoost documentation + CatBoost docs

Mastery: you reach for gradient-boosted trees first on tabular problems and know why. You understand learning rate, depth, and regularization parameters at a working level. Resource: xgboost.readthedocs.io and catboost.ai/docs. Test: on a public tabular benchmark, beat scikit-learn's default RandomForest by tuning XGBoost or LightGBM, and explain the tuning.

Causal thinking

Free: Causal Inference: The Mixtape (free online)

Mastery: you can name three reasons a model showing 'X predicts Y' does not mean X causes Y. You know the basics of confounders, colliders, and selection bias. Resource: Scott Cunningham's Causal Inference: The Mixtape, free online at mixtape.scunning.com. Test: take a published finding from a popular article, identify a plausible confounder, and write what experiment would test it.

Trunk three — deep learning

Neural networks, transformers, and large-scale training. The most-hyped trunk and the one where free top-tier resources are unusually abundant. The 'micrograd to transformer' path Karpathy laid out is the most under-priced curriculum on the public internet.

Neural network fundamentals

Free: Karpathy — Neural Networks: Zero to Hero

Mastery: you can implement a small MLP from scratch including backpropagation, training loop, and weight initialization. You know what happens to gradients in deep networks and why ReLU exists. Resource: Andrej Karpathy's Zero to Hero series on YouTube (youtube.com/@AndrejKarpathy) — start with the micrograd lecture. Test: build a two-layer network from scratch in NumPy that solves XOR or MNIST above ninety-five percent accuracy.

Transformers and attention

Free: Attention Is All You Need (arxiv 1706.03762) + Karpathy nanoGPT

Mastery: you can draw a transformer block from memory and explain each component. You understand why self-attention without positional information is permutation-equivariant (shuffle the input tokens and the outputs shuffle the same way), and why positional encodings are therefore needed. Resource: read the original Vaswani et al. paper at arxiv.org/abs/1706.03762, then watch Karpathy's 'Let's build GPT' video, which implements nanoGPT from scratch. Test: implement a small character-level transformer that generates Shakespeare-ish text after a few hours of training on a laptop.
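
The order-blindness of attention is easy to see in a toy. The sketch below is single-head self-attention with identity query, key, and value projections, which is a simplifying assumption of ours and not how a trained block looks. It shows that shuffling the input tokens only shuffles the outputs, so the layer itself carries no notion of position.

```python
import math


def self_attention(xs):
    """Toy self-attention: Q = K = V = the inputs, no learned weights."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        top = max(scores)                              # subtract max for stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]         # softmax over all tokens
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens)
out_shuffled = self_attention(shuffled)

# Position j of the shuffled run matches position perm[j] of the original run.
same = all(
    abs(a - b) < 1e-9
    for j, row in enumerate(out_shuffled)
    for a, b in zip(row, out[perm[j]])
)
print(same)   # True
```

Positional encodings break this symmetry on purpose: once position information is added to each token, "dog bites man" and "man bites dog" stop producing the same set of outputs.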

Training, optimization, and tricks

Free: fast.ai Practical Deep Learning for Coders

Mastery: you know what learning rate finder does, why batch size matters, and what happens when you mismatch dropout and batch normalization. You can debug a training run that is not converging. Resource: course.fast.ai — Jeremy Howard's free practical course, top-down rather than bottom-up. Test: take a public dataset, train a model, hit a non-converging run, and diagnose the failure in writing.

Computer vision basics

Free: Stanford CS231n (course notes free online)

Mastery: you understand convolutions, pooling, and why ResNets work. You know that vision transformers exist and roughly when to pick which architecture. Resource: cs231n.stanford.edu — lecture notes are free; video lectures from older years are on YouTube. Test: fine-tune a pretrained ResNet on a small custom image dataset and report honest accuracy on a held-out test set.

NLP and language models

Free: Hugging Face Course

Mastery: you can tokenize text, run inference with a pretrained model, fine-tune for a downstream task, and evaluate honestly. You know what an embedding is and what BPE does. Resource: huggingface.co/learn/nlp-course — free, hands-on, library-aligned. Test: fine-tune a small open model on a classification task and beat zero-shot prompting by a measurable margin.

Scaling and efficiency

Free: Chinchilla paper (Hoffmann et al., arxiv 2203.15556)

Mastery: you understand at a conceptual level what compute-optimal training is, why FLOPs and parameter counts are not the same thing, and the rough shape of the scaling-law literature. Resource: the Chinchilla paper, Training Compute-Optimal Large Language Models at arxiv.org/abs/2203.15556. Test: read the paper and explain in three sentences what its main correction to earlier scaling laws was.
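
For a feel of the numbers, here is a back-of-envelope sketch. It rests on two widely quoted approximations, both assumptions here and neither an exact result from the paper: training compute C is roughly 6 × parameters × tokens, and compute-optimal training uses roughly 20 tokens per parameter.

```python
import math


def compute_optimal(flops, tokens_per_param=20.0):
    """Rule-of-thumb split of a FLOP budget into parameters and tokens.

    Assumes C ~ 6 * N * D and D ~ tokens_per_param * N,
    so N ~ sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


n_params, n_tokens = compute_optimal(5.88e23)
print(f"{n_params:.2e} parameters, {n_tokens:.2e} tokens")
# about 7e10 parameters and 1.4e12 tokens
```

The same budget could also buy a much larger model trained on far fewer tokens. Parameter count and FLOPs are different quantities, and the paper's fitted curves are more nuanced than this one-liner, so treat the output as orientation only.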

Trunk four — practical AI engineering

The trunk most jobs actually pay for in 2026. Building things with existing models — RAG, agents, evaluation harnesses, deploys — rather than training new ones. Heavy on engineering discipline, lighter on math.

Prompting and LLM APIs

Free: Anthropic prompt engineering docs + OpenAI cookbook

Mastery: you can take a vague natural-language task and produce a prompt that hits the target ninety percent of the time, with examples, constraints, and an output format. You know when to use few-shot versus zero-shot. Resource: docs.anthropic.com/en/docs/build-with-claude/prompt-engineering and cookbook.openai.com — both free, both maintained by the model providers. Test: build a prompt that extracts structured data from messy text with above ninety-five percent accuracy on a held-out set.

Retrieval-augmented generation (RAG)

Free: LangChain docs + LlamaIndex docs (concept guides)

Mastery: you can build a retrieval pipeline that improves answer quality over a raw model, evaluate it honestly, and know when retrieval is making things worse. You understand chunking, embeddings, hybrid search, and reranking. Resource: python.langchain.com and docs.llamaindex.ai — focus on the concept guides, not framework lock-in. Test: build a RAG system over a real document corpus and produce a held-out evaluation showing the retrieval actually helped.
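
Before judging answers, it helps to measure the retriever on its own. The sketch below computes hit rate at k, meaning the fraction of queries where at least one known-relevant document appears in the top k results. The query and document IDs are invented, and the metric choice is ours; recall@k or MRR are equally reasonable.

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant doc in the top k."""
    if not relevant:
        return 0.0
    hits = 0
    for query_id, relevant_ids in relevant.items():
        top_k = retrieved.get(query_id, [])[:k]
        if any(doc_id in relevant_ids for doc_id in top_k):
            hits += 1
    return hits / len(relevant)


# Hypothetical held-out set: hand-labelled relevant docs per query.
relevant = {"q1": {"d3"}, "q2": {"d7", "d8"}, "q3": {"d1"}, "q4": {"d9"}}
# Hypothetical retriever output: ranked doc IDs per query.
retrieved = {
    "q1": ["d3", "d2", "d5"],
    "q2": ["d4", "d7", "d1"],
    "q3": ["d2", "d5", "d1"],
    "q4": ["d1", "d2", "d3"],
}

print(hit_rate_at_k(retrieved, relevant, k=1))   # 0.25
print(hit_rate_at_k(retrieved, relevant, k=3))   # 0.75
```

If this number is low, no amount of prompt work downstream will fix the answers, because the model never sees the right text.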

Agents and tool use

Free: Anthropic — Building effective agents (Dec 2024 post)

Mastery: you can build a tool-using agent that does not silently fail. You know that most 'agent' demos collapse under real evaluation, and you build accordingly with retries, fallbacks, and human-in-the-loop checkpoints. Resource: Anthropic's 'Building effective agents' essay at anthropic.com/research/building-effective-agents — clearest plain-language guide as of 2026 best-effort. Test: build an agent that completes a real five-step task with above eighty percent success on a held-out evaluation.
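
As one illustration of "does not silently fail", here is a sketch of a tool-call wrapper with retries, an optional fallback, and an explicit failure result that a human checkpoint could pick up. Everything in it (the function names, the result dictionary shape, the simulated tools) is hypothetical and not taken from any agent framework.

```python
import time


def call_with_retries(tool, args, retries=2, backoff_seconds=0.0, fallback=None):
    """Call tool(*args); retry on exceptions, then fall back, then fail loudly."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            value = tool(*args)
            return {"ok": True, "value": value, "source": "tool", "attempts": attempt + 1}
        except Exception as exc:   # a real agent would catch narrower error types
            last_error = exc
            if attempt < retries:
                time.sleep(backoff_seconds * (2 ** attempt))   # exponential backoff
    if fallback is not None:
        return {"ok": True, "value": fallback(*args), "source": "fallback",
                "attempts": retries + 1}
    # No silent failure: the caller gets an explicit error it can escalate.
    return {"ok": False, "error": repr(last_error), "source": "none",
            "attempts": retries + 1}


calls = {"count": 0}


def flaky_lookup(city):
    """Simulated tool that times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"forecast for {city}"


def always_broken(city):
    raise RuntimeError("simulated outage")


recovered = call_with_retries(flaky_lookup, ("Oslo",))
escalated = call_with_retries(always_broken, ("Oslo",))
print(recovered["ok"], recovered["attempts"])   # True 3
print(escalated["ok"], escalated["error"])      # False RuntimeError('simulated outage')
```

The design choice that matters is the last return: the agent loop has to branch on `ok` and hand control to a person or a safer path, and must not carry on with a missing value.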

Evaluation harnesses

Free: Hugging Face evaluate library + EleutherAI lm-eval-harness

Mastery: you write evals before you write features. You know that 'vibes-based' evaluation does not survive production. You can build a regression test suite for LLM behavior. Resource: github.com/EleutherAI/lm-evaluation-harness and huggingface.co/docs/evaluate. Test: build an eval suite for a specific LLM task that catches a real regression you introduce on purpose.
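
The smallest useful version of this is a list of named cases with programmatic checks, run against whatever function wraps your model call. In the sketch below, `model_v1` and `model_v2` are hypothetical stand-ins for two versions of a prompt or model, and the second contains a regression introduced on purpose.

```python
def run_evals(model, cases):
    """Run every case through the model and collect failures by name."""
    failures = []
    for case in cases:
        output = model(case["prompt"])
        if not case["check"](output):
            failures.append({"name": case["name"], "output": output})
    return {"passed": len(cases) - len(failures), "total": len(cases),
            "failures": failures}


cases = [
    {"name": "returns_iso_date",
     "prompt": "Convert to ISO format: March 5, 2024",
     "check": lambda out: out == "2024-03-05"},
    {"name": "flags_empty_input",
     "prompt": "",
     "check": lambda out: out == "NO_INPUT"},
]


def model_v1(prompt):
    """Stand-in for the current, working behavior."""
    if not prompt:
        return "NO_INPUT"
    return "2024-03-05"


def model_v2(prompt):
    """Stand-in for a new version that dropped the empty-input guard."""
    return "2024-03-05"


before = run_evals(model_v1, cases)
after = run_evals(model_v2, cases)
print(before["passed"], "of", before["total"])   # 2 of 2
print([f["name"] for f in after["failures"]])    # ['flags_empty_input']
```

Real suites need fuzzier checks than string equality, but the shape stays the same: fixed cases, automatic checks, and a diff between versions you can read in seconds.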

Vector databases and embeddings

Free: pgvector docs + Sentence-Transformers documentation

Mastery: you understand that embeddings are not magic and that different embedding models give meaningfully different results. You know when to use cosine similarity, when to rerank, and when to use full-text search instead. Resource: github.com/pgvector/pgvector and sbert.net. Test: compare two embedding models on a retrieval task and report which won and by how much, with statistical significance.
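
Cosine similarity itself is only a few lines. The sketch below ranks three made-up two-dimensional vectors against a query; real embeddings have hundreds or thousands of dimensions, and the document IDs and values are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k):
    """Return the k document IDs most similar to the query."""
    ranked = sorted(docs.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0]}
query = [2.0, 0.1]
print(top_k(query, docs, k=2))   # ['a', 'c']
```

Note that "c" has by far the largest magnitude and would win under a raw dot product, yet it ranks second here. Cosine compares direction only, which is one reason the choice of similarity function changes results.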

Deployment and observability

Free: vLLM documentation + standard MLOps writing

Mastery: you can deploy a model behind an API, monitor latency and cost, and respond to a regression in production. You understand caching, batching, and quantization tradeoffs at a working level. Resource: docs.vllm.ai for inference servers; combine with general SRE writing for observability. Test: deploy a small open-weight model behind an API on a VPS or local box, with logging, and serve a real request.

Cost and latency engineering

Free: provider pricing pages — check provider docs for current pricing

Mastery: you can take a feature spec and produce a cost estimate within twenty percent of actual, before building. You understand prompt caching, batch APIs, model tiers, and when a smaller model is the right call. Resource: anthropic.com/pricing, openai.com/pricing, and equivalent provider docs — pricing changes frequently, always check the current docs. Test: estimate the monthly cost of a feature at a given QPS, build it, and compare the bill to the estimate.
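
The estimate is plain arithmetic, which is the point: do it before building. In the sketch below every number is a placeholder we made up, including the per-million-token prices; substitute the figures from your provider's current pricing page. It ignores prompt caching and batch discounts, both of which can move the result substantially.

```python
def monthly_cost_usd(qps, input_tokens, output_tokens,
                     input_price_per_mtok, output_price_per_mtok, days=30):
    """Estimate monthly spend for a feature running at a steady request rate."""
    requests = qps * 86_400 * days   # 86,400 seconds per day
    per_request = (input_tokens * input_price_per_mtok
                   + output_tokens * output_price_per_mtok) / 1_000_000
    return requests * per_request


estimate = monthly_cost_usd(
    qps=2,                       # hypothetical steady load
    input_tokens=1000,           # hypothetical prompt size
    output_tokens=200,           # hypothetical response size
    input_price_per_mtok=3.0,    # placeholder price, not a real quote
    output_price_per_mtok=15.0,  # placeholder price, not a real quote
)
print(f"${estimate:,.0f} per month")   # $31,104 per month
```

In this made-up case the 200 output tokens cost as much as the 1,000 input tokens, which is the kind of asymmetry that decides whether a smaller model or a shorter response format is the right call.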

Trunk five — AI product and strategy

The least-respected and most-underrated trunk. Knowing what to build, for whom, and at what price beats knowing how to train another transformer. Most AI failures in 2025 and 2026 were product failures, not technical ones.

Problem selection

Free: Y Combinator startup library + Paul Graham essays

Mastery: you can look at a market and identify three places where AI changes the unit economics, and three places where it does not. You resist the urge to 'add AI' to everything. Resource: ycombinator.com/library and paulgraham.com/articles.html — free, opinionated, useful. Test: write a one-page memo explaining why one specific AI feature would succeed and another would fail, with reasoning that survives a hostile read.

Pricing and packaging

Free: First Round Review pricing essays

Mastery: you can pick between subscription, usage, and outcome-based pricing for a given product and justify it. You understand the math of usage-based pricing under variable token costs. Resource: review.firstround.com — search 'pricing' and 'packaging'; supplement with provider pricing pages. Test: take any AI product idea and produce three pricing models with the trade-offs of each in one page.

Positioning

Free: April Dunford — Obviously Awesome (talks and excerpts free online)

Mastery: you can name three direct competitors to any product you work on and articulate why a customer would pick one over another in one sentence each. Resource: aprildunford.com and her free talks on YouTube. Test: write the positioning statement for a real AI product (yours or a competitor's) using Dunford's five-question template, and have someone unfamiliar with the space rate its clarity.

Honest evaluation in product context

Free: Hamel Husain — AI Evals essays

Mastery: you build product metrics that survive scrutiny. You know the difference between 'users liked the demo' and 'users completed the job to be done.' Resource: hamel.dev — Hamel Husain's writing on LLM evaluations is the most cited working-practitioner resource as of 2026 best-effort. Test: define a real product success metric for an AI feature, instrument it, and ship a change measured against it.

Ethics in product decisions

Free: Partnership on AI publications + DeepMind safety blog

Mastery: you can name three classes of harm your product could cause and what mitigations would cost. You do not confuse 'we wrote a policy doc' with 'we changed the product.' Resource: partnershiponai.org/publications and deepmind.google/discover/blog — both publish accessible writing on real product-level harms. Test: produce a one-page risk assessment for a real AI feature, with mitigations costed in engineering time.

Communicating to non-technical stakeholders

Free: Distill.pub archive + Anthropic research blog

Mastery: you can explain a model's behavior to a CEO, a lawyer, and a customer in three different registers without lying. Resource: distill.pub for the gold standard of accessible ML writing (archive remains free); anthropic.com/research for current-state plain-language model writing. Test: take one technical AI concept and write three explanations of it at three different reading levels — executive, technical-but-not-ML, and ML peer.

Trunk six — AI safety and interpretability

The trunk that decides whether the rest of this stack remains trustworthy. Growing fast as a research field; smaller as a job market than the hype suggests, but real and growing. Mechanistic interpretability specifically went from niche to mainstream between 2023 and 2026.

Alignment fundamentals

Free: AGI Safety Fundamentals (BlueDot Impact)

Mastery: you can name the core open problems — specification gaming, mesa-optimization, deceptive alignment — and what current research is doing about them. You do not treat 'alignment' as a slogan. Resource: bluedot.org — the AGI Safety Fundamentals course is free, structured, and respected. Test: read three current alignment papers and produce a one-page synthesis of where the field disagrees with itself.

Red-teaming and adversarial evaluation

Free: Anthropic and OpenAI red-team reports (model cards)

Mastery: you can take a model and find five categories of failure that the developer's eval missed. You know that adversarial inputs do not need to be exotic — most are mundane. Resource: model cards published with frontier model releases (anthropic.com and openai.com publish these); supplement with the lm-eval-harness adversarial suites. Test: take any deployed LLM and find a reproducible failure mode that violates its stated policy.

Mechanistic interpretability

Free: Anthropic interpretability tutorials + Neel Nanda's TransformerLens

Mastery: you understand what a 'feature' means in a sparse autoencoder, what circuits are, and why finding induction heads mattered. You can read a current interpretability paper without bouncing. Resource: transformer-circuits.pub publishes Anthropic's open research; supplement with Neel Nanda's TransformerLens library and tutorials at neelnanda.io. Test: replicate a small interpretability result — for example, find induction heads in a small open model — using TransformerLens.

Evals for dangerous capabilities

Free: METR evaluations writing + Anthropic responsible scaling policy

Mastery: you understand why certain evaluations cannot be public, the difference between capability evaluations and propensity evaluations, and the basic structure of a Responsible Scaling Policy. Resource: metr.org and anthropic.com/news/anthropics-responsible-scaling-policy. Test: read one RSP and produce a one-page critique of where you think the thresholds are too loose or too tight, with reasoning.

Policy and governance literacy

Free: AI Index Report (Stanford HAI, annual, free)

Mastery: you can name the major active regulatory regimes — the EU AI Act, the US executive actions, the UK AISI work — and what each actually requires. Specifics shift; check current sources. Resource: aiindex.stanford.edu — Stanford HAI's annual AI Index Report is the most-cited neutral source as of 2026 best-effort. Test: take one current policy proposal and write a memo on what it would change in a specific AI product's day-to-day operations.

Open-weight model risk analysis

Free: NIST AI Risk Management Framework

Mastery: you understand the trade-offs between releasing weights openly, releasing under license, and keeping closed. You can describe a specific harm model and a specific benefit model without sloganeering. Resource: nist.gov/itl/ai-risk-management-framework — free, government-issued, citable. Test: pick one open-weight model release and produce a one-page net-impact analysis with honest uncertainty intervals.

The honest specialization table

Most working AI roles need depth in one trunk and literacy in two others. Use this as a starting point, not a contract. The percentages are rough working estimates of where attention should go.

Role	Primary trunk (~30-40%)	Secondary trunks (~10-15% each)	Can mostly skip
Applied ML engineer	Trunk 4 (engineering)	Trunks 2 and 3	Trunk 6 unless your domain needs it
Research engineer	Trunk 3 (deep learning)	Trunks 1 and 4	Trunk 5 product-side
Safety researcher	Trunk 6 (safety)	Trunks 3 and 1	Trunk 5 except policy
Founder building AI product	Trunk 5 (product)	Trunks 4 and 2	Trunk 1 beyond literacy, trunk 3 beyond literacy
Data scientist	Trunk 2 (ML fundamentals)	Trunks 1 and 5	Trunk 3 unless project demands
AI policy analyst	Trunk 5 + 6 split	Trunks 3 and 4 at literacy depth	Trunks 1 and 2 beyond conceptual
Interpretability researcher	Trunk 6 (interp)	Trunks 3 and 1	Trunk 5 beyond communication

The minimum effective dose path

If you are starting cold and want to be useful — not credentialed, useful — here is the leanest sequence we have seen actually work. Twelve to twenty weeks of focused part-time effort. The point is not speed; the point is to find out quickly whether you want to keep going.

Weeks 1-3: Foundations sprint

3Blue1Brown's neural networks and linear algebra playlists. Karpathy's micrograd lecture. Python and NumPy basics from numpy.org. End state: you can implement a one-layer neural network from scratch and explain what backprop is doing.

Weeks 4-6: Classical ML

scikit-learn tutorial. Run logistic regression, decision trees, and gradient boosting on a public tabular dataset. Learn to evaluate with cross-validation and avoid leakage. End state: you can produce honest held-out metrics on a real dataset.

Weeks 7-9: Deep learning core

Karpathy's makemore and nanoGPT lectures. Hugging Face course chapters 1 through 4. Fine-tune a pretrained model on a small task. End state: you have trained, not just inferred, a real model.

Weeks 10-12: Practical engineering

Build a tiny RAG system. Build a small evaluation harness. Deploy something behind an API. End state: you have shipped a working AI feature that someone besides you can use.

Weeks 13-16: Specialize

Pick one trunk to go deep in based on what felt alive during the prior phase. Stop trying to be a generalist. End state: you have a defensible answer when someone asks 'what do you actually do.'

Weeks 17-20: Ship and publish

Build one substantial thing in your chosen specialization and write it up publicly. The write-up matters more than the project. End state: there is something at a URL with your name on it that demonstrates the skill.

Where most people lose the thread

Three predictable failure modes. First: tutorial loop. Watching one more course instead of building one more thing. The bar is whether you have shipped, not whether you have studied. Second: trunk sprawl. Trying to learn all six trunks in parallel and ending up shallow at all of them. Pick one, then borrow from two. Third: hype substitution. Spending time on the model release of the week instead of fundamentals that compound. Models change every six months; matrix multiplication does not. The honest move is to read fewer announcements and finish more projects.

Resources we do not list and why

There is a long tail of paid bootcamps, certification programs, and influencer courses that promise to teach AI. We are not listing them. Not because they are all bad (some are excellent), but because the free resources above are usually equal or better, and the gate of paying nine hundred dollars upfront filters in the wrong direction. If after working through the free path you find a specific paid course filling a specific gap, that is the time to spend. Not before.

We also did not list every popular framework. We listed concept-level documentation (LangChain, LlamaIndex, Hugging Face) because the frameworks themselves change faster than the concepts. Two years from now the names will be different; the patterns of retrieval, chunking, evaluation, and deployment will not. Optimize for the durable layer.

Finally, we deliberately omitted recommendations for specific Twitter accounts or YouTube influencers. The signal-to-noise on social platforms varies wildly, and what is sharp in 2026 will be stale by 2028. Stick with primary sources (papers, official documentation, university course pages) and let curation filter itself over time.

Sources

[01]
Vaswani et al., Attention Is All You Need — the original transformer architecture paper, cited as foundational reading in trunk three.
arxiv.org/abs/1706.03762
[02]
Hoffmann et al., Training Compute-Optimal Large Language Models (Chinchilla) — the compute-optimal scaling law correction cited in the deep learning trunk.
arxiv.org/abs/2203.15556
[03]
Andrej Karpathy's Neural Networks Zero to Hero YouTube series, including the micrograd and nanoGPT lectures referenced multiple times.
youtube.com/@AndrejKarpathy
[04]
fast.ai Practical Deep Learning for Coders, free top-down deep learning course by Jeremy Howard cited as the training-and-tricks resource.
course.fast.ai
[05]
The free Hugging Face NLP and LLM course covering tokenization, fine-tuning, and evaluation.
huggingface.co/learn/nlp-course
[06]
Grant Sanderson's 3Blue1Brown — Essence of Linear Algebra, Essence of Calculus, and Neural Networks playlists used as the visual-math foundations resource.
3blue1brown.com
[07]
Stanford CS231n Convolutional Neural Networks for Visual Recognition course notes and lectures, free.
cs231n.stanford.edu
[08]
Stanford CS229 Machine Learning lecture notes, free reference for classical ML.
cs229.stanford.edu
[09]
Brown University's Seeing Theory, free interactive probability and statistics resource.
seeing-theory.brown.edu
[10]
Google's free Machine Learning Crash Course, used as the evaluation-and-metrics resource in trunk two.
developers.google.com/machine-learning/crash-course
[11]
scikit-learn official documentation, the canonical free reference for classical ML algorithms and cross-validation.
scikit-learn.org
[12]
Kaggle Learn's short free feature engineering course.
kaggle.com/learn/feature-engineering
[13]
Scott Cunningham's Causal Inference: The Mixtape, free online textbook on causal methods.
mixtape.scunning.com
[14]
Anthropic's official prompt engineering documentation, free and maintained by the model provider.
docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
[15]
Anthropic's December 2024 essay 'Building effective agents' cited as a plain-language guide to agent design.
anthropic.com/research/building-effective-agents
[16]
Anthropic's mechanistic interpretability research publication venue, including induction-heads and sparse-autoencoder work.
transformer-circuits.pub
[17]
EleutherAI's lm-evaluation-harness, an open-source LLM evaluation framework cited in trunk four.
github.com/EleutherAI/lm-evaluation-harness
[18]
pgvector, the open-source Postgres extension for vector similarity search.
github.com/pgvector/pgvector
[19]
vLLM documentation, an open-source high-throughput LLM inference server.
docs.vllm.ai
[20]
Neel Nanda's TransformerLens tutorials and writing on mechanistic interpretability, cited as the working-practitioner interp resource.
neelnanda.io
[21]
BlueDot Impact's AGI Safety Fundamentals course, a free structured introduction to alignment.
bluedot.org
[22]
Stanford HAI's annual AI Index Report, neutral aggregated source on the state of AI as of 2026 best-effort.
aiindex.stanford.edu
[23]
NIST AI Risk Management Framework, US government-issued AI risk reference.
nist.gov/itl/ai-risk-management-framework
[24]
METR (Model Evaluation and Threat Research), source for capability and dangerous-capability evaluation writing.
metr.org
[25]
Distill.pub archive — historical gold standard for accessible visual machine learning writing, archive still available.
distill.pub
