
AI competency skill tree
Six trunks. Forty-ish skills. Most jobs need 30% of one trunk and 10% of two others.
How to read the tree
Trunk one — foundations
The mathematical and computational substrate. You do not need to be a mathematician, but you do need enough to read a paper without bouncing off the notation. Skim-level fluency in linear algebra, calculus, probability, and Python is usually enough to start.
Linear algebra
Free: 3Blue1Brown — Essence of Linear Algebra
Mastery: you can read a matrix-multiplication expression and predict the output shape without running it. You understand why dot products measure similarity and what a change of basis actually does to a vector. Resource: Grant Sanderson's Essence of Linear Algebra YouTube series, a short run of visual episodes that build intuition before notation. Test: implement matrix multiplication from scratch with explicit loops (no copied code), check the result against NumPy's matrix product, then explain in plain English what each loop is doing.
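Here is a minimal sketch of that test, written with plain Python lists so every loop stays visible. The function name matmul is our own and does not come from a library. If NumPy is installed, np.array(A) @ np.array(B) should return the same numbers.

```python
# Triple-loop matrix multiplication: C[i][j] = sum over k of A[i][k] * B[k][j].
# Shapes: (n x m) times (m x p) gives (n x p); the inner dimensions must match.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    n, m = len(a), len(a[0])
    m2, p = len(b), len(b[0])
    if m != m2:
        raise ValueError(f"inner dimensions differ: {m} vs {m2}")
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):          # row i of A produces row i of the output
        for j in range(p):      # column j of B produces column j of the output
            for k in range(m):  # dot product of row i of A with column j of B
                out[i][j] += a[i][k] * b[k][j]
    return out

A = [[1, 2, 3],
     [4, 5, 6]]        # shape (2, 3)
B = [[7, 8],
     [9, 10],
     [11, 12]]         # shape (3, 2)
C = matmul(A, B)       # predicted shape before running: (2, 2)
print(C)
```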
Calculus and gradients
Free: 3Blue1Brown — Essence of Calculus
Mastery: you understand backpropagation as the chain rule applied recursively, not as magic. You can compute a simple derivative by hand and recognize why neural net training is gradient descent on a loss surface. Resource: Essence of Calculus, then Karpathy's micrograd lecture, which derives backprop from scratch. Test: hand-compute the gradient of a two-layer network on paper for a single example, then verify with PyTorch autograd.
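The sketch below runs the same test on the smallest possible two-layer network: one tanh hidden unit, one linear output, and squared-error loss. The text suggests PyTorch autograd for the check. Here, central finite differences stand in for autograd so the sketch needs only the standard library. The weights and inputs are arbitrary illustrative values.

```python
import math

def forward(w1, w2, x, t):
    h = math.tanh(w1 * x)      # layer 1: hidden activation
    y = w2 * h                 # layer 2: linear output
    loss = (y - t) ** 2        # squared error against target t
    return h, y, loss

def hand_gradients(w1, w2, x, t):
    h, y, _ = forward(w1, w2, x, t)
    dL_dy = 2 * (y - t)                  # derivative of (y - t)^2
    dL_dw2 = dL_dy * h                   # y = w2 * h
    dL_dh = dL_dy * w2                   # chain rule back through layer 2
    dL_dw1 = dL_dh * (1 - h ** 2) * x    # tanh'(z) = 1 - tanh(z)^2, with z = w1 * x
    return dL_dw1, dL_dw2

def numeric_gradients(w1, w2, x, t, eps=1e-6):
    # Central differences: a cheap stand-in for autograd when checking hand math.
    g1 = (forward(w1 + eps, w2, x, t)[2] - forward(w1 - eps, w2, x, t)[2]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, t)[2] - forward(w1, w2 - eps, x, t)[2]) / (2 * eps)
    return g1, g2

analytic = hand_gradients(0.5, -1.5, x=2.0, t=1.0)
numeric = numeric_gradients(0.5, -1.5, x=2.0, t=1.0)
print(analytic, numeric)
```

If the two tuples disagree beyond roughly the sixth decimal place, the hand derivation has a bug.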
Probability and statistics
Free: Seeing Theory (Brown University)
Mastery: you can read a confusion matrix without panicking. You know the difference between correlation and causation in a working sense, not just a slogan. You understand why a model with ninety-nine percent accuracy can still be useless on imbalanced data. Resource: Seeing Theory at seeing-theory.brown.edu, interactive and free, from Brown University. Test: given a condition with a one-percent base rate and a test with known sensitivity and specificity, compute the probability that a positive result is a true positive.
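The base-rate test comes down to one application of Bayes' rule. The sketch assumes illustrative values of ninety-nine percent sensitivity and ninety-nine percent specificity; the original test does not fix them. Even with a test that accurate, a positive result is only a coin flip when the condition affects one person in a hundred.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = base_rate * sensitivity                # has the condition and is flagged
    false_pos = (1 - base_rate) * (1 - specificity)   # healthy but flagged anyway
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(0.01, 0.99, 0.99)
print(f"{ppv:.2f}")
```

Dropping specificity to ninety-five percent pushes the answer down to about one in six. This is why the false-positive rate dominates at low base rates.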
Python and scientific stack
Free: Python.org tutorial + NumPy quickstart
Mastery: you read NumPy or PyTorch code and follow shapes without printing. You know what broadcasting does and where it bites you. You can vectorize a loop without thinking about it. Resource: the official Python tutorial at docs.python.org, then NumPy's quickstart at numpy.org/doc. Test: rewrite a triple-nested Python loop as a single vectorized NumPy expression and time the speedup.
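Below is a sketch of the vectorization test, assuming NumPy is installed. The example computes squared distances between every pair of rows in two matrices. The shapes and the random data are arbitrary choices, and the function names are ours.

```python
import time
import numpy as np

def pairwise_sq_dists_loops(x, y):
    """Squared Euclidean distance between every row of x and every row of y."""
    n, d = x.shape
    m = y.shape[0]
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for k in range(d):
                diff = x[i, k] - y[j, k]
                out[i, j] += diff * diff
    return out

def pairwise_sq_dists_vectorized(x, y):
    # x[:, None, :] has shape (n, 1, d); y[None, :, :] has shape (1, m, d).
    # Broadcasting stretches both to (n, m, d) before subtracting.
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=-1)     # sum over d, leaving shape (n, m)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16))
y = rng.normal(size=(80, 16))

t0 = time.perf_counter()
slow = pairwise_sq_dists_loops(x, y)
t1 = time.perf_counter()
fast = pairwise_sq_dists_vectorized(x, y)
t2 = time.perf_counter()

assert np.allclose(slow, fast)
print(f"loops: {t1 - t0:.3f}s  vectorized: {t2 - t1:.5f}s")
```

The (n, m, d) intermediate is exactly where broadcasting bites. Its memory grows with the product of all three sizes, so large inputs may need chunking or the expansion |x|² - 2x·y + |y|² instead.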
Working with data
Free: Pandas official user guide
Mastery: you can load a messy CSV, profile it, find the column with eighteen percent missing values, and decide what to do about it. You understand groupby, merge, and pivot from muscle memory. Resource: the pandas user guide at pandas.pydata.org/docs. Test: take any public dataset (the UCI repository works), load it, and produce three honest observations and one question you cannot yet answer.
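Profiling missing values is the first step of that test. To stay dependency-free, this sketch uses only the standard library csv module. In pandas the usual one-liner is df.isna().mean() after read_csv. The tiny CSV is made up, and the set of placeholder tokens treated as missing is an assumption about this particular file.

```python
import csv
import io

def missing_fraction(csv_text, missing_tokens=("", "NA", "N/A", "null")):
    """Fraction of missing cells per column, treating common placeholders as missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            value = (row.get(name) or "").strip()   # short rows yield None
            if value in missing_tokens:
                counts[name] += 1
    return {name: counts[name] / rows for name in counts} if rows else {}

messy = """id,age,income,city
1,34,52000,Leeds
2,,61000,York
3,29,NA,
4,41,48000,Hull
5,N/A,,Leeds
"""

profile = missing_fraction(messy)
for column, frac in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{column:>7}: {frac:.0%} missing")
```

Whether a gap means "unknown", "zero" or "not applicable" is the decision the test is really asking about. The profile only tells you where to look.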
Reading academic papers
Free: Karpathy — How to read papers (talks and threads)
Mastery: you can read an arxiv paper's abstract, skim figures, and decide in under ten minutes whether to read it fully. You know that a large share of published results do not survive replication, and you read accordingly. Resource: Andrej Karpathy has discussed his paper-reading process in multiple talks and tweets; search for 'Karpathy reading papers' on YouTube. Test: read the Attention Is All You Need paper (arxiv 1706.03762) and write a one-page plain-English summary.
Trunk two — ML fundamentals
Classical machine learning. Most production AI systems still run logistic regression somewhere. Skipping this trunk because 'deep learning won' is a common and expensive mistake.
Regression and classification
Free: scikit-learn documentation + Andrew Ng's CS229
Mastery: you can pick logistic regression, decision trees, or gradient boosting for a tabular problem and justify the choice in one sentence. You understand bias-variance tradeoffs in a working sense. Resource: scikit-learn.org documentation, then Stanford CS229 lecture notes (free at cs229.stanford.edu). Test: on a tabular dataset, beat a logistic regression baseline by at least three F1 points, then explain why your model won.
Evaluation and metrics
Free: Google ML Crash Course
Mastery: you know that accuracy is usually the wrong metric. You can pick precision, recall, F1, AUC, or calibration based on what the model is for, not what is easiest. Resource: Google's Machine Learning Crash Course at developers.google.com/machine-learning/crash-course. Test: build a binary classifier where accuracy is high but recall is unacceptable, and explain the business consequence.
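Here is the accuracy trap from that test in its simplest form. The data is a hypothetical fraud setting with one percent positives, and the "model" never flags anything. The metric functions are hand-written so the arithmetic is visible. scikit-learn's sklearn.metrics module provides tested versions of the same quantities.

```python
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 1,000 hypothetical transactions, 10 of them fraudulent (1% positives).
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000          # a "model" that never flags anything
result = metrics(y_true, always_negative)
print(result)
```

Accuracy comes out at ninety-nine percent while recall is zero. The business consequence is that every fraudulent transaction goes through.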
Cross-validation and leakage
Free: scikit-learn cross-validation guide
Mastery: you can spot data leakage in someone else's pipeline. You know why time-series splits are not random splits and why mixing patient IDs across splits invalidates a medical model. Resource: scikit-learn's model_selection documentation. Test: deliberately introduce a leakage bug into a notebook, measure the inflated score, fix it, and document the gap.
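Below is a minimal sketch of the patient-ID fix: split by group rather than by row, so no patient appears in both train and test. The records are hypothetical, and group_split is our own helper. In real pipelines, scikit-learn's GroupKFold and GroupShuffleSplit do this job, and TimeSeriesSplit handles the time-ordering case.

```python
import random

def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split so that every group (for example a patient) lands entirely in train or test."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test

# Hypothetical scans: five per patient, ten patients.
scans = [{"patient_id": f"p{i % 10}", "scan": i} for i in range(50)]
train, test = group_split(scans, "patient_id")
overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(len(train), len(test), overlap)
```

A random row-level split of the same data would almost certainly put scans of the same patient on both sides. The model could then score well by recognising patients instead of learning the condition.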
Feature engineering
Free: Kaggle Learn — Feature Engineering
Mastery: you can take raw data and produce features that materially improve a model — not just transform-everything spam. You know when to scale, encode, target-encode, or leave alone. Resource: kaggle.com/learn/feature-engineering — short, practical, free. Test: take a public tabular contest dataset, build three feature engineering versions, and document which actually helped on held-out data.
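Target encoding is the feature-engineering step where leakage hides most often. The sketch computes each row's encoding only from other folds and then shrinks it toward the prior. The round-robin fold assignment and the smoothing value of 10 are illustrative assumptions. Recent versions of scikit-learn ship a TargetEncoder that uses a similar cross-fitting scheme.

```python
from collections import defaultdict

def out_of_fold_target_encode(categories, targets, n_folds=5, smoothing=10.0):
    """Target-encode a categorical column without letting a row see its own label.

    Each row's value comes from the other folds only, shrunk toward those folds'
    mean target so that rare categories do not get extreme values.
    """
    n = len(categories)
    fold_of = [i % n_folds for i in range(n)]        # simple round-robin folds
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        other_targets = []
        for i in range(n):
            if fold_of[i] != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                other_targets.append(targets[i])
        prior = sum(other_targets) / len(other_targets)
        for i in range(n):
            if fold_of[i] == fold:
                c = categories[i]
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded

# A unique-ID column: a naive in-sample mean would reproduce every label exactly.
ids = [f"id{i}" for i in range(20)]
labels = [i % 2 for i in range(20)]
encoded = out_of_fold_target_encode(ids, labels)
print(encoded[:5])
```

Out of fold, the ID column collapses to the prior and carries no signal, which is the honest answer. The naive version would show a perfect held-in score that disappears on new IDs.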
Tree-based models in practice
Free: XGBoost documentation + CatBoost docs
Mastery: you reach for gradient-boosted trees first on tabular problems and know why. You understand learning rate, depth, and regularization parameters at a working level. Resource: xgboost.readthedocs.io and catboost.ai/docs. Test: on a public tabular benchmark, beat a default scikit-learn random forest by tuning XGBoost or LightGBM, and explain the tuning.
Causal thinking
Free: Causal Inference: The Mixtape (free online)
Mastery: you can name three reasons a model showing 'X predicts Y' does not mean X causes Y. You know the basics of confounders, colliders, and selection bias. Resource: Scott Cunningham's Causal Inference: The Mixtape, free online at mixtape.scunning.com. Test: take a published finding from a popular article, identify a plausible confounder, and write what experiment would test it.
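Here is a confounder in a dozen lines: a simulated variable z drives both x and y, and x has no effect on y at all. The ice-cream and sunburn labels are a hypothetical illustration, and the effect sizes are arbitrary. Stratifying on z is one way to adjust for a known confounder. It does not fix colliders, where conditioning creates a spurious correlation rather than removing one.

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
rows = []
for _ in range(20_000):
    z = rng.random() < 0.5             # confounder, e.g. a hot day (hypothetical)
    x = 2 * z + rng.gauss(0, 1)        # "ice-cream sales": driven by z only
    y = 2 * z + rng.gauss(0, 1)        # "sunburn cases": driven by z only
    rows.append((z, x, y))

overall = pearson([r[1] for r in rows], [r[2] for r in rows])
within = [pearson([r[1] for r in rows if r[0] == v], [r[2] for r in rows if r[0] == v])
          for v in (False, True)]
print(f"overall r = {overall:.2f}, within strata r = {[round(w, 2) for w in within]}")
```

The overall correlation is around 0.5. Within each level of z it is close to zero, so "x predicts y" holds while "x causes y" does not.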
Trunk three — deep learning
Neural networks, transformers, and large-scale training. The most-hyped trunk and the one where free top-tier resources are unusually abundant. The 'micrograd to transformer' path Karpathy laid out is the most under-priced curriculum on the public internet.
Neural network fundamentals
Free: Karpathy — Neural Networks: Zero to Hero
Mastery: you can implement a small MLP from scratch including backpropagation, training loop, and weight initialization. You know what happens to gradients in deep networks and why ReLU exists. Resource: Andrej Karpathy's Zero to Hero series on YouTube (youtube.com/@AndrejKarpathy); start with the micrograd lecture. Test: build a two-layer network from scratch in NumPy that solves XOR, or that reaches above ninety-five percent test accuracy on MNIST.
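This is a sketch of the XOR half of that test in NumPy: a tanh hidden layer, a sigmoid output, binary cross-entropy, and hand-written backpropagation. The hidden size of 8, the learning rate of 0.5, the step count and the seed are illustrative choices, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

hidden = 8
W1 = rng.normal(0, 1, size=(2, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0, 1, size=(hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
losses = []
for step in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)                 # (4, hidden)
    p = sigmoid(h @ W2 + b2)                 # (4, 1) predicted probabilities
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)    # avoid log(0)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    losses.append(loss)

    # Backward pass: the chain rule, by hand.
    dz2 = (p - y) / len(X)                   # gradient of mean BCE w.r.t. output logits
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)                  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Plain gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

preds = (p > 0.5).astype(int).ravel()
print(preds, f"final loss {losses[-1]:.4f}")
```

The gradient (p - y) for the output logits is a known simplification when a sigmoid is paired with cross-entropy. It is worth deriving once on paper.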
Transformers and attention
Free: Attention Is All You Need (arxiv 1706.03762) + Karpathy nanoGPT
Mastery: you can draw a transformer block from memory and explain each component. You understand why self-attention on its own ignores token order (permuting the inputs only permutes the outputs) and why positional encodings are needed. Resource: read the original Vaswani et al. paper at arxiv.org/abs/1706.03762, then watch Karpathy's 'Let's build GPT' video, which implements nanoGPT from scratch. Test: implement a small character-level transformer that generates Shakespeare-ish text after a few hours of training on a laptop.
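The order-blindness claim can be checked directly. The sketch is single-head scaled dot-product self-attention in NumPy with no mask and no positional encoding. Shuffling the input tokens then just shuffles the output rows. The dimensions and random weights are arbitrary.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention: no mask, no positions."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # (seq, d_head)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

perm = rng.permutation(seq_len)
out = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the tokens only shuffles the outputs: order carries no information.
print(np.allclose(out_shuffled, out[perm]))
```

Adding a position-dependent vector to each row of x before attention breaks this symmetry. That is the job positional encodings do.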
Training, optimization, and tricks
Free: fast.ai Practical Deep Learning for Coders
Mastery: you know what a learning-rate finder does, why batch size matters, and what goes wrong when dropout and batch normalization are combined carelessly. You can debug a training run that is not converging. Resource: course.fast.ai, Jeremy Howard's free practical course, taught top-down rather than bottom-up. Test: take a public dataset, train a model, hit a non-converging run, and diagnose the failure in writing.
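The bluntest cause of a non-converging run is a learning rate that is too large. The toy below shows it on f(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2·lr). Any learning rate above 1.0 makes that factor larger than 1 in magnitude, and the run diverges. A learning-rate finder probes the same cliff empirically by sweeping the rate upward and watching where the loss starts to blow up.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 (gradient 2w) and return the trajectory of w."""
    history = [w]
    for _ in range(steps):
        w = w - lr * 2 * w        # each step multiplies w by (1 - 2 * lr)
        history.append(w)
    return history

for lr in (0.1, 0.9, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr}: |w| after 20 steps = {abs(final):.3g}")
```

A rate of 0.9 still converges but flips sign every step. In real training curves, that oscillation often shows up as a noisy, stalled loss before outright divergence.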
Computer vision basics
Free: Stanford CS231n (course notes free online)
Mastery: you understand convolutions, pooling, and why ResNets work. You know that vision transformers exist and roughly when to pick which architecture. Resource: cs231n.stanford.edu — lecture notes are free; video lectures from older years are on YouTube. Test: fine-tune a pretrained ResNet on a small custom image dataset and report honest accuracy on a held-out test set.
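Predicting feature-map sizes is most of the arithmetic behind convolutions and pooling. The helper below uses the standard output-size formula along one spatial axis, with padding, stride and dilation. It is the same formula the PyTorch Conv2d documentation gives. The layer settings in the example follow the usual ResNet stem: a 7x7 stride-2 convolution followed by 3x3 stride-2 max-pooling.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution or pooling layer along one axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

stem = conv_output_size(224, kernel=7, stride=2, padding=3)      # ResNet-style stem conv
pooled = conv_output_size(stem, kernel=3, stride=2, padding=1)   # followed by max-pooling
same = conv_output_size(pooled, kernel=3, stride=1, padding=1)   # "same" 3x3 conv
print(stem, pooled, same)
```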
NLP and language models
Free: Hugging Face Course
Mastery: you can tokenize text, run inference with a pretrained model, fine-tune for a downstream task, and evaluate honestly. You know what an embedding is and what BPE does. Resource: huggingface.co/learn/nlp-course — free, hands-on, library-aligned. Test: fine-tune a small open model on a classification task and beat zero-shot prompting by a measurable margin.
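To see what BPE does, here is a toy learner in the original merge-the-most-frequent-pair style. It works on characters and has no end-of-word marker. Production tokenizers such as GPT-2's work on bytes and add pre-tokenization rules, so treat this as an illustration of the idea rather than a compatible implementation. The word frequencies are made up.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)   # most frequent pair; ties go to the first seen
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges, vocab

word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(word_freqs, num_merges=3)
print(merges)
print(vocab)
```

Frequent fragments such as "est" become single tokens, while rare words stay split into smaller pieces. This is why BPE vocabularies never hit a true out-of-vocabulary word.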
Scaling and efficiency
Free: Chinchilla paper (Hoffmann et al., arxiv 2203.15556)
Mastery: you understand at a conceptual level what compute-optimal training is, why FLOPs and parameter counts are not the same thing, and the rough shape of the scaling-law literature. Resource: the Chinchilla paper, Training Compute-Optimal Large Language Models at arxiv.org/abs/2203.15556. Test: read the paper and explain in three sentences what its main correction to earlier scaling laws was.
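The back-of-envelope version of compute-optimal training uses two widely quoted rules of thumb, not the paper's exact fitted coefficients. First, training compute for a dense transformer is roughly C ≈ 6·N·D FLOPs, with N parameters and D training tokens. Second, the Chinchilla result works out to roughly 20 tokens per parameter at the optimum. Under those assumptions, the sketch splits a FLOP budget between model size and data.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6      # rule of thumb: C ~ 6 * N * D for dense transformers
TOKENS_PER_PARAM = 20          # rough Chinchilla-style compute-optimal ratio

def compute_optimal(flops_budget):
    """Split a FLOP budget into parameters N and tokens D, assuming D = 20 * N."""
    # C = 6 * N * (20 * N) = 120 * N**2, so N = sqrt(C / 120)
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params

# Chinchilla itself: roughly 70B parameters trained on 1.4T tokens.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal(budget)
print(f"budget {budget:.2e} FLOPs -> N ~ {n:.2e} params, D ~ {d:.2e} tokens")
```

Because N grows with the square root of compute, doubling the budget raises both parameters and tokens by about 1.4x. This "scale both together" shape is the paper's headline correction to earlier laws, which favoured growing parameters faster than data.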
Trunk four — practical AI engineering
The trunk most jobs actually pay for in 2026. Building things with existing models — RAG, agents, evaluation harnesses, deploys — rather than training new ones. Heavy on engineering discipline, lighter on math.
Prompting and LLM APIs
Free: Anthropic prompt engineering docs + OpenAI cookbook
Mastery: you can take a vague natural-language task and produce a prompt that hits the target ninety percent of the time, with examples, constraints, and an output format. You know when to use few-shot versus zero-shot. Resource: docs.anthropic.com/en/docs/build-with-claude/prompt-engineering and cookbook.openai.com — both free, both maintained by the model providers. Test: build a prompt that extracts structured data from messy text with above ninety-five percent accuracy on a held-out set.
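A ninety-five percent target needs a scoring rule before it means anything. The sketch shows one choice: field-level exact match on JSON output, where an unparseable reply scores zero on every field. It makes no API calls. The model outputs, the gold labels and the invoice fields are all hypothetical.

```python
import json

def score_extraction(outputs, gold, fields):
    """Field-level accuracy for JSON extraction; unparseable outputs score zero."""
    correct = total = 0
    for raw, expected in zip(outputs, gold):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}
        if not isinstance(parsed, dict):
            parsed = {}
        for field in fields:
            total += 1
            correct += parsed.get(field) == expected.get(field)
    return correct / total

# Hypothetical model outputs for three held-out invoices.
outputs = [
    '{"vendor": "Acme Ltd", "total": 120.5}',
    '{"vendor": "Globex", "total": 99}',
    'Sure! Here is the JSON: {"vendor": "Initech"}',   # chatty reply, not valid JSON
]
gold = [
    {"vendor": "Acme Ltd", "total": 120.5},
    {"vendor": "Globex", "total": 98},
    {"vendor": "Initech", "total": 15.0},
]
print(score_extraction(outputs, gold, fields=["vendor", "total"]))
```

Scoring formatting failures as wrong is deliberate. In production, a reply that downstream code cannot parse is a failure, however close its content was.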
Retrieval-augmented generation (RAG)
Free: LangChain docs + LlamaIndex docs (concept guides)
Mastery: you can build a retrieval pipeline that improves answer quality over a raw model, evaluate it honestly, and know when retrieval is making things worse. You understand chunking, embeddings, hybrid search, and reranking. Resource: python.langchain.com and docs.llamaindex.ai — focus on the concept guides, not framework lock-in. Test: build a RAG system over a real document corpus and produce a held-out evaluation showing the retrieval actually helped.
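Hybrid search needs a way to merge a keyword ranking with an embedding ranking. Reciprocal rank fusion (RRF) is one common choice, and the sketch below shows it. The document IDs are hypothetical, and k = 60 is the constant most often used, following the original RRF paper. Other fusion schemes, such as weighted score normalization, exist and may suit a given corpus better.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A larger k flattens the advantage of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_refund_policy", "doc_shipping", "doc_warranty"]    # e.g. from BM25
vector_hits = ["doc_warranty", "doc_refund_policy", "doc_returns_faq"]  # e.g. from embeddings
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF uses only ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on incomparable scales.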
Agents and tool use
Free: Anthropic — Building effective agents (Dec 2024 post)
Mastery: you can build a tool-using agent that does not silently fail. You know that most 'agent' demos collapse under real evaluation, and you build accordingly with retries, fallbacks, and human-in-the-loop checkpoints. Resource: Anthropic's 'Building effective agents' essay at anthropic.com/research/building-effective-agents, in our best-effort judgment the clearest plain-language guide as of 2026. Test: build an agent that completes a real five-step task with above eighty percent success on a held-out evaluation.
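"Does not silently fail" mostly comes down to a disciplined wrapper around each tool call. The sketch retries with exponential backoff, then tries a fallback if one is given. If there is no fallback, it returns None with a log entry that tells the caller to escalate to a human. The ToolError class, the lookup_order tool and the delay values are hypothetical. The sleep parameter exists so tests can skip real waiting.

```python
import time

class ToolError(Exception):
    """Raised by a tool call that failed in a way worth retrying."""

def call_with_retries(tool, args, max_attempts=3, base_delay=0.5, fallback=None, sleep=time.sleep):
    """Call tool(**args), retrying with exponential backoff, then fall back or escalate."""
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(**args)
            log.append(f"attempt {attempt}: ok")
            return result, log
        except ToolError as exc:
            log.append(f"attempt {attempt}: {exc}")
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))   # 0.5s, 1s, 2s, ...
    if fallback is not None:
        log.append("falling back")
        return fallback(**args), log
    log.append("escalating to human review")
    return None, log

def make_flaky(failures):
    """Build a hypothetical tool that times out `failures` times, then succeeds."""
    state = {"calls": 0}
    def lookup_order(order_id):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ToolError("timeout")
        return {"order_id": order_id, "status": "shipped"}
    return lookup_order

result, log = call_with_retries(make_flaky(2), {"order_id": "A17"}, sleep=lambda s: None)
print(result, log)
```

The log is the important part. Every path ends in an explicit outcome that an evaluation harness can count, instead of an agent quietly carrying on with missing data.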
Evaluation harnesses
Free: Hugging Face evaluate library + EleutherAI lm-eval-harness
Mastery: you write evals before you write features. You know that 'vibes-based' evaluation does not survive production. You can build a regression test suite for LLM behavior. Resource: github.com/EleutherAI/lm-evaluation-harness and huggingface.co/docs/evaluate. Test: build an eval suite for a specific LLM task that catches a real regression you introduce on purpose.
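At its core, a regression suite for LLM behaviour is a list of cases with checks, run before and after a change. The sketch keeps that core and leaves out the infrastructure. The two "models" are stand-in functions rather than real API calls, and the cases and check rules are hypothetical. The deliberately introduced regression is a change in date format.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]     # True if the output is acceptable

def run_suite(model: Callable[[str], str], cases):
    """Run every case and return {case name: passed?}."""
    return {case.name: case.check(model(case.prompt)) for case in cases}

def regressions(baseline, candidate):
    """Cases that passed before the change and fail after it."""
    return sorted(name for name, ok in baseline.items() if ok and not candidate.get(name, False))

cases = [
    EvalCase("refuses_password", "What is the admin password?",
             lambda out: "can't" in out.lower()),
    EvalCase("iso_date", "Give today's date in ISO 8601.",
             lambda out: len(out) == 10 and out[4] == "-" and out[7] == "-"),
]

def old_model(prompt):    # stand-in for the current prompt and model
    return "I can't share that." if "password" in prompt else "2026-01-15"

def new_model(prompt):    # stand-in for a change that breaks date formatting
    return "I can't share that." if "password" in prompt else "15/01/2026"

baseline = run_suite(old_model, cases)
candidate = run_suite(new_model, cases)
print(regressions(baseline, candidate))
```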
Vector databases and embeddings
Free: pgvector docs + Sentence-Transformers documentation
Mastery: you understand that embeddings are not magic and that different embedding models give meaningfully different results. You know when to use cosine similarity, when to rerank, and when to use full-text search instead. Resource: github.com/pgvector/pgvector and sbert.net. Test: compare two embedding models on a retrieval task and report which won and by how much, with statistical significance.
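To compare two embedding models "with statistical significance", a paired test on per-query outcomes is one reasonable choice. The sketch uses an exact McNemar (sign) test on whether each model retrieved the relevant document, for example within its top five. A paired bootstrap is a common alternative. The hit lists are hypothetical, and cosine similarity is included because it is the usual retrieval score.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_mcnemar_p(hits_a, hits_b):
    """Two-sided exact McNemar test on paired per-query hit/miss results.

    Only discordant queries count: those where one model found the relevant
    document and the other did not.
    """
    only_a = sum(1 for a, b in zip(hits_a, hits_b) if a and not b)
    only_b = sum(1 for a, b in zip(hits_a, hits_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical hit@5 results for 40 queries, same queries for both models.
hits_a = [True] * 12 + [False] * 3 + [True] * 20 + [False] * 5
hits_b = [False] * 12 + [True] * 3 + [True] * 20 + [False] * 5
p = exact_mcnemar_p(hits_a, hits_b)
print(f"A: {sum(hits_a) / 40:.0%}  B: {sum(hits_b) / 40:.0%}  p = {p:.3f}")
```

The twenty queries both models got right, and the five both missed, carry no information about which model is better. This is why the test ignores them.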
Deployment and observability
Free: vLLM documentation + standard MLOps writing
Mastery: you can deploy a model behind an API, monitor latency and cost, and respond to a regression in production. You understand caching, batching, and quantization tradeoffs at a working level. Resource: docs.vllm.ai for inference servers; combine with general SRE writing for observability. Test: deploy a small open-weight model behind an API on a VPS or local box, with logging, and serve a real request.
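Latency monitoring is about tail percentiles, not averages. The sketch computes nearest-rank percentiles over a hypothetical list of request latencies. Nearest-rank is one of several percentile definitions; numpy.percentile interpolates linearly by default, for instance. Pick one definition and keep it fixed across dashboards.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical request latencies (ms) pulled from an access log.
latencies_ms = [120, 95, 130, 110, 2400, 105, 98, 125, 115, 101]
summary = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary, "mean:", sum(latencies_ms) / len(latencies_ms))
```

One slow request (a cold start, say) triples the mean but leaves the median untouched. The p95 and p99 values are what users on the slow path actually experience.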
Cost and latency engineering
Free: provider pricing pages
Mastery: you can take a feature spec and produce a cost estimate within twenty percent of actual, before building. You understand prompt caching, batch APIs, model tiers, and when a smaller model is the right call. Resource: anthropic.com/pricing, openai.com/pricing, and equivalent provider docs; pricing changes frequently, so always check the current docs. Test: estimate the monthly cost of a feature at a given QPS, build it, and compare the bill to the estimate.
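The estimate in that test is simple arithmetic: requests per month times cost per request. The sketch turns it into a function. The per-million-token prices are placeholders, not any provider's real rates. Caching is modelled crudely as a discount on the share of input tokens that hit the cache, which is our own simplification, and the QPS should be the average over the month, not the peak.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600     # 30-day month

def monthly_cost(qps, input_tokens, output_tokens, usd_per_m_input, usd_per_m_output,
                 cache_hit_rate=0.0, cached_input_discount=0.0):
    """Back-of-envelope monthly spend for one LLM-backed feature.

    Prices are per million tokens. cache_hit_rate is the share of input tokens
    served from a prompt cache; cached_input_discount is the price cut on those.
    """
    requests = qps * SECONDS_PER_MONTH
    effective_input = input_tokens * (1 - cache_hit_rate * cached_input_discount)
    per_request = (effective_input * usd_per_m_input + output_tokens * usd_per_m_output) / 1e6
    return requests * per_request

# Placeholder prices only: check the provider's current pricing page.
estimate = monthly_cost(qps=2, input_tokens=1500, output_tokens=300,
                        usd_per_m_input=3.0, usd_per_m_output=15.0)
with_cache = monthly_cost(qps=2, input_tokens=1500, output_tokens=300,
                          usd_per_m_input=3.0, usd_per_m_output=15.0,
                          cache_hit_rate=0.8, cached_input_discount=0.9)
print(f"${estimate:,.0f} per month, ${with_cache:,.0f} with caching")
```

With these placeholder numbers, output tokens cost as much as five times as many input tokens. Trimming verbose outputs can therefore matter as much as shortening prompts.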
Trunk five — AI product and strategy
The least-respected and most-underrated trunk. Knowing what to build, for whom, and at what price beats knowing how to train another transformer. Most AI failures in 2025 and 2026 were product failures, not technical ones.
Problem selection
Free: Y Combinator startup library + Paul Graham essays
Mastery: you can look at a market and identify three places where AI changes the unit economics, and three places where it does not. You resist the urge to 'add AI' to everything. Resource: ycombinator.com/library and paulgraham.com/articles.html — free, opinionated, useful. Test: write a one-page memo explaining why one specific AI feature would succeed and another would fail, with reasoning that survives a hostile read.
Pricing and packaging
Free: First Round Review pricing essays
Mastery: you can pick between subscription, usage, and outcome-based pricing for a given product and justify it. You understand the math of usage-based pricing under variable token costs. Resource: review.firstround.com — search 'pricing' and 'packaging'; supplement with provider pricing pages. Test: take any AI product idea and produce three pricing models with the trade-offs of each in one page.
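Under variable token costs, the core of the pricing math is gross margin per customer. The sketch compares a flat subscription with usage-based pricing across light, typical and heavy users. All prices and costs are hypothetical placeholders chosen to make the shape visible.

```python
def gross_margin(revenue, cost):
    return (revenue - cost) / revenue

def margins(monthly_requests, cost_per_1k=4.0, flat_price=20.0, usage_price_per_1k=6.0):
    """Gross margin under flat vs usage pricing for one user (hypothetical USD figures)."""
    cost = monthly_requests / 1000 * cost_per_1k
    flat = gross_margin(flat_price, cost)
    usage = gross_margin(monthly_requests / 1000 * usage_price_per_1k, cost)
    return flat, usage

for monthly_requests in (500, 5_000, 20_000):     # light, typical, heavy user
    flat, usage = margins(monthly_requests)
    print(f"{monthly_requests:>6} req: flat margin {flat:6.0%}, usage margin {usage:6.0%}")
```

The flat plan earns ninety percent on light users and loses three times its price on heavy ones. Usage pricing holds a constant margin but passes cost variance to the customer. The trade-off to write up is which of those risks the product can carry.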
Positioning
Free: April Dunford — Obviously Awesome (talks and excerpts free online)
Mastery: you can name three direct competitors to any product you work on and articulate why a customer would pick one over another in one sentence each. Resource: aprildunford.com and her free talks on YouTube. Test: write the positioning statement for a real AI product (yours or a competitor's) using April's five-question template, and have someone unfamiliar with the space rate clarity.
Honest evaluation in product context
Free: Hamel Husain — AI Evals essays
Mastery: you build product metrics that survive scrutiny. You know the difference between 'users liked the demo' and 'users completed the job to be done.' Resource: hamel.dev. Hamel Husain's writing on LLM evaluations is, in our best-effort judgment as of 2026, the most cited working-practitioner resource. Test: define a real product success metric for an AI feature, instrument it, and ship a change measured against it.
Ethics in product decisions
Free: Partnership on AI publications + DeepMind safety blog
Mastery: you can name three classes of harm your product could cause and what mitigations would cost. You do not confuse 'we wrote a policy doc' with 'we changed the product.' Resource: partnershiponai.org/publications and deepmind.google/discover/blog — both publish accessible writing on real product-level harms. Test: produce a one-page risk assessment for a real AI feature, with mitigations costed in engineering time.
Communicating to non-technical stakeholders
Free: Distill.pub archive + Anthropic research blog
Mastery: you can explain a model's behavior to a CEO, a lawyer, and a customer in three different registers without lying. Resource: distill.pub for the gold standard of accessible ML writing (archive remains free); anthropic.com/research for current-state plain-language model writing. Test: take one technical AI concept and write three explanations of it at three different reading levels — executive, technical-but-not-ML, and ML peer.
Trunk six — AI safety and interpretability
The trunk that decides whether the rest of this stack remains trustworthy. Growing fast as a research field; as a job market it is smaller than the hype suggests, but real. Mech interp specifically went from niche to mainstream between 2023 and 2026.
Alignment fundamentals
Free: AGI Safety Fundamentals (BlueDot Impact)
Mastery: you can name the core open problems — specification gaming, mesa-optimization, deceptive alignment — and what current research is doing about them. You do not treat 'alignment' as a slogan. Resource: bluedot.org — the AGI Safety Fundamentals course is free, structured, and respected. Test: read three current alignment papers and produce a one-page synthesis of where the field disagrees with itself.
Red-teaming and adversarial evaluation
Free: Anthropic and OpenAI red-team reports (model cards)
Mastery: you can take a model and find five categories of failure that the developer's eval missed. You know that adversarial inputs do not need to be exotic — most are mundane. Resource: model cards published with frontier model releases (anthropic.com and openai.com publish these); supplement with the lm-eval-harness adversarial suites. Test: take any deployed LLM and find a reproducible failure mode that violates its stated policy.
Mechanistic interpretability
Free: Anthropic's Transformer Circuits publications + Neel Nanda's TransformerLens
Mastery: you understand what a 'feature' means in a sparse autoencoder, what circuits are, and why finding induction heads mattered. You can read a current interpretability paper without bouncing. Resource: transformer-circuits.pub publishes Anthropic's open research; supplement with Neel Nanda's TransformerLens library and tutorials at neelnanda.io. Test: replicate a small interpretability result — for example, find induction heads in a small open model — using TransformerLens.
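The usual induction-head test works like this: feed a model a random token sequence repeated twice, then measure how much each head in the second copy attends to the token right after the earlier occurrence of the current token. The sketch computes only that score from a given attention matrix. In practice the matrix would come from the model's activation cache (for example via TransformerLens). Here two synthetic patterns stand in for real heads. The version assumes no BOS token, which would shift every index by one.

```python
import numpy as np

def induction_score(pattern, seq_len):
    """Mean attention on 'the token after my previous occurrence'.

    pattern is a (2*seq_len, 2*seq_len) attention matrix (rows are queries) for a
    random sequence of length seq_len repeated twice, with no BOS token. For a
    query at position i in the second copy, the same token appeared at
    i - seq_len, so the token after it sits at i - seq_len + 1.
    """
    queries = np.arange(seq_len, 2 * seq_len)
    keys = queries - seq_len + 1
    return float(pattern[queries, keys].mean())

L = 4
ideal = np.zeros((2 * L, 2 * L))
for i in range(2 * L):
    if i >= L:
        ideal[i, i - L + 1] = 1.0      # a perfect induction stripe
    else:
        ideal[i, 0] = 1.0              # first copy: attend to the first token
uniform = np.tril(np.ones((2 * L, 2 * L)))
uniform /= uniform.sum(axis=1, keepdims=True)   # causal attention spread evenly

print(induction_score(ideal, L), round(induction_score(uniform, L), 3))
```

On a real model, heads scoring far above the uniform baseline are the induction-head candidates worth inspecting further.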
Evals for dangerous capabilities
Free: METR evaluations writing + Anthropic's Responsible Scaling Policy
Mastery: you understand why certain evaluations cannot be public, the difference between capability evaluations and propensity evaluations, and the basic structure of a Responsible Scaling Policy. Resource: metr.org and anthropic.com/news/anthropics-responsible-scaling-policy. Test: read one RSP and produce a one-page critique of where you think the thresholds are too loose or too tight, with reasoning.
Policy and governance literacy
Free: AI Index Report (Stanford HAI, annual, free)
Mastery: you can name the major active regulatory regimes (the EU AI Act, the US executive actions, the UK AISI work) and what each actually requires. Specifics shift; check current sources. Resource: aiindex.stanford.edu. Stanford HAI's annual AI Index Report is, in our best-effort judgment as of 2026, the most-cited neutral source. Test: take one current policy proposal and write a memo on what it would change in a specific AI product's day-to-day operations.
Open-weight model risk analysis
Free: NIST AI Risk Management Framework
Mastery: you understand the trade-offs between releasing weights openly, releasing under license, and keeping closed. You can describe a specific harm model and a specific benefit model without sloganeering. Resource: nist.gov/itl/ai-risk-management-framework — free, government-issued, citable. Test: pick one open-weight model release and produce a one-page net-impact analysis with honest uncertainty intervals.
The honest specialization table
Most working AI roles need depth in one trunk and literacy in two others. Use this as a starting point, not a contract. The percentages are rough working estimates of where attention should go.
The minimum effective dose path
If you are starting cold and want to be useful — not credentialed, useful — here is the leanest sequence we have seen actually work. Twelve to twenty weeks of focused part-time effort. The point is not speed; the point is to find out quickly whether you want to keep going.
Weeks 1-3
Foundations sprint
3Blue1Brown's neural networks and linear algebra playlists. Karpathy's micrograd lecture. Python and NumPy basics from numpy.org. End state: you can implement a one-layer neural network from scratch and explain what backprop is doing.
Weeks 4-6
Classical ML
scikit-learn tutorial. Run logistic regression, decision trees, and gradient boosting on a public tabular dataset. Learn to evaluate with cross-validation and avoid leakage. End state: you can produce honest held-out metrics on a real dataset.
Weeks 7-9
Deep learning core
Karpathy's makemore and nanoGPT lectures. Hugging Face course chapters 1 through 4. Fine-tune a pretrained model on a small task. End state: you have trained, not just inferred, a real model.
Weeks 10-12
Practical engineering
Build a tiny RAG system. Build a small evaluation harness. Deploy something behind an API. End state: you have shipped a working AI feature that someone besides you can use.
Weeks 13-16
Specialize
Pick one trunk to go deep in based on what felt alive during the prior phase. Stop trying to be a generalist. End state: you have a defensible answer when someone asks 'what do you actually do.'
Weeks 17-20
Ship and publish
Build one substantial thing in your chosen specialization and write it up publicly. The write-up matters more than the project. End state: there is something at a URL with your name on it that demonstrates the skill.
Where most people lose the thread
Three predictable failure modes. First: tutorial loop. Watching one more course instead of building one more thing. The bar is whether you have shipped, not whether you have studied. Second: trunk sprawl. Trying to learn all six trunks in parallel and ending up shallow at all of them. Pick one, then borrow from two. Third: hype substitution. Spending time on the model release of the week instead of fundamentals that compound. Models change every six months; matrix multiplication does not. The honest move is to read fewer announcements and finish more projects.
Resources we do not list and why
Sources
- [01]
Vaswani et al., Attention Is All You Need — the original transformer architecture paper, cited as foundational reading in trunk three.
arxiv.org/abs/1706.03762
- [02]
Hoffmann et al., Training Compute-Optimal Large Language Models (Chinchilla) — the compute-optimal scaling law correction cited in the deep learning trunk.
arxiv.org/abs/2203.15556
- [03]
Andrej Karpathy's Neural Networks Zero to Hero YouTube series, including the micrograd and nanoGPT lectures referenced multiple times.
youtube.com/@AndrejKarpathy
- [04]
fast.ai Practical Deep Learning for Coders, free top-down deep learning course by Jeremy Howard cited as the training-and-tricks resource.
course.fast.ai
- [05]
The free Hugging Face NLP and LLM course covering tokenization, fine-tuning, and evaluation.
huggingface.co/learn/nlp-course
- [06]
Grant Sanderson's 3Blue1Brown — Essence of Linear Algebra, Essence of Calculus, and Neural Networks playlists used as the visual-math foundations resource.
3blue1brown.com
- [07]
Stanford CS231n Convolutional Neural Networks for Visual Recognition course notes and lectures, free.
cs231n.stanford.edu
- [08]
Stanford CS229 Machine Learning lecture notes, free reference for classical ML.
cs229.stanford.edu
- [09]
Brown University's Seeing Theory, free interactive probability and statistics resource.
seeing-theory.brown.edu
- [10]
Google's free Machine Learning Crash Course, used as the evaluation-and-metrics resource in trunk two.
developers.google.com/machine-learning/crash-course
- [11]
scikit-learn official documentation, the canonical free reference for classical ML algorithms and cross-validation.
scikit-learn.org
- [12]
Kaggle Learn's short free feature engineering course.
kaggle.com/learn/feature-engineering
- [13]
Scott Cunningham's Causal Inference: The Mixtape, free online textbook on causal methods.
mixtape.scunning.com
- [14]
Anthropic's official prompt engineering documentation, free and maintained by the model provider.
docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
- [15]
Anthropic's December 2024 essay 'Building effective agents' cited as a plain-language guide to agent design.
anthropic.com/research/building-effective-agents
- [16]
Anthropic's mechanistic interpretability research publication venue, including induction-heads and sparse-autoencoder work.
transformer-circuits.pub
- [17]
EleutherAI's lm-evaluation-harness, an open-source LLM evaluation framework cited in trunk four.
github.com/EleutherAI/lm-evaluation-harness
- [18]
pgvector, the open-source Postgres extension for vector similarity search.
github.com/pgvector/pgvector
- [19]
vLLM documentation, an open-source high-throughput LLM inference server.
docs.vllm.ai
- [20]
Neel Nanda's TransformerLens tutorials and writing on mechanistic interpretability, cited as the working-practitioner interp resource.
neelnanda.io
- [21]
BlueDot Impact's AGI Safety Fundamentals course, a free structured introduction to alignment.
bluedot.org
- [22]
Stanford HAI's annual AI Index Report, a widely cited neutral aggregate source on the state of AI.
aiindex.stanford.edu
- [23]
NIST AI Risk Management Framework, US government-issued AI risk reference.
nist.gov/itl/ai-risk-management-framework
- [24]
METR (Model Evaluation and Threat Research), source for capability and dangerous-capability evaluation writing.
metr.org
- [25]
Distill.pub archive — historical gold standard for accessible visual machine learning writing, archive still available.
distill.pub