::Tim-Ferriss-style AI synthesis · 14 distillations

The 5-minute version of each AI topic.

Minimum Effective Dose. Fear-setting. DiSSS framework. 80/20 cut. Tribe of Mentors. Each topic distilled to the smallest unit that still moves the operator forward. Ruthless on purpose.

::01

Context windows

A context window is the model's working memory for one turn — every token of system prompt, conversation history, attached files, tool outputs, and the response itself competes for the same fixed budget. When you hit the…
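The shared-budget idea can be sketched as plain arithmetic. This is a minimal illustration, not any vendor's accounting: the function name, the parameter names, and the numbers in the usage line are all hypothetical.

```python
def remaining_response_budget(window_tokens, system_tokens, history_tokens,
                              file_tokens, tool_tokens):
    """Tokens left for the response once every other input is counted.

    All inputs draw on the same fixed window, so whatever the prompt,
    history, files, and tool outputs consume is unavailable to the reply.
    """
    used = system_tokens + history_tokens + file_tokens + tool_tokens
    # Clamp at zero: a negative budget means the inputs alone overflow the window.
    return max(window_tokens - used, 0)


# Hypothetical numbers, chosen only to show the arithmetic.
print(remaining_response_budget(8000, 500, 3000, 2000, 1000))
```

A result of zero means the inputs already fill the window before the model writes anything.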

::02

Tokens & API costs

Tokens are the unit of payment. They are NOT words, NOT characters — they're subword chunks produced by the model's tokenizer (BPE for GPT/Claude, SentencePiece for Gemini variants). English averages ~4 chars per token, …
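A rough estimator built on the ~4 characters per token average for English. This is a sketch under stated assumptions: the ratio is only an average, real counts come from the model's own tokenizer, and the price in the usage line is a made-up placeholder rather than any provider's rate.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Approximate token count from character length (English average)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens, price_per_million_tokens):
    """Cost of a token count at a given price per million tokens."""
    return tokens * price_per_million_tokens / 1_000_000


prompt = "Summarize the attached meeting notes in five bullet points."
tokens = estimate_tokens(prompt)
# 3.0 is a hypothetical price per million tokens, not a real rate.
print(tokens, estimate_cost(tokens, 3.0))
```

Treat the result as a budgeting guess. Code, non-English text, and unusual formatting can tokenize very differently from the English average.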

::03

Prompt engineering core (the 80/20)

Strip every 'prompt engineering' course down and what's left is six moves that produce 80% of the gain. (1) Specify the role only when it actually changes behavior — 'you are an expert' is mostly noise on modern models; …

::04

Multi-LLM routing in practice

Multi-LLM routing is the practice of sending different tasks to different models — Claude for long-context and writing, GPT for general reasoning and tools, Gemini for cheap bulk and vision, local models for private and …
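The smallest version of a router is a lookup table. The sketch below encodes only the pairings named above. The category labels, the `route` function, and the fallback default are hypothetical choices for illustration, and the strings are model families, not real API model identifiers.

```python
# Task category -> model family, following the pairings in the text.
# Category labels are illustrative, not a standard taxonomy.
ROUTES = {
    "long_context": "claude",
    "writing": "claude",
    "general_reasoning": "gpt",
    "tools": "gpt",
    "cheap_bulk": "gemini",
    "vision": "gemini",
    "private": "local",
}


def route(task_category, default="gpt"):
    """Pick a model family for a task; the default fallback is an assumption."""
    return ROUTES.get(task_category, default)


print(route("vision"))
print(route("private"))
```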

::05

Local models (Ollama setup MED)

Local models run on your own hardware — no API calls, no per-token bill, no data leaving the machine. Ollama is the easiest entry point: one installer, one command to pull a model, OpenAI-compatible API on localhost:1143…

::06

RAG vs long-context · when to use which

RAG (Retrieval-Augmented Generation) and long-context are two solutions to the same problem: getting external knowledge into a model's working memory. They have different tradeoffs, and the operator decision is mostly em…

::07

Agents · the trapdoor

An agent is an LLM in a loop with tools. The loop: model proposes an action (call a tool, write a file, hit an API), system executes it, result feeds back into the next prompt, repeat until the model says done or a max-s…
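The loop itself fits in a few lines. This is a sketch, not a production agent: `propose` stands in for the LLM call, `scripted_model` and `add_tool` are hypothetical stubs so the example runs without any API, and the `max_steps` cap is an assumption because the cutoff named in the text is truncated.

```python
from typing import Callable, Dict, List, Tuple

# (tool_name, argument); the tool name "done" ends the loop.
Action = Tuple[str, str]


def run_agent(propose: Callable[[List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              task: str,
              max_steps: int = 5) -> List[str]:
    """Propose an action, execute it, feed the result back, repeat."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        name, arg = propose(transcript)  # model sees everything so far
        if name == "done":
            transcript.append(f"done: {arg}")
            return transcript
        tool = tools.get(name)
        # Unknown tools become an error message the model can react to.
        result = tool(arg) if tool is not None else f"unknown tool {name!r}"
        transcript.append(f"{name}({arg}) -> {result}")
    # Hard stop so a model that never says done cannot loop forever.
    transcript.append("stopped: max steps reached")
    return transcript


def scripted_model(transcript: List[str]) -> Action:
    """Stub in place of an LLM: one tool call, then done."""
    if len(transcript) == 1:
        return ("add", "2,3")
    return ("done", "5")


def add_tool(arg: str) -> str:
    a, b = arg.split(",")
    return str(int(a) + int(b))


for line in run_agent(scripted_model, {"add": add_tool}, "sum 2 and 3"):
    print(line)
```

The step cap is what stops the loop when the model never declares itself done, which is why it sits outside the model's control.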

::08

Embeddings (semantic search MED)

An embedding is a vector — a list of typically 384 to 3,072 floating-point numbers — that represents the semantic meaning of a piece of text. Two texts about similar concepts have vectors that point in similar directions…
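"Point in similar directions" is usually measured with cosine similarity. A minimal sketch in plain Python follows. The three-number vectors are toy values for illustration, not real embeddings, and returning 0.0 for a zero-length vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 same direction, 0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # direction is undefined for a zero vector
    return dot / (norm_a * norm_b)


# Toy vectors; real embeddings have hundreds or thousands of dimensions.
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]
print(cosine_similarity(query, close) > cosine_similarity(query, far))
```

Semantic search is this comparison repeated against every stored vector, keeping the highest scores.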

::09

Voice cloning ethics + practical

Voice cloning is now a 30-second technology: capture 30 seconds of clean audio of someone's voice, feed it to ElevenLabs, Resemble, Play.ht, or a local model like XTTS or F5-TTS, generate arbitrary speech in that voi…

::10

Vision models (when they help vs distract)

Vision models — Claude with vision, GPT-4o, Gemini multimodal — can accept images as input and reason about their contents. The capability is real: OCR (especially handwritten and structured documents), chart interpretat…

::11

Fine-tuning (when it's worth it · almost never for individuals)

Fine-tuning is training a base model on your specific data to bias its outputs toward your domain, style, or task. It's the most over-recommended and under-justified technique in practical AI. The honest reality: for 95%…

::12

AI safety for practitioners (the day-to-day)

Day-to-day AI safety is not the existential-risk debate; it's the practical operating discipline that prevents you from causing real harm with the LLM systems you're shipping right now. The MED has seven layers, in prior…

::13

Speed of iteration (the operator advantage)

Speed of iteration is the single biggest competitive advantage solo operators and small teams have over large organizations using AI right now. The dynamic: large orgs have approval chains, procurement cycles, security r…

::14

AI economics (the household-level reality)

AI economics at the household level — what does a person, family, or solo operator actually spend, save, and earn from AI in a normal month — is genuinely different from the enterprise discourse and rarely discussed hone…
