::Tim-Ferriss-style AI synthesis · 14 distillations
The 5-minute version of each AI topic.
Minimum Effective Dose. Fear-setting. DiSSS framework. 80/20 cut. Tribe of Mentors. Each topic distilled to the smallest unit that still moves the operator forward. Ruthless on purpose.
::01
Context windows
A context window is the model's working memory for one turn — every token of system prompt, conversation history, attached files, tool outputs, and the response itself competes for the same fixed budget. When you hit the…
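The shared-budget idea can be sketched as simple arithmetic. This is a minimal illustration, not any vendor's API: the function name `remaining_budget`, the 32,000-token window, and every per-part count are made-up assumptions.

```python
def remaining_budget(window_tokens: int, parts: dict[str, int]) -> int:
    """Tokens left for the response once every input part is counted."""
    used = sum(parts.values())
    return window_tokens - used


# Illustrative numbers only (assumptions, not measurements).
parts = {
    "system_prompt": 1_200,
    "history": 6_500,
    "attached_files": 18_000,
    "tool_outputs": 2_300,
}
left = remaining_budget(32_000, parts)
print(left)  # 4000 tokens left for the response in this made-up case
```

A negative result means the inputs alone already exceed the window, so something has to be trimmed before the call is made.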
::02
Tokens & API costs
Tokens are the unit of payment. They are NOT words, NOT characters — they're subword chunks produced by the model's tokenizer (BPE for GPT/Claude, SentencePiece for Gemini variants). English averages ~4 chars per token, …
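A rough sketch of the ~4 chars per token rule of thumb, useful for back-of-envelope budgeting before a call. The helper names and the price in the usage lines are hypothetical; a real count requires the model's own tokenizer, and real prices come from the provider's pricing page.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate token count for English text from its character length."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost in currency units at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


prompt = "x" * 2_000                # stand-in for a 2,000-character English prompt
tokens = estimate_tokens(prompt)    # about 500 tokens under the 4-chars rule
cost = estimate_cost(tokens, 3.0)   # 3.0 per million is a placeholder price
```

The estimate drifts for code, non-English text, and unusual formatting, so treat it as a planning number rather than a billing number.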
::03
Prompt engineering core (the 80/20)
Strip every 'prompt engineering' course down and what's left is six moves that produce 80% of the gain. (1) Specify the role only when it actually changes behavior — 'you are an expert' is mostly noise on modern models; …
::04
Multi-LLM routing in practice
Multi-LLM routing is the practice of sending different tasks to different models — Claude for long-context and writing, GPT for general reasoning and tools, Gemini for cheap bulk and vision, local models for private and …
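In its smallest form, routing is a lookup table. The sketch below mirrors the pairings named above; the task labels, the short model names, and the choice of `"gpt"` as the fallback are illustrative assumptions, not a recommendation.

```python
# Task label -> model family. Labels and names are hypothetical placeholders.
ROUTES = {
    "long_context": "claude",
    "writing": "claude",
    "reasoning": "gpt",
    "tools": "gpt",
    "bulk": "gemini",
    "vision": "gemini",
    "private": "local",
}


def route(task_type: str, default: str = "gpt") -> str:
    """Pick a model family for a task, falling back to a default."""
    return ROUTES.get(task_type, default)


print(route("vision"))   # gemini
print(route("private"))  # local
print(route("unknown"))  # gpt (the assumed fallback)
```

Keeping the table as plain data means a route can be changed in one line when prices or model quality shift.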
::05
Local models (Ollama setup MED)
Local models run on your own hardware — no API calls, no per-token bill, no data leaving the machine. Ollama is the easiest entry point: one installer, one command to pull a model, OpenAI-compatible API on localhost:1143…
::06
RAG vs long-context · when to use which
RAG (Retrieval-Augmented Generation) and long-context are two solutions to the same problem: getting external knowledge into a model's working memory. They have different tradeoffs, and the operator decision is mostly em…
::07
Agents · the trapdoor
An agent is an LLM in a loop with tools. The loop: model proposes an action (call a tool, write a file, hit an API), system executes it, result feeds back into the next prompt, repeat until the model says done or a max-s…
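The loop itself fits in a few lines. This is a minimal sketch with a scripted stand-in for the model: `run_agent`, the action dictionary shape, the `add_one` tool, and the `max_steps` cap are all illustrative assumptions, and no real LLM is called.

```python
from typing import Any, Callable


def run_agent(
    propose: Callable[[list], dict],
    tools: dict[str, Callable[[Any], Any]],
    task: str,
    max_steps: int = 5,
) -> tuple[Any, list]:
    """Propose, execute, feed back; stop on 'done' or when the step cap hits."""
    history: list = [("task", task)]
    for _ in range(max_steps):
        action = propose(history)                        # model proposes an action
        if action["type"] == "done":
            return action["answer"], history
        result = tools[action["tool"]](action["input"])  # system executes it
        history.append((action["tool"], result))         # result feeds the next turn
    return None, history                                 # cap reached without an answer


def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: one tool call, then done."""
    if len(history) == 1:
        return {"type": "tool", "tool": "add_one", "input": 41}
    return {"type": "done", "answer": history[-1][1]}


answer, trace = run_agent(scripted_model, {"add_one": lambda x: x + 1}, "increment 41")
print(answer)  # 42
```

The step cap is the part that matters in practice: without it, a model that never says done keeps spending tokens.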
::08
Embeddings (semantic search MED)
An embedding is a vector — a list of typically 384 to 3,072 floating-point numbers — that represents the semantic meaning of a piece of text. Two texts about similar concepts have vectors that point in similar directions…
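"Point in similar directions" is usually measured with cosine similarity. A minimal pure-Python sketch follows; the three-number vectors are toy values for illustration, whereas real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors (made up): the first two lean the same way, the third does not.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

Semantic search is this comparison at scale: embed the query, score it against stored vectors, return the highest-scoring texts.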
::09
Voice cloning ethics + practical
Voice cloning is now a 30-second technology — capture 30 seconds of clean audio of someone's voice, paste it into ElevenLabs, Resemble, Play.ht, or a local model like XTTS or F5-TTS, generate arbitrary speech in that voi…
::10
Vision models (when they help vs distract)
Vision models — Claude with vision, GPT-4o, Gemini multimodal — can accept images as input and reason about their contents. The capability is real: OCR (especially handwritten and structured documents), chart interpretat…
::11
Fine-tuning (when it's worth it · almost never for individuals)
Fine-tuning is training a base model on your specific data to bias its outputs toward your domain, style, or task. It's the most over-recommended and under-justified technique in practical AI. The honest reality: for 95%…
::12
AI safety for practitioners (the day-to-day)
Day-to-day AI safety is not the existential-risk debate; it's the practical operating discipline that prevents you from causing real harm with the LLM systems you're shipping right now. The MED has seven layers, in prior…
::13
Speed of iteration (the operator advantage)
Speed of iteration is the single biggest competitive advantage solo operators and small teams have over large organizations using AI right now. The dynamic: large orgs have approval chains, procurement cycles, security r…
::14
AI economics (the household-level reality)
AI economics at the household level — what does a person, family, or solo operator actually spend, save, and earn from AI in a normal month — is genuinely different from the enterprise discourse and rarely discussed hone…