::synthesis · Tim-Ferriss method
Tokens & API costs
::minimum effective dose
Tokens are the unit of payment. They are NOT words, NOT characters — they're subword chunks produced by the model's tokenizer (BPE for GPT/Claude, SentencePiece for Gemini variants). English averages ~4 chars per token, code averages ~2-3, non-Latin scripts can be 1-2 chars per token (which is why Chinese and Arabic queries cost 2-5x more). Two prices to internalize: input tokens (cheap, usually $1-$5 per million) and output tokens (expensive, usually $5-$75 per million). Output is 3-15x more expensive than input on every frontier provider. That's the dominant cost lever — most operators optimize the wrong end. Cache pricing matters: Anthropic prompt caching, OpenAI cached input, Gemini implicit cache all drop the input rate by 50-90% on repeated prefixes. If you re-send the same system prompt across calls, you should be paying the cache rate, not the full rate. The household-level reality: a $20/month consumer ChatGPT or Claude subscription is dramatically cheaper than running the same volume through the API at retail. APIs make sense for automation, integration, and bulk; subscriptions make sense for human-paced interactive use. Don't conflate them.
::DiSSS · deconstruction questions
- 01What's my actual cost-per-task (not cost-per-token) for the workflows I run most often?
- 02Am I paying output rates for things that could be input — e.g., few-shot examples re-typed instead of cached prefix?
- 03Is prompt caching enabled on my calls, and what's my measured cache hit rate?
- 04Which of my tasks are subscription-economic (interactive, human-paced) vs API-economic (batched, automated, bulk)?
- 05What's my model-size right-fit — am I paying Opus prices for a Haiku-grade task?
::fear-setting
Cost of not learning this: your AI bill will quietly become your second-largest software line item without you noticing. Solo operators routinely report 'I thought I was spending $50/month, I checked and it was $1,400' — almost always from a script left running, an agent loop with unbounded context, or a vision model called on every image when a text model would have done. Cost of getting it wrong: existential for unit economics. If you're building a product on top of an LLM and your per-user cost is $0.40 but you're charging $5/month with a power user using $12/month of inference, you have a free-tier abuse vector and a margin death spiral. Founders have shut down launched products inside a quarter for exactly this. Token economics ARE product economics.
::80 / 20 cut
SKIP: comparing tokenizers in depth, the BPE algorithm internals, exact byte-pair-merge rules. OBSESS OVER: (1) the input-vs-output price ratio for your provider — write it on a sticky note, (2) prompt caching configuration — this is free 50-90% discount most operators leave on the table, (3) model-tier right-fitting (Haiku/mini/flash for classification and routing; Opus/4o/Pro only for hard reasoning). Right-fitting alone cuts most operator bills by 70-95%.
::tribe of mentors · paraphrased stances
Ethan Mollick
Wharton professor, author of Co-Intelligence, runs the most-read practical-AI newsletter for non-engineers
Mollick's stance: for individuals, the $20/month frontier subscription is the highest-ROI tool purchase available right now. APIs are for builders. Don't pay API rates to do work a subscription handles.
Simon Willison
Maintains llm CLI, publishes token-cost comparisons across all major providers
Willison's stance: track every dollar. He logs every API call he makes with cost attached and reviews monthly. Most operators have no idea where their spend goes; the few who instrument it cut costs by an order of magnitude on average.
Anthropic / OpenAI pricing teams
Set the actual prices; document them publicly with caching tiers
Provider stance, made explicit in docs: assume you should be using caching. Pricing is structured so non-cached repeated prefixes are the most expensive way to operate. The discount is built in; you just have to claim it.
Hamel Husain
ML engineer, fast.ai contributor, writes detailed cost analyses for production LLM systems
Hamel's stance: most production LLM cost overruns are not model choice — they're failure to evaluate at small scale before launching at large scale. Run 100 representative tasks, multiply by traffic, then decide if the unit economics work.
::real-world test · this week
This week: pick one workflow you run repeatedly (daily summary, weekly report, customer email triage). Measure its current cost per execution from raw API logs. Then run the same task on the cheapest model that plausibly handles it (Haiku, gpt-4o-mini, Flash). Compare quality on 10 instances blind. In 70%+ of operator workflows, the cheap model is indistinguishable for the task. That's a 10-30x cost cut you found in one afternoon.
::action items · ranked
- 01Pull your last 30 days of API spend by provider and segment by model — find the single biggest line item
- 02Enable prompt caching on every workflow that re-sends the same system prompt or examples (most providers, one config flag)
- 03Right-fit one workflow per week to the smallest model that passes blind quality eval on 10 representative tasks
- 04Set a hard monthly spend cap with email alerts at 50/75/90% — every provider supports this; most operators don't enable it
- 05Move human-paced interactive work to a subscription tier; reserve API spend for automation and batch