
::calculator · What does your AI task actually cost per year?
AI Cost Calculator
::inputs
Input tokens per task: Includes the system prompt, the user message, and any context you pass in.
Output tokens per task: What the model writes back. Usually smaller than the input unless you ask for long generations.
Tasks per day: One user-facing task = one set of input + output tokens, not one raw API call.
Model: June 2026 published prices per 1M tokens. Local-ollama assumes $0 API cost (GPU and electricity not included).
::result
Cost per task
Cost per month (30-day)
Cost per year (365-day)
::how this calculates
Each model has a published input price and output price per 1 million tokens. Cost per task = (input tokens times input price + output tokens times output price) divided by 1,000,000. Cost per month = cost per task times tasks per day times 30. Cost per year = cost per task times tasks per day times 365. Local-ollama is hardcoded to $0 at the API line (your GPU and electricity are real costs but live outside this calculator). No volume discount, batch discount, or prompt caching is modeled, so the result is the undiscounted list-price cost: an upper bound for a workload that makes exactly one model call per task.
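The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source; the function names and the sample inputs are assumptions chosen for the example.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Dollar cost of one task. Prices are dollars per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_month(task_cost: float, tasks_per_day: float) -> float:
    """Flat 30-day month, as the calculator assumes."""
    return task_cost * tasks_per_day * 30


def cost_per_year(task_cost: float, tasks_per_day: float) -> float:
    """Flat 365-day year, as the calculator assumes."""
    return task_cost * tasks_per_day * 365


# Illustrative inputs (not from the page): 1,000 tokens in, 200 out,
# 100 tasks/day, priced at $3 input / $15 output per 1M tokens.
task = cost_per_task(1_000, 200, 3.00, 15.00)
print(f"per task:  ${task:.4f}")                        # $0.0060
print(f"per month: ${cost_per_month(task, 100):.2f}")   # $18.00
print(f"per year:  ${cost_per_year(task, 100):.2f}")    # $219.00
```

Setting both prices to 0 reproduces the Local-ollama row.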
::worked examples
Customer support chatbot on Claude 3.5 Sonnet
4K input tokens (prompt plus retrieved docs), 600-token answer, 500 tickets/day. Sonnet at $3/$15 per 1M. Annual cost lands near $3,830, which is defensible at this scale, but cost scales linearly with volume: 10x the traffic means roughly $38,300 a year.
Same workload, dropped to Claude 3.5 Haiku
Identical task profile, swapped model. Haiku at $0.80/$4 per 1M cuts annual cost by ~73%. The right move IF Haiku passes your eval on the actual ticket distribution. Run the eval before you swap.
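The model swap can be checked with a short script. It is a sketch under one assumption: "4K" is read as exactly 4,000 input tokens, and a different reading shifts the dollar totals slightly without changing the savings ratio. The helper name `annual_cost` is made up for this example.

```python
def annual_cost(input_tokens, output_tokens, input_price, output_price, tasks_per_day):
    """Annual dollar cost; prices are dollars per 1M tokens, 365-day year."""
    per_task = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_task * tasks_per_day * 365


# Task profile from the support-chatbot example.
# Assumption: "4K" means 4,000 tokens.
INPUT_TOKENS, OUTPUT_TOKENS, TICKETS_PER_DAY = 4_000, 600, 500

sonnet = annual_cost(INPUT_TOKENS, OUTPUT_TOKENS, 3.00, 15.00, TICKETS_PER_DAY)
haiku = annual_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.80, 4.00, TICKETS_PER_DAY)
savings = 1 - haiku / sonnet  # fraction of the Sonnet bill that disappears

print(f"Sonnet:  ${sonnet:,.2f}/year")
print(f"Haiku:   ${haiku:,.2f}/year")
print(f"Savings: {savings:.1%}")  # 73.3%
```

Because the task profile and volume are identical, the savings ratio depends only on the two price pairs and the input/output mix, not on ticket volume.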
Background classifier on GPT-4o mini
High-volume tagging job — small context, tiny output, 10K runs/day. GPT-4o mini at $0.15/$0.60 keeps it under $550/year. This is the workload class where commercial APIs actually beat self-hosting on total cost.
Long-context document analyzer on Gemini 1.5 Pro
Whole-document ingestion at 80K input tokens — Gemini's long-context strength. 100 docs/day, 2K-token summary. Annual cost lands near $4,015. Watch the long-context tier — Gemini raises prices above 128K input.
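A rough sketch of how the long-context tier changes the picture. Two assumptions are worth flagging: the base-tier price pair of $1.25/$5 per 1M is not stated on this page (it is the pair that reproduces the $4,015 figure), and the long-context pair below is a hypothetical placeholder, so check the provider's current price sheet before relying on it. The sketch also assumes the higher tier applies to the whole call once input exceeds the threshold.

```python
LONG_CONTEXT_THRESHOLD = 128_000  # tokens; the tier boundary named in the text

# Dollars per 1M tokens as (input, output).
BASE_TIER = (1.25, 5.00)   # assumption: not stated on the page, reproduces $4,015
LONG_TIER = (2.50, 10.00)  # hypothetical placeholder, verify before use


def annual_cost(input_tokens, output_tokens, docs_per_day):
    """Annual cost, choosing the price tier from the input size of each call.

    Assumes the long-context tier prices the entire call, not just the
    tokens above the threshold.
    """
    in_price, out_price = (
        LONG_TIER if input_tokens > LONG_CONTEXT_THRESHOLD else BASE_TIER
    )
    per_task = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_task * docs_per_day * 365


print(f"80K input:  ${annual_cost(80_000, 2_000, 100):,.2f}")   # $4,015.00
print(f"200K input: ${annual_cost(200_000, 2_000, 100):,.2f}")  # $18,980.00
```

Under these placeholder prices, growing the input 2.5x raises the annual bill by more than 4.5x, because the larger documents also cross into the higher tier.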
::what this does NOT capture
- Prices are June 2026 published API prices, with no negotiated rate, enterprise discount, or committed-use credit applied.
- Prompt caching, batch API (50% discount on OpenAI/Anthropic), and Gemini's context caching are NOT modeled. They can cut real cost by 30-90% on the right workload.
- The calculator prices each task as a single model call. Agents, retries, and multi-step chains can multiply real cost by 3x-10x. Count those separately.
- Local-ollama is shown as $0 at the API line. Your GPU, electricity, and operator time are real and out of scope here.
- Output tokens are usually billed at 3-5x the input-token rate. For output-heavy tasks, the biggest cost lever is 'make the model write less'; for context-heavy tasks such as the document analyzer, input tokens dominate the bill and trimming context matters more.
- Gemini 1.5 Pro pricing tiers up for >128K input contexts. This calculator uses the base tier; long-context jobs will run higher.
- 30 days/month and 365 days/year are flat assumptions. For workloads that run Monday to Friday only (typical B2B), real cost is about 5/7 of the calculator's figure, roughly 29% lower.
- Token counts depend on the tokenizer: Claude, GPT, and Gemini tokenize differently. The same English text can produce token counts that differ by 5-15% across providers.
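Several of the caveats above are simple multipliers, so a gross figure from the calculator can be adjusted by hand. The sketch below is one way to do that; the function name, parameter names, and the $1,000 starting figure are all illustrative assumptions, and treating every call in a chain as having the same token profile is a simplification.

```python
def adjusted_annual(gross_annual, calls_per_task=1.0, discount=0.0,
                    active_days_per_week=7):
    """Scale a gross annual figure by factors the calculator leaves out.

    calls_per_task: average model calls behind one task (agents, retries);
        assumes each call has the same token profile, a simplification.
    discount: fractional price cut, e.g. 0.5 for a 50% batch discount.
    active_days_per_week: 5 for a Monday-to-Friday workload.
    """
    return (gross_annual * calls_per_task * (1 - discount)
            * (active_days_per_week / 7))


gross = 1_000.00  # illustrative gross annual figure, not taken from the page

print(f"{adjusted_annual(gross, discount=0.5):.2f}")            # 500.00
print(f"{adjusted_annual(gross, calls_per_task=3):.2f}")        # 3000.00
print(f"{adjusted_annual(gross, active_days_per_week=5):.2f}")  # 714.29
```

The factors multiply, so a three-call agent running on weekdays with a 50% batch discount comes out near 1.07x the gross figure rather than 3x.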