AI Cost Calculator

Most teams ship AI features without ever drawing a line from "tokens per request" to "annual invoice." Then the invoice arrives and someone has to explain why a chatbot costs $40,000 a year, or why moving from Claude 3.5 Sonnet to Haiku quietly saved $28,000 with no quality drop on the actual workload. This calculator closes that gap.

You give it three honest numbers (average input tokens per task, average output tokens per task, tasks per day) and a provider/model choice. It returns cost per task, cost per month, and cost per year using realistic published prices as of June 2026. The math is deliberately boring: input tokens times input price plus output tokens times output price, divided by one million, multiplied by run rate. No volume discounts assumed. No batch API discounts assumed. No caching credits assumed. Those things are real and they help, but they are out of scope here because they vary by contract, vary by month, and tempt people to be optimistic about their own usage. This is the upper-bound estimate: the number your CFO should believe before you negotiate it down.

A few things the calculator does NOT include, and which you should think about separately. It does not model retries, tool calls, agent loops where one user task fans out into ten model calls, or systems where Claude generates a draft and GPT critiques it. Those multiply real cost by 3x to 10x. It does not include embedding costs, vector store costs, or the human review labor that often dominates the bill. It assumes "local-ollama" is free at the API line, but you are paying for the GPU, the electricity, and the engineer who keeps it running, so call it $0.0001/M tokens of opportunity cost and decide for yourself.

Use this as a sanity check before you commit to a model. Run it on three workload sizes (today, 6 months, end of year) and look at the year-end number. If that number scares you, you have a model selection problem, not a usage problem.

::inputs

Average input tokens per task (tokens)

Includes system prompt + user message + any context you pass in

Average output tokens per task (tokens)

What the model writes back. Tends to be smaller than input unless you ask for long generations.

Tasks per day (tasks/day)

One user-facing task = one set of input + output tokens. Not raw API calls.

Provider / model

June 2026 published prices per 1M tokens. Local-ollama assumes $0 API cost (GPU and electricity not included).

::result

Cost per task

—

Cost per month (30-day)

—

Cost per year (365-day)

—

::how this calculates

Each model has a published input price and output price per 1 million tokens. Cost per task = (input tokens times input price + output tokens times output price) divided by 1,000,000. Cost per month = cost per task times tasks per day times 30. Cost per year = cost per task times tasks per day times 365. Local-ollama is hardcoded to $0 at the API line (your GPU and electricity are real costs but live outside this calculator). No volume discount, no batch discount, no prompt caching modeled — this is the gross upper bound.
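The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `estimate_cost` and the `PRICES_PER_MILLION` table are assumptions made for this example, and the table holds only the prices quoted in the worked examples on this page.

```python
# Prices in USD per 1M tokens as (input, output).
# Hypothetical table for illustration; values are the ones quoted on this page.
PRICES_PER_MILLION = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
    "gpt-4o-mini": (0.15, 0.60),
    "local-ollama": (0.00, 0.00),  # $0 at the API line only
}


def estimate_cost(input_tokens, output_tokens, tasks_per_day, model):
    """Return gross cost per task, per 30-day month, and per 365-day year."""
    input_price, output_price = PRICES_PER_MILLION[model]
    # Cost per task = (input tokens * input price + output tokens * output price) / 1,000,000
    per_task = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    per_day = per_task * tasks_per_day
    return {
        "per_task": per_task,
        "per_month": per_day * 30,   # flat 30-day month
        "per_year": per_day * 365,   # flat 365-day year
    }


# Example: 800 input tokens, 50 output tokens, 10,000 tasks/day on GPT-4o mini.
estimate = estimate_cost(800, 50, 10_000, "gpt-4o-mini")
print(f"per task:  ${estimate['per_task']:.5f}")
print(f"per month: ${estimate['per_month']:,.2f}")
print(f"per year:  ${estimate['per_year']:,.2f}")
```

No discounts, caching, or retries appear anywhere in the function, which is what makes the result a gross upper bound.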

::worked examples

Customer support chatbot on Claude 3.5 Sonnet

inputTokens: 4000, outputTokens: 600, tasksPerDay: 500, model: claude-3.5-sonnet

4K input tokens including retrieved docs, 600-token answer, 500 tickets/day. Sonnet at $3/$15 per 1M gives $0.021 per task, or $10.50 per day. Annual cost lands near $3,830: defensible at this scale, but watch what happens if traffic grows 10x.

Same workload, dropped to Claude 3.5 Haiku

inputTokens: 4000, outputTokens: 600, tasksPerDay: 500, model: claude-3.5-haiku

Identical task profile, swapped model. Haiku at $0.80/$4 per 1M cuts annual cost by ~73%. The right move IF Haiku passes your eval on the actual ticket distribution. Run the eval before you swap.
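The Sonnet-to-Haiku comparison can be checked with a short script. This is a sketch under the page's stated prices and task profile; `annual_cost` is a hypothetical helper name, not part of the calculator.

```python
def annual_cost(input_tokens, output_tokens, tasks_per_day, input_price, output_price):
    """Gross annual cost in USD; prices are per 1M tokens, year is a flat 365 days."""
    per_task = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_task * tasks_per_day * 365


# Same task profile for both models: 4,000 in, 600 out, 500 tasks/day.
sonnet = annual_cost(4000, 600, 500, input_price=3.00, output_price=15.00)
haiku = annual_cost(4000, 600, 500, input_price=0.80, output_price=4.00)

savings_pct = (1 - haiku / sonnet) * 100
print(f"Sonnet: ${sonnet:,.2f}/yr")
print(f"Haiku:  ${haiku:,.2f}/yr")
print(f"Savings: {savings_pct:.0f}%")
```

Because the task profile is identical, the savings percentage depends only on the price ratio and stays the same at any traffic level.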

Background classifier on GPT-4o mini

inputTokens: 800, outputTokens: 50, tasksPerDay: 10000, model: gpt-4o-mini

High-volume tagging job — small context, tiny output, 10K runs/day. GPT-4o mini at $0.15/$0.60 keeps it under $550/year. This is the workload class where commercial APIs actually beat self-hosting on total cost.

Long-context document analyzer on Gemini 1.5 Pro

inputTokens: 80000, outputTokens: 2000, tasksPerDay: 100, model: gemini-1.5-pro

Whole-document ingestion at 80K input tokens, which plays to Gemini's long-context strength. 100 docs/day, 2K-token summary. At the base tier of $1.25/$5 per 1M, each document costs $0.11 and annual cost lands near $4,015. Watch the long-context tier: Gemini raises prices above 128K input.

::what this does NOT capture

○Prices are June 2026 published API prices, no negotiated rate, no enterprise discount, no committed-use credit applied.
○Prompt caching, batch API (50% discount on OpenAI/Anthropic), and Gemini's context caching are NOT modeled — they can cut real cost by 30-90% on the right workload.
○One 'task' equals one model call. Agents, retries, and multi-step chains can multiply real cost by 3x-10x. Count those separately.
○Local-ollama is shown as $0 at the API line. Your GPU, electricity, and operator time are real and out of scope here.
○Output tokens are usually billed 3-5x higher than input tokens. The biggest cost lever is almost always 'make the model write less.'
○Gemini 1.5 Pro pricing tiers up for >128K input contexts. This calculator uses the base tier; long-context jobs will run higher.
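○30 days/month and 365 days/year are flat assumptions. For workloads that run only part of the week (e.g. B2B traffic on Mon-Fri only), the calculator overestimates: real cost is about 5/7 of the figure shown, roughly 29% lower.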
○Token counts depend on tokenizer — Claude, GPT, and Gemini tokenize differently. Same English text can produce 5-15% different token counts across providers.
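Several of the caveats above are simple multipliers on the gross figure, so a rough adjustment can be sketched as below. This is an illustration only: `adjusted_annual_cost` and its parameters are hypothetical names, and applying one flat discount to the whole bill is a simplifying assumption (real batch and caching discounts usually cover only part of the traffic).

```python
def adjusted_annual_cost(gross_annual, calls_per_task=1.0, active_days_per_week=7, discount=0.0):
    """Scale a gross annual estimate by fan-out, active days, and a flat discount.

    calls_per_task: average model calls per user-facing task (agents, retries).
    active_days_per_week: days per week the workload actually runs (1 to 7).
    discount: fraction taken off the bill, e.g. 0.5 for a 50% batch discount.
    """
    if not 0 < active_days_per_week <= 7:
        raise ValueError("active_days_per_week must be in (0, 7]")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return gross_annual * calls_per_task * (active_days_per_week / 7) * (1 - discount)


# Gross figure for the support chatbot example: 4,000 in, 600 out, 500/day at $3/$15.
gross = (4000 * 3.00 + 600 * 15.00) / 1_000_000 * 500 * 365

# An agent that averages 3 model calls per ticket, running Mon-Fri only.
agent_weekdays = adjusted_annual_cost(gross, calls_per_task=3, active_days_per_week=5)

# The same single-call workload pushed entirely through a 50%-off batch API.
batched = adjusted_annual_cost(gross, discount=0.5)

print(f"gross:            ${gross:,.2f}")
print(f"3 calls, Mon-Fri: ${agent_weekdays:,.2f}")
print(f"batch at 50% off: ${batched:,.2f}")
```

Fan-out pushes the bill up faster than a five-day week pulls it down, which is why agent loops deserve their own count.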
