
::calculator · Side-by-side cost math for two LLM endpoints
Model Comparator
::inputs
First model to price
Second model to price
Total API calls per month across all users
System prompt + user message + retrieved context per call
Typical model response length
Auto-fills from Model A selection; override for negotiated rates
Auto-fills from Model A selection; override for negotiated rates
Auto-fills from Model B selection; override for negotiated rates
::result
Model A monthly cost
$120.00
Model B monthly cost
$87.50
Monthly savings (cheaper model)
$32.50
Annualized savings
$390.00
::how this calculates
For each model, monthly cost = queries × (input tokens × input price + output tokens × output price) ÷ 1,000,000, where token counts are per-call averages and prices are quoted in USD per million tokens. Model A uses the explicit priceInA and priceOutA inputs. Model B takes its input price from priceInB and approximates its output price as 4× that input price, the median output-to-input price ratio across the six listed models in June 2026. The approximation is accurate to within ~20% for the listed models; for production estimates, replace it with the model's actual published output price. Savings is the absolute difference between the two monthly totals, credited to whichever model is cheaper.
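The calculation above can be sketched in Python as follows. This is a minimal illustration, not the page's actual implementation: `Workload`, `monthly_cost`, and `compare` are hypothetical names, and the sample inputs (10,000 queries, 1,500 input tokens, 500 output tokens, and the three prices) are illustrative values not given on the page, chosen because they reproduce the figures in the result panel.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
OUTPUT_MULTIPLIER_B = 4.0  # Model B output price approximated as 4x its input price


@dataclass
class Workload:
    queries: int          # total API calls per month
    input_tokens: float   # average input tokens per call
    output_tokens: float  # average output tokens per call


def monthly_cost(w: Workload, price_in: float, price_out: float) -> float:
    """Monthly cost in USD; prices are USD per million tokens."""
    per_call = w.input_tokens * price_in + w.output_tokens * price_out
    return w.queries * per_call / TOKENS_PER_MILLION


def compare(w: Workload, price_in_a: float, price_out_a: float, price_in_b: float) -> dict:
    """Price both models; Model B's output price is derived, not supplied."""
    cost_a = monthly_cost(w, price_in_a, price_out_a)
    cost_b = monthly_cost(w, price_in_b, price_in_b * OUTPUT_MULTIPLIER_B)
    savings = abs(cost_a - cost_b)
    return {
        "cost_a": cost_a,
        "cost_b": cost_b,
        "savings": savings,
        "annualized": savings * 12,
    }


# Illustrative inputs (assumed, not taken from the page).
result = compare(Workload(10_000, 1_500, 500),
                 price_in_a=3.0, price_out_a=15.0, price_in_b=2.5)
print(result)
```

Keeping the 4× multiplier as a named constant makes it easy to swap in an explicit output price for Model B once one is available.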
::worked examples
Customer-support chatbot, 10K queries/mo: Claude 3.5 Sonnet vs GPT-4o
Mid-volume support workload. Claude 3.5 Sonnet runs $120/mo ($45 input + $75 output); GPT-4o runs $87.50/mo ($37.50 input + $50 output). GPT-4o saves ~$32.50/mo, or ~$390/yr — small enough that quality on your eval set should decide.
RAG product, 100K queries/mo: Gemini 1.5 Pro vs DeepSeek V3
Heavy-context retrieval workload at scale. Gemini 1.5 Pro hits $2,240/mo ($1,400 input + $840 output); DeepSeek V3 lands near $194/mo. Savings of ~$2,046/mo, ~$24.5K/yr — but DeepSeek's eval pass rate on your domain may not justify the swap. Run the suite first.
High-volume content generation, 1M queries/mo: Llama 3.1 70B vs Mistral Large 2
Bulk generation, output-heavy. Llama 3.1 70B totals $1,420/mo ($472 input + $948 output); Mistral Large 2 runs $11,200/mo. Llama saves ~$9,780/mo, ~$117K/yr. At this volume the cost gap is decisive — only Mistral-specific quality wins can justify the spend.
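The savings figures in the three examples above can be re-derived from the stated monthly totals alone. The short check below is a sketch: `savings_summary` is a hypothetical helper, and DeepSeek V3's total is taken as $194, the rounded figure quoted above, so the results match the text only up to rounding.

```python
def savings_summary(name_a: str, cost_a: float, name_b: str, cost_b: float):
    """Return (cheaper model, monthly savings, annualized savings)."""
    cheaper = name_a if cost_a <= cost_b else name_b
    monthly = abs(cost_a - cost_b)
    return cheaper, monthly, monthly * 12


# Monthly totals as quoted in the worked examples.
examples = [
    ("Claude 3.5 Sonnet", 120.00, "GPT-4o", 87.50),
    ("Gemini 1.5 Pro", 2240.00, "DeepSeek V3", 194.00),
    ("Llama 3.1 70B", 1420.00, "Mistral Large 2", 11200.00),
]

for name_a, cost_a, name_b, cost_b in examples:
    cheaper, monthly, annual = savings_summary(name_a, cost_a, name_b, cost_b)
    print(f"{cheaper} is cheaper: ${monthly:,.2f}/mo, ${annual:,.2f}/yr")
```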
::what this does NOT capture
- List prices reflect publicly posted June 2026 rates direct from each provider. Negotiated enterprise rates, Bedrock/Vertex/Azure markups, and prompt-caching discounts are not modeled.
- Output price for Model B is approximated as 4× its input price (median across the six-model menu). This is accurate within ~20% for the listed models but should be replaced with explicit per-model output pricing in production.
- Token counts are user-supplied averages. Real workloads have heavy tails: p99 calls can be 5-10× the mean, and a few runaway long-context calls can dominate monthly spend.
- No infrastructure overhead is included: gateway costs, retry storms, evaluation/test traffic, embedding/vector-DB spend, and observability fees can add 10-30% on top.
- Tokenizer differences are ignored. The same English string tokenizes to different counts across BPE variants, typically within ±15% for the listed models, but worth measuring on your actual corpus.
- Quality is not modeled. A 10× cheaper model that fails 40% of your evals is functionally more expensive than the headline number; this tool only answers the cost question, not the decision.
- Failed requests, rate-limit retries, and partial completions are billed by most providers; this calculator assumes 100% success and ignores retry overhead.
- Pricing snapshots drift. The June 2026 numbers were accurate at write time but providers re-price quarterly, so verify against the current provider docs before committing to a contract.
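Two of the caveats above, infrastructure overhead and billed retries, can be layered onto a headline figure as a rough sensitivity band. The sketch below is not part of the calculator, and `adjusted_range` is a hypothetical helper. It assumes that every failed request is billed at full cost and retried until it succeeds, with failures independent of each other, so billed calls scale by 1 / (1 − failure rate). The 10-30% overhead bounds come from the list above; the 5% failure rate in the usage line is purely illustrative.

```python
def adjusted_range(base_cost: float,
                   failure_rate: float = 0.0,
                   overhead_low: float = 0.10,
                   overhead_high: float = 0.30) -> tuple[float, float]:
    """Return a (low, high) monthly cost band around a headline estimate.

    Assumes failed calls are billed in full and retried until success,
    so the expected number of billed attempts per call is 1 / (1 - failure_rate).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    billed = base_cost / (1.0 - failure_rate)
    return billed * (1.0 + overhead_low), billed * (1.0 + overhead_high)


# Headline $120.00/mo with an assumed 5% failure rate.
low, high = adjusted_range(120.00, failure_rate=0.05)
print(f"${low:,.2f} to ${high:,.2f} per month")
```

With no failures, the band for a $120.00 headline is simply the 10-30% overhead range, roughly $132 to $156 per month.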