Model Comparator

Picking between two LLMs at scale is rarely a quality decision alone — at production volume the cost delta between Claude 3.5 Sonnet and Gemini 1.5 Pro Flash-tier, or between GPT-4o and DeepSeek V3, can be the difference between a sustainable margin and a burn rate that eats the runway. This comparator does the boring arithmetic honestly: pick two models from the June 2026 pricing snapshot, declare your monthly query volume and rough token shape, and read out the projected spend on each plus the absolute savings of the cheaper option. The math itself is trivial — input tokens × input price + output tokens × output price, multiplied by queries per month. What this tool actually buys you is the discipline of writing the assumption down. Most "model is cheaper" arguments collapse the moment somebody asks "what's your output:input ratio?" or "are you counting cached prefix tokens?" — this calculator forces you to pick a number for each and live with it. Pricing reflects publicly listed per-million-token rates as of June 2026: Claude 3.5 Sonnet at $3.00/M input and $15.00/M output, GPT-4o at $2.50/M and $10.00/M, Gemini 1.5 Pro at $3.50/M and $10.50/M (mid-tier — short-context band), Llama 3.1 70B at ~$0.59/M and $0.79/M via the typical hosted-inference provider, Mistral Large 2 at $2.00/M and $6.00/M, and DeepSeek V3 at $0.27/M and $1.10/M. These are list prices, not your negotiated enterprise rate, not Bedrock/Vertex/Azure markup-adjusted, not prompt-caching-discounted. What this is not: a quality benchmark, a latency comparison, a context-window analysis, or a reasoning capability ranking. A model that's 10× cheaper but fails 40% of your evals is more expensive than the headline number suggests. Run the cost math here, then run your real eval suite, then decide. Cost is a constraint, not a verdict.

::inputs

Model A

First model to price

Model B

Second model to price

Monthly query volumequeries/mo

Total API calls per month across all users

Avg input tokens per querytokens

System prompt + user message + retrieved context per call

Avg output tokens per querytokens

Typical model response length

Model A input price$/M tokens

Auto-fills from Model A selection; override for negotiated rates

Model A output price$/M tokens

Auto-fills from Model A selection; override for negotiated rates

Model B input price$/M tokens

Auto-fills from Model B selection; override for negotiated rates

::result

Model A monthly cost

$120.00

Model B monthly cost

$87.50

Monthly savings (cheaper model)

$32.50

Annualized savings

$390.00

::how this calculates

For each model, the monthly cost is calculated as queries × (input tokens × input price per million + output tokens × output price per million) ÷ 1,000,000. Model A uses the explicit priceInA and priceOutA inputs; Model B's input price uses priceInB and its output price is approximated as 4× the input price (the median input:output spread across the six listed models in June 2026 — accurate within ~20% for the menu but should be overridden by the live spec on the page for production use). Savings is the absolute difference between the two monthly totals; whichever model is cheaper, that's the saved amount.

::worked examples

Customer-support chatbot, 10K queries/mo: Claude 3.5 Sonnet vs GPT-4o

modelA: claude-3.5-sonnetmodelB: gpt-4omonthlyQueries: 10000avgInputTokens: 1500avgOutputTokens: 500priceInA: 3priceOutA: 15priceInB: 2.5

Mid-volume support workload. Claude 3.5 Sonnet runs $120/mo ($45 input + $75 output); GPT-4o runs $87.50/mo ($37.50 input + $50 output). GPT-4o saves ~$32.50/mo, or ~$390/yr — small enough that quality on your eval set should decide.

RAG product, 100K queries/mo: Gemini 1.5 Pro vs DeepSeek V3

modelA: gemini-1.5-promodelB: deepseek-v3monthlyQueries: 100000avgInputTokens: 4000avgOutputTokens: 800priceInA: 3.5priceOutA: 10.5priceInB: 0.27

Heavy-context retrieval workload at scale. Gemini 1.5 Pro hits $2,240/mo ($1,400 input + $840 output); DeepSeek V3 lands near $194/mo. Savings of ~$2,046/mo, ~$24.5K/yr — but DeepSeek's eval pass rate on your domain may not justify the swap. Run the suite first.

High-volume content generation, 1M queries/mo: Llama 3.1 70B vs Mistral Large 2

modelA: llama-3.1-70bmodelB: mistral-large-2monthlyQueries: 1000000avgInputTokens: 800avgOutputTokens: 1200priceInA: 0.59priceOutA: 0.79priceInB: 2

Bulk generation, output-heavy. Llama 3.1 70B totals $1,420/mo ($472 input + $948 output); Mistral Large 2 runs $11,200/mo. Llama saves ~$9,780/mo, ~$117K/yr. At this volume the cost gap is decisive — only Mistral-specific quality wins can justify the spend.

::what this does NOT capture

○List prices reflect publicly posted June 2026 rates direct from each provider. Negotiated enterprise rates, Bedrock/Vertex/Azure markups, and prompt-caching discounts are not modeled.
○Output price for Model B is approximated as 4× its input price (median across the six-model menu). This is accurate within ~20% for the listed models but should be replaced with explicit per-model output pricing in production.
○Token counts are user-supplied averages. Real workloads have heavy tails — p99 calls can be 5-10× the mean, and a few runaway long-context calls can dominate monthly spend.
○No infrastructure overhead is included: gateway costs, retry storms, evaluation/test traffic, embedding/vector-DB spend, and observability fees can add 10-30% on top.
○Tokenizer differences are ignored. The same English string tokenizes to different counts across BPE variants — typically within ±15% for the listed models, but worth measuring on your actual corpus.
○Quality is not modeled. A 10× cheaper model that fails 40% of your evals is functionally more expensive than the headline number; this tool only answers the cost question, not the decision.
○Failed requests, rate-limit retries, and partial completions are billed by most providers; this calculator assumes 100% success and ignores retry overhead.
○Pricing snapshots drift. The June 2026 numbers were accurate at write time but providers re-price quarterly — verify against the current provider docs before committing to a contract.

← back to /learn·/tools index →