
The shipping AI model tracker

Every frontier and open-weight model in production, as of June 2026 — honest about uncertainty.

This page is an index, not a review. It catalogues the AI models that were shipping or in public preview as of the page's last refresh in June 2026, with their parameter counts where disclosed, context windows, modalities, price per million tokens, release dates, open-weight status, and licenses. Each row links back to the provider's own documentation, because pricing and context windows change more often than blog posts admit.

A note on epistemic honesty up front. Frontier-model release schedules are noisy. Providers rename SKUs, change pricing tiers, deprecate snapshots, and ship "preview" models that may or may not graduate to general availability. Where we cite a release date, it is the public-availability date the provider announced, not a research-paper date or a leak date. Where parameter counts are not published (Anthropic, OpenAI, Google's frontier tier), we say so rather than reciting analyst guesses. Where we say "as of June 2026 cutoff," we mean the row was true to the best of our knowledge at that point and the operator should re-check the provider page before quoting it.

The closed-weight tier (Anthropic, OpenAI, Google's frontier Gemini, xAI's frontier) does not publish architecture details. The open-weight tier (Meta Llama, Mistral, DeepSeek, Qwen, Microsoft Phi, Google Gemma) does. That difference is the through-line of the modern model economy: open weights compound faster on benchmarks because anyone can fine-tune them, while closed weights compound faster on agentic capability because the lab owns the post-training stack.

We organize the page by tier (frontier closed, frontier open, specialized, deprecated-or-retired) rather than alphabetically by lab, because that is how a builder actually shops. Below the tables, we include the open-source heroes section, a pricing-discipline callout, a release timeline, and a how-to-keep-this-current note. No invented model names. No invented prices. No invented URLs.

How to read this tracker

A few conventions that apply to every table below. Skim these once, then the rows make sense at a glance.

  • Params — Listed only where the lab has officially disclosed the figure. For Anthropic, OpenAI, and frontier Gemini, this column reads 'not disclosed' on purpose. We do not repeat third-party guesses.
  • Context window — Maximum total tokens (input plus output) the API will accept for the model's standard tier. Some providers sell longer-context variants at higher prices; we note those inline.
  • Modalities — Text in / text out is the baseline. We mark vision, audio, image generation, and tool use where the model supports them natively (not via a wrapper).
  • Price per 1M tokens — Standard public API pricing in USD as of the page's June 2026 refresh. Input and output are listed separately because the gap matters for any agentic workload. Batch-API and cached-input discounts are footnoted, not in the main number.
  • Release date — First public general-availability date from the provider's own announcement. Preview/beta dates are noted as such.
  • Open weights — Yes means downloadable from Hugging Face (or the lab's own host) under a license that permits local inference. 'Yes, gated' means the weights ship but the license restricts commercial use or requires acceptance.
  • License — The exact license name. The fine print on Llama, Gemma, and DeepSeek is non-trivial; we link to the actual license file, not a summary.
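
The conventions above can be read as a record layout. The following is a minimal sketch of one tracker row, assuming field names of our own choosing (`ModelRow`, `params_billions`, and so on are illustrative, not a schema the page publishes). The point it shows is that an undisclosed parameter count is stored as missing and reported as 'not disclosed', never filled with an estimate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelRow:
    """One tracker row. Field names are hypothetical, chosen for this sketch."""
    provider: str
    model: str
    context_tokens: int                  # max total tokens, input plus output
    modalities: Tuple[str, ...]
    input_usd_per_mtok: float            # standard public API price, USD per 1M tokens
    output_usd_per_mtok: float
    released: str                        # first public availability, "YYYY-MM"
    license: str
    params_billions: Optional[float] = None  # None: the lab has not disclosed it
    open_weights: bool = False

    def params_label(self) -> str:
        # Undisclosed counts are labelled as such rather than estimated.
        if self.params_billions is None:
            return "not disclosed"
        return f"{self.params_billions:g}B"


# Values copied from the tables on this page.
sonnet = ModelRow(
    provider="Anthropic", model="Claude Sonnet 4", context_tokens=200_000,
    modalities=("text", "vision", "tools", "code"),
    input_usd_per_mtok=3.00, output_usd_per_mtok=15.00,
    released="2025-05", license="Anthropic API ToS",
)
phi4 = ModelRow(
    provider="Microsoft", model="Phi-4", context_tokens=16_000,
    modalities=("text",),
    input_usd_per_mtok=0.07, output_usd_per_mtok=0.14,
    released="2024-12", license="MIT",
    params_billions=14.0, open_weights=True,
)

print(sonnet.params_label())  # not disclosed
print(phi4.params_label())    # 14B
```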

Frontier closed-weight models

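These are the labs whose flagship models are sold through an API rather than shipped as weights. Cohere and AI21 appear here by their API pricing even though they also publish some weights under the licenses noted in their rows. Parameter counts are not disclosed for the models in this table and we do not estimate them. All dates and prices were verified against the provider's own documentation at the page's June 2026 refresh; re-check before quoting in a contract.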

| Provider | Model | Context | Modalities | Input $/1M | Output $/1M | Released | License |
|---|---|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet (2024-10-22) | 200k | text, vision, tools | $3.00 | $15.00 | 2024-10 | Anthropic API ToS |
| Anthropic | Claude 3.5 Haiku | 200k | text, vision, tools | $0.80 | $4.00 | 2024-11 | Anthropic API ToS |
| Anthropic | Claude Opus 4 | 200k | text, vision, tools, code | $15.00 | $75.00 | 2025-05 | Anthropic API ToS |
| Anthropic | Claude Sonnet 4 | 200k (1M beta) | text, vision, tools, code | $3.00 | $15.00 | 2025-05 | Anthropic API ToS |
| OpenAI | GPT-4o | 128k | text, vision, audio in/out | $2.50 | $10.00 | 2024-05 | OpenAI API ToS |
| OpenAI | GPT-4o-mini | 128k | text, vision | $0.15 | $0.60 | 2024-07 | OpenAI API ToS |
| OpenAI | o1 | 200k | text, vision, reasoning | $15.00 | $60.00 | 2024-12 | OpenAI API ToS |
| OpenAI | o1-mini | 128k | text, reasoning | $3.00 | $12.00 | 2024-09 | OpenAI API ToS |
| OpenAI | o3 | 200k | text, vision, reasoning, tools | $10.00 | $40.00 | 2025-04 | OpenAI API ToS |
| OpenAI | o3-mini | 200k | text, reasoning, tools | $1.10 | $4.40 | 2025-01 | OpenAI API ToS |
| OpenAI | o4-mini | 200k | text, vision, reasoning, tools | $1.10 | $4.40 | 2025-04 | OpenAI API ToS |
| Google | Gemini 2.5 Pro | 1M (2M waitlisted) | text, vision, audio, code | $1.25 (≤200k) / $2.50 (>200k) | $10.00 / $15.00 | 2025-03 preview, GA 2025-06 | Google AI Terms |
| Google | Gemini 2.5 Flash | 1M | text, vision, audio | $0.30 | $2.50 | 2025-04 | Google AI Terms |
| Google | Gemini 1.5 Pro | 2M | text, vision, audio | $1.25 / $2.50 | $5.00 / $10.00 | 2024-02 preview, GA 2024-09 | Google AI Terms |
| xAI | Grok 3 | 1M (advertised) | text, vision, tools | $3.00 | $15.00 | 2025-02 | xAI API ToS |
| xAI | Grok 2 | 131k | text, vision | $2.00 | $10.00 | 2024-08 | xAI API ToS |
| Cohere | Command R+ (2024-08) | 128k | text, tools, RAG-tuned | $2.50 | $10.00 | 2024-08 | Cohere ToS (weights available, non-commercial CC-BY-NC 4.0) |
| Cohere | Command R7B | 128k | text, tools | $0.0375 | $0.15 | 2024-12 | Cohere ToS |
| AI21 | Jamba 1.6 Large | 256k | text | $2.00 | $8.00 | 2025-03 | Jamba Open Model License |
| AI21 | Jamba 1.6 Mini | 256k | text | $0.20 | $0.40 | 2025-03 | Jamba Open Model License |
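
To make the input and output columns concrete, here is a small sketch that prices a single request using four rows copied from the table above. The request shape (10,000 input tokens, 2,000 output tokens) and the function name `request_cost_usd` are assumptions chosen for illustration; cached-input and batch discounts are ignored.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of one request; no caching or batch discount applied."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical request: 10k tokens in, 2k tokens out.
for name in PRICES:
    cost = request_cost_usd(name, input_tokens=10_000, output_tokens=2_000)
    print(f"{name:<18} ${cost:.4f}")
# Claude Sonnet 4    $0.0600
# GPT-4o             $0.0450
# GPT-4o-mini        $0.0027
# Gemini 2.5 Flash   $0.0080
```

On the Claude Sonnet 4 row, the 2,000 output tokens cost as much as the 10,000 input tokens, which is why the two columns are listed separately.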

On 'GPT-5', 'Claude 4.5', 'Gemini 3', and other models that may or may not exist by the time you read this

As of this page's June 2026 refresh, none of the three frontier labs had publicly released a model under the names 'GPT-5,' 'Claude 4.5,' or 'Gemini 3' on their official pricing pages. There were leaks, demos, and 'preview' SKUs in various states of availability, and the labs change their naming conventions frequently — OpenAI in particular has signalled a possible unification of the GPT and o-series lines. If you have arrived at this page after such a release, this table is stale, and the right move is to open the provider's pricing page directly. We will not invent SKUs to look current. The pattern across 2024-2026: every frontier launch we have seen ships first as a 'preview' with API access for paying customers, then graduates to GA within 4-12 weeks, then gets a cheaper sibling within a quarter. Apply that template to whatever you find on the live pricing page.

Frontier open-weight models

These are the models you can actually download and run on your own hardware (or rent from a third-party host like Together, Fireworks, Groq, or Cerebras). Pricing is hosted-API pricing from the model's primary commercial host, where available, but the point of this tier is that you do not have to use any host at all. Most have permissive-enough licenses for most builder use cases, but several rows (notably the Mistral Research License and Mistral AI Non-Production License models) require a paid agreement for commercial use; read the fine print on the Llama, Mistral, and DeepSeek licenses before shipping in production.

| Provider | Model | Params | Context | Modalities | Indicative hosted $/1M in/out | Released | License |
|---|---|---|---|---|---|---|---|
| Meta | Llama 3.1 8B Instruct | 8B | 128k | text | $0.18 / $0.18 (Together) | 2024-07 | Llama 3.1 Community License |
| Meta | Llama 3.1 70B Instruct | 70B | 128k | text | $0.88 / $0.88 (Together) | 2024-07 | Llama 3.1 Community License |
| Meta | Llama 3.1 405B Instruct | 405B | 128k | text | $3.50 / $3.50 (Together) | 2024-07 | Llama 3.1 Community License |
| Meta | Llama 3.2 11B Vision Instruct | 11B | 128k | text, vision | $0.18 / $0.18 | 2024-09 | Llama 3.2 Community License |
| Meta | Llama 3.2 90B Vision Instruct | 90B | 128k | text, vision | $1.20 / $1.20 | 2024-09 | Llama 3.2 Community License |
| Meta | Llama 3.3 70B Instruct | 70B | 128k | text | $0.88 / $0.88 | 2024-12 | Llama 3.3 Community License |
| Meta | Llama 4 Scout (mixture of experts) | 109B (17B active) | 10M (advertised) | text, vision | $0.18 / $0.59 (Groq) | 2025-04 | Llama 4 Community License |
| Meta | Llama 4 Maverick (mixture of experts) | 400B (17B active) | 1M (advertised) | text, vision | $0.27 / $0.85 | 2025-04 | Llama 4 Community License |
| Mistral | Mistral Large 2 (2407) | 123B | 128k | text, tools | $2.00 / $6.00 (la Plateforme) | 2024-07 | Mistral Research License (commercial via paid tier) |
| Mistral | Mistral Small 3 | 24B | 32k | text, tools | $0.10 / $0.30 | 2025-01 | Apache 2.0 |
| Mistral | Mixtral 8x22B Instruct | 141B (39B active) | 65k | text | $1.20 / $1.20 (Together) | 2024-04 | Apache 2.0 |
| Mistral | Codestral 25.01 | 22B | 256k | code | $0.30 / $0.90 | 2025-01 | Mistral AI Non-Production License (paid for commercial) |
| Mistral | Pixtral Large | 124B | 128k | text, vision | $2.00 / $6.00 | 2024-11 | Mistral Research License |
| DeepSeek | DeepSeek-V3 | 671B (37B active) | 64k (128k via context-cache) | text, code | $0.27 / $1.10 (DeepSeek API) | 2024-12 | DeepSeek License v1.0 (commercial permitted) |
| DeepSeek | DeepSeek-R1 | 671B (37B active) | 64k | text, reasoning | $0.55 / $2.19 | 2025-01 | MIT (weights), DeepSeek License (outputs OK for distillation) |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | 70B | 128k | text, reasoning | varies by host | 2025-01 | Llama 3.3 Community License |
| Alibaba (Qwen) | Qwen2.5 72B Instruct | 72B | 131k | text | $0.90 / $0.90 | 2024-09 | Qwen License (commercial permitted) |
| Alibaba (Qwen) | Qwen2.5 Coder 32B Instruct | 32B | 131k | code | $0.80 / $0.80 | 2024-11 | Apache 2.0 |
| Alibaba (Qwen) | Qwen2.5-VL 72B Instruct | 72B | 131k | text, vision | $1.20 / $1.20 | 2025-01 | Qwen License |
| Alibaba (Qwen) | QwQ-32B-Preview | 32B | 32k | text, reasoning | $0.15 / $0.60 | 2024-11 | Apache 2.0 |
| Alibaba (Qwen) | Qwen3 (preview series) | 0.6B to 235B (MoE variants) | 128k | text, reasoning | varies | 2025-Q2 preview | Apache 2.0 (announced) |
| Microsoft | Phi-4 | 14B | 16k | text | $0.07 / $0.14 | 2024-12 | MIT |
| Microsoft | Phi-3.5-MoE | 42B (6.6B active) | 128k | text | varies | 2024-08 | MIT |
| Google | Gemma 2 27B Instruct | 27B | 8k | text | $0.27 / $0.27 | 2024-06 | Gemma Terms of Use |
| Google | Gemma 2 9B Instruct | 9B | 8k | text | $0.20 / $0.20 | 2024-06 | Gemma Terms of Use |
| Google | Gemma 3 (multi-size) | 1B / 4B / 12B / 27B | 128k (12B/27B) | text, vision (4B+) | $0.10-$0.30 | 2025-03 | Gemma Terms of Use |
| Allen AI | OLMo 2 32B Instruct | 32B | 4k | text | varies by host | 2025-03 | Apache 2.0 (fully open: weights + data + training code) |
| IBM | Granite 3.1 8B Instruct | 8B | 128k | text, code | $0.10 / $0.10 | 2024-12 | Apache 2.0 |
| NVIDIA | Nemotron-4 340B Instruct | 340B | 4k | text | varies | 2024-06 | NVIDIA Open Model License (commercial permitted, no synthetic-data restrictions) |

Specialized and embedded models

These are models that are not trying to be a generalist chat assistant. They are tuned for one job — code completion, embeddings, image generation, speech, or on-device inference — and the right shopping question is 'best in class for this specific narrow task,' not 'best overall.'

Code-completion (IDE-grade)

Open + closed

Mistral Codestral 25.01 (22B, 256k context) is the open-weight reference; Qwen2.5-Coder 32B (Apache 2.0) is the strongest fully-permissive option. Closed-tier: Anthropic Claude Sonnet 4 and OpenAI GPT-4o lead the SWE-bench Verified leaderboard as of early 2026. DeepSeek-Coder-V2 (236B MoE, 21B active) remains competitive at a fraction of the cost.

Embeddings

Retrieval & semantic search

OpenAI text-embedding-3-large ($0.13/1M tokens, 3072 dims), Cohere Embed v3 ($0.10/1M, multilingual), Voyage-3 ($0.06/1M, retrieval-tuned), and the open BGE-M3 + Nomic Embed family. Pick by language coverage, dimension count, and whether you need binary/int8 quantization for vector-DB cost.
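
The dimension-count and quantization trade-off is mostly arithmetic. The sketch below estimates raw vector storage for a 3072-dimension embedding (the text-embedding-3-large size quoted above) at float32, int8, and binary precision. The corpus size of 10 million vectors and the helper name `index_size_gb` are assumptions for illustration, and the figure excludes index structures and metadata, which vary by vector database.

```python
def index_size_gb(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Raw vector storage in GB (10**9 bytes); excludes index overhead and metadata."""
    return num_vectors * dims * bits_per_dim / 8 / 1e9


NUM_VECTORS = 10_000_000  # hypothetical corpus size
DIMS = 3072               # text-embedding-3-large dimension count

for label, bits in [("float32", 32), ("int8", 8), ("binary", 1)]:
    size = index_size_gb(NUM_VECTORS, DIMS, bits)
    print(f"{label:<8} {size:7.2f} GB")
# float32   122.88 GB
# int8       30.72 GB
# binary      3.84 GB
```

Whether the retrieval-quality loss from int8 or binary storage is acceptable depends on the embedding model and the corpus, so it is worth measuring on your own data before committing.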

Image generation

Diffusion + autoregressive

Black Forest Labs FLUX.1 (dev/schnell open weights, pro API), Stability SD 3.5 Large (open), OpenAI DALL-E 3 (closed), Google Imagen 3 (closed), Midjourney v6.1 (closed, no API). FLUX.1 [dev] under FLUX.1 [dev] Non-Commercial License is the de facto open default.

Speech-to-text

ASR

OpenAI Whisper Large v3 remains the open-weight baseline (MIT, 1550M params). Deepgram Nova-2, AssemblyAI Universal-2, and ElevenLabs Scribe are the closed alternatives. NVIDIA Parakeet-TDT 1.1B is the leading open-weight ASR on the Open ASR Leaderboard.

Text-to-speech

TTS

ElevenLabs (closed, voice-cloning leader), OpenAI tts-1-hd ($30/1M chars), Google Chirp/Cloud TTS. Open: Coqui XTTS-v2 (CPML license), Bark by Suno (MIT, lower fidelity). For real-time / low-latency, OpenAI's Realtime API and Cartesia Sonic dominate.

On-device (laptop, phone)

≤8B params, ≤4GB quantized

Llama 3.2 1B/3B (mobile-targeted, 128k context), Phi-4 14B (MIT, runs quantized on 16GB Mac), Gemma 3 4B (vision-capable), Qwen2.5 3B / 7B, Mistral Small 3 24B (workstation tier). Inference via llama.cpp, MLX (Apple), or Ollama.
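
The "≤8B params, ≤4GB quantized" rule of thumb follows from bits per weight. A rough sketch, assuming an idealized 4 bits per weight and counting weights only (the helper name `weight_memory_gb` is ours): real quantization formats typically carry some extra overhead, and the KV cache and runtime need memory on top of this.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight-only footprint in GB; excludes KV cache and runtime overhead."""
    return params_billions * bits_per_weight / 8


# Parameter counts taken from the model names listed above.
MODELS = [
    ("Llama 3.2 3B", 3),
    ("Gemma 3 4B", 4),
    ("8B-class model", 8),
    ("Phi-4 14B", 14),
    ("Mistral Small 3 24B", 24),
]

for name, params in MODELS:
    print(f"{name:<20} {weight_memory_gb(params, 4):5.1f} GB at 4-bit")
# Llama 3.2 3B           1.5 GB at 4-bit
# Gemma 3 4B             2.0 GB at 4-bit
# 8B-class model         4.0 GB at 4-bit
# Phi-4 14B              7.0 GB at 4-bit
# Mistral Small 3 24B   12.0 GB at 4-bit
```

By this estimate Phi-4 and Mistral Small 3 sit above the 4GB line in the heading, which matches their description above as 16GB-Mac and workstation-tier options.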

Reasoning (test-time compute)

Long chain-of-thought

Closed: OpenAI o1 / o3 / o4-mini, Anthropic Claude Sonnet 4 extended thinking, Google Gemini 2.5 Pro Deep Think (preview). Open: DeepSeek-R1. Pattern: pay for more output tokens, get better results on harder math, harder code, and harder logic. Cost scales roughly linearly with thinking depth.
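
A quick way to see the linear scaling is to price the output side of one call at different thinking depths. This sketch assumes reasoning tokens are billed at the model's output rate and uses the o3 output price from the closed-weight table ($40.00 per 1M tokens); the token counts and the function name `reasoning_call_cost_usd` are hypothetical, and input cost is left out.

```python
def reasoning_call_cost_usd(visible_tokens: int, reasoning_tokens: int,
                            output_price_per_mtok: float) -> float:
    """Output-side cost of one call, assuming reasoning tokens bill as output tokens."""
    billed_output = visible_tokens + reasoning_tokens
    return billed_output * output_price_per_mtok / 1_000_000


O3_OUTPUT_PRICE = 40.00  # USD per 1M output tokens, from the table on this page

# Same 500-token visible answer, increasing hypothetical thinking budgets.
for thinking in (0, 2_000, 8_000):
    cost = reasoning_call_cost_usd(500, thinking, O3_OUTPUT_PRICE)
    print(f"{thinking:>5} reasoning tokens -> ${cost:.3f}")
#     0 reasoning tokens -> $0.020
#  2000 reasoning tokens -> $0.100
#  8000 reasoning tokens -> $0.340
```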

Inference accelerators (not models, but related)

Where the model runs matters

Groq (LPU, sub-second Llama 3.1 70B), Cerebras (CS-3, 1800+ tokens/sec on Llama 3.1 70B), SambaNova RDU, Fireworks and Together on NVIDIA. For latency-critical agentic workloads, the host can matter as much as the model choice.

Open-source heroes — the labs that ship weights

The open-weight ecosystem stopped being a curiosity and became the floor of the market sometime in 2024, and by mid-2026 it is the structural backbone of every serious AI deployment outside the Big Three closed labs. Six families dominate.

Meta Llama is the gravitational center. Llama 3.1 (July 2024) was the moment frontier-tier weights became free; the 405B model effectively matched GPT-4 on common evaluations at launch. Llama 3.3 70B (December 2024) compressed most of 3.1 405B's quality into a 70B chassis. Llama 4 Scout and Maverick (April 2025) moved to mixture-of-experts and pushed advertised context windows to 10M tokens, though real-world retrieval at that length is a separate empirical question. The Llama Community License permits commercial use under a 700M-monthly-active-user cap that catches only the largest companies.

Mistral AI (Paris) maintains the European frontier-open lane. Mistral Large 2 (123B, 128k context, July 2024) and Mixtral 8x22B (MoE, Apache 2.0) are the workhorses. Codestral, Pixtral, and Mistral Small 3 fill the niches. The license story is messier than Llama: many Mistral models use the Mistral Research License, which requires a paid agreement for commercial use. Read the file.

DeepSeek (Hangzhou) reset the cost frontier in late 2024 and early 2025. DeepSeek-V3 (671B MoE, 37B active, December 2024) trained for roughly $5.6M of GPU time on the published figures, and DeepSeek-R1 (January 2025) released as an open-weight reasoning model that approached o1-class performance at a fraction of the inference cost. The R1-Distill family (Llama-70B, Qwen-32B base) is how most builders actually deploy reasoning today.

Qwen (Alibaba) is the most prolific shipper. Qwen2.5 covers 0.5B through 72B, Coder-32B is at parity with closed code models on HumanEval+, Qwen2.5-VL handles vision, and the QwQ preview line ships open-weight reasoning. The Qwen3 series began rolling out in Q2 2025 under Apache 2.0, removing the last licensing friction.

Microsoft Phi is the small-model leader. Phi-4 (14B, MIT, December 2024) outperforms many 70B models on reasoning benchmarks on a per-parameter basis, and the Phi-3.5-MoE family extends that thesis. For on-device and edge deployments, Phi is the first model to evaluate.

Google Gemma is the small open-weight family from a frontier lab. Gemma 2 (June 2024) and Gemma 3 (March 2025) are derived from Gemini training infrastructure. The Gemma Terms of Use permit commercial use; the 27B variant runs comfortably on a single consumer GPU when quantized.

The honorable mentions, briefly: Allen AI's OLMo 2 is the only fully-open frontier-class release (weights, data, training code, all under Apache 2.0). IBM Granite is the enterprise-positioned Apache 2.0 family. NVIDIA's Nemotron-4 340B is that lab's largest open model. And the long tail (01.AI Yi, Stability LM, EleutherAI Pythia, MosaicML / Databricks DBRX) continues to ship useful work below the headline tier.

Release timeline — the last 24 months

A compressed chronology of the releases that moved the market. Dates are first-public-availability, not paper-publication.

  1. 2024-04

    Llama 3 8B / 70B; Mixtral 8x22B; Command R+

    Meta ships the first Llama 3 generation. Mistral releases Mixtral 8x22B under Apache 2.0. Cohere ships Command R+ as the first open-weight RAG-tuned 100B-class model.

  2. 2024-05

    GPT-4o

    OpenAI ships its first natively multimodal frontier model — text, vision, and audio in one weight set. Cuts price-per-token by roughly half vs. GPT-4 Turbo.

  3. 2024-06

    Claude 3.5 Sonnet (first cut); Gemma 2; Nemotron 340B

    Anthropic ships 3.5 Sonnet, which would dominate enterprise coding benchmarks for the next nine months. Google releases Gemma 2; NVIDIA releases Nemotron-4 340B as its largest open model.

  4. 2024-07

    Llama 3.1 405B; Mistral Large 2

    Meta's 405B becomes the first openly-released frontier-tier weight set. Mistral Large 2 ships at 123B with 128k context.

  5. 2024-09

    OpenAI o1-preview; Llama 3.2 (vision); Qwen2.5

    o1 introduces test-time compute scaling at the frontier. Llama 3.2 adds vision to the open family. Qwen2.5 ships across seven sizes.

  6. 2024-10

    Claude 3.5 Sonnet (new); Computer Use

    Anthropic ships an upgraded 3.5 Sonnet and the Computer Use beta — first frontier-lab native agentic control.

  7. 2024-11

    Claude 3.5 Haiku; Qwen2.5-Coder 32B; QwQ-32B

    Lower-tier frontier models compress capability further. QwQ becomes the first open-weight reasoning model.

  8. 2024-12

    OpenAI o1 GA, o3 announced; DeepSeek-V3; Phi-4; Llama 3.3 70B

    December 2024 was the densest release month of the cycle. DeepSeek-V3 in particular reset cost-per-token expectations.

  9. 2025-01

    DeepSeek-R1; o3-mini; Mistral Small 3

    DeepSeek-R1 ships as the first open-weight o1-class reasoning model under MIT for the weights. o3-mini follows weeks later.

  10. 2025-02

    Grok 3; Claude 3.7 Sonnet (with extended thinking)

    xAI ships Grok 3 with advertised 1M context. Anthropic adds extended thinking to the Claude line.

  11. 2025-03

    Gemini 2.5 Pro (preview); Gemma 3; OLMo 2 32B

    Google's 2.5 Pro reasserts the long-context lead. Gemma 3 and OLMo 2 32B fill out the open tier.

  12. 2025-04

    Llama 4 Scout / Maverick; OpenAI o3 GA, o4-mini, GPT-4.1; Gemini 2.5 Flash

    Meta moves to MoE at the frontier. OpenAI's o3 reaches general availability. GPT-4.1 reframes the API line.

  13. 2025-05

    Claude Opus 4 and Claude Sonnet 4

    Anthropic's Claude 4 generation ships, with Sonnet 4 offering a 1M-token context beta.

  14. 2025-06-onward

    Continued open-weight pressure; reasoning-on-everything; preview models that may or may not graduate

    The pattern from mid-2025 into 2026: every frontier model gets a reasoning variant, every open lab compresses costs, and the gap between closed and open narrows on benchmarks but widens on agentic capability. For anything released after this page's June 2026 refresh, check the provider's pricing page directly.

Cost discipline — what to actually pay attention to

Model selection in 2026 is rarely 'which model is smartest.' For 80% of production workloads it is 'which model is good enough at the lowest sustainable unit cost.' Apply the following filters in order before you commit to a SKU.

  • Output tokens cost roughly 4-5x input tokens on most closed frontier rows in the table above, and about 8x on Gemini 2.5 Pro and Flash. If your workload generates long completions, the output column is the one that matters; the input price alone understates the bill.
  • Cached input is real. Anthropic, OpenAI, and Google all offer prompt-caching discounts (often 50-90% off cached portions). If your system prompt is long and static, build for caching from day one.
  • Batch API is 50% off at every lab that offers it (OpenAI, Anthropic, Google). For non-real-time workloads — overnight evals, document processing, content backfill — this is free money.
  • Mixture-of-experts inference cost tracks active parameters, not total parameters. DeepSeek-V3 is 671B total but activates only 37B per token, so hosts can price it closer to a mid-size dense model than to a 671B one. This is why open-weight MoE has eaten the cost frontier.
  • Self-hosted breakeven on a frontier-open model lands somewhere around 50-200M monthly tokens depending on the model and the GPU rental rate. Below that, hosted APIs win on TCO; above it, your own infrastructure starts to pay back.
  • Reasoning models are not always worth it. o1 and R1 bill their chain-of-thought tokens as output tokens, whether or not you see them. For tasks that do not need multi-step reasoning, a cheaper non-reasoning model often gets you the same answer at a fraction of the cost.
  • Context window pricing is non-linear. Gemini 2.5 Pro and Claude Sonnet 4 both have higher per-token rates above 200k. If you can summarize-and-cache instead of dumping the full corpus, you usually should.
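
The caching and batch filters above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the workload (100M input and 20M output tokens per month), the 80% cache-hit share, and the function name `monthly_api_cost_usd` are hypothetical; the prices are the Claude Sonnet 4 row ($3.00 / $15.00), the 90% cache discount is the upper end of the range quoted above, and the 50% batch discount is the figure quoted above. Whether a provider lets the two discounts stack is not something this page states, so the example applies them separately.

```python
def monthly_api_cost_usd(input_mtok: float, output_mtok: float,
                         in_price: float, out_price: float,
                         cached_fraction: float = 0.0,
                         cache_discount: float = 0.0,
                         batch_discount: float = 0.0) -> float:
    """Monthly spend in USD; token volumes are in millions of tokens.

    cached_fraction: share of input tokens served from the prompt cache (0..1).
    cache_discount:  fraction taken off the input price for cached tokens.
    batch_discount:  fraction taken off the whole bill for batch-API traffic.
    """
    cached = input_mtok * cached_fraction
    uncached = input_mtok - cached
    input_cost = uncached * in_price + cached * in_price * (1 - cache_discount)
    output_cost = output_mtok * out_price
    return (input_cost + output_cost) * (1 - batch_discount)


# Hypothetical workload priced at the Claude Sonnet 4 row: $3.00 in, $15.00 out.
base = monthly_api_cost_usd(100, 20, 3.00, 15.00)
with_cache = monthly_api_cost_usd(100, 20, 3.00, 15.00,
                                  cached_fraction=0.8, cache_discount=0.9)
with_batch = monthly_api_cost_usd(100, 20, 3.00, 15.00, batch_discount=0.5)

print(f"list price:   ${base:.2f}")        # list price:   $600.00
print(f"80% cached:   ${with_cache:.2f}")  # 80% cached:   $384.00
print(f"batch API:    ${with_batch:.2f}")  # batch API:    $300.00
```

In this example caching only reduces the input half of the bill, so the output tokens ($300 of the $600) set a floor that caching alone cannot lower.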

What this page does not cover

Three deliberate omissions, named in the open. First — we do not list every regional or domain-specific model. Yi, ERNIE, Hunyuan, Baichuan, Sber GigaChat, Mistral's per-country variants, and the long tail of fine-tunes on Hugging Face are real, but cataloguing all 50,000+ public model checkpoints is not useful. We focus on the families with active maintenance, English-first documentation, and a clear license. Second — we do not benchmark. This page is a price-and-availability index. For capability comparisons, look at LMSYS Chatbot Arena, the Hugging Face Open LLM Leaderboard, SWE-bench Verified, AIME, GPQA Diamond, and Aider Polyglot — and prefer multiple benchmarks to any single number. Third — we do not predict. If a lab has not shipped a SKU under a name, that name is not in our tables. We will re-version the page when reality moves.

How to keep this current

This page is maintained as a snapshot, not a live feed. The honest way to use it is: scan the structure to understand the landscape, then click through to each provider's own pricing page before you quote a number in a contract, a proposal, or a research paper.

The canonical pricing pages, all linked in the citations below, are: anthropic.com/pricing, openai.com/api/pricing, ai.google.dev/pricing, x.ai/api, mistral.ai/pricing, deepseek.com/api-pricing, cohere.com/pricing, ai21.com/pricing, together.ai/pricing, fireworks.ai/pricing, groq.com/pricing, and the Hugging Face model pages for each open-weight release. For open-weight licenses specifically, always download the LICENSE file from the original release repo; summaries on third-party sites have been wrong often enough to matter.

The community trackers worth bookmarking alongside this page: Artificial Analysis (artificialanalysis.ai) for ongoing price-and-latency benchmarks, the LMSYS Chatbot Arena Leaderboard for capability rankings, and Epoch AI's model database for training-compute and parameter-count history. None of those are a substitute for the provider's own page, but they catch errors and stale rows faster than any single static document can.
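
If you keep your own copy of these rows, one way to act on the snapshot warning is to record when each row was last checked against the provider page and flag anything too old. The sketch below is illustrative only: the verification dates, the 90-day threshold, and the function name `stale_rows` are assumptions, not part of how this page is maintained.

```python
from datetime import date
from typing import Dict, List


def stale_rows(last_verified: Dict[str, date], today: date,
               max_age_days: int = 90) -> List[str]:
    """Return the models whose last verification is older than max_age_days."""
    return [name for name, checked in last_verified.items()
            if (today - checked).days > max_age_days]


# Hypothetical verification log for two rows from the tables above.
last_verified = {
    "Claude Sonnet 4": date(2026, 6, 1),
    "Llama 3.3 70B Instruct": date(2026, 1, 15),
}

print(stale_rows(last_verified, today=date(2026, 7, 1)))
# ['Llama 3.3 70B Instruct']
```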

Sources

  1. [01]

    Anthropic publishes per-model token pricing for Claude 3.5 Sonnet, 3.5 Haiku, Opus 4, and Sonnet 4, including the 1M-token context beta tier.

    anthropic.com/pricing

  2. [02]

    Claude 3.5 Sonnet was released June 20, 2024, with the upgraded snapshot released October 22, 2024.

    anthropic.com/news/claude-3-5-sonnet

  3. [03]

    Anthropic announced Claude Opus 4 and Claude Sonnet 4 in May 2025 with extended thinking and 1M context beta for Sonnet 4.

    anthropic.com/news/claude-4

  4. [04]

    OpenAI publishes current per-model API pricing for GPT-4o, GPT-4o-mini, o1, o1-mini, o3, o3-mini, and o4-mini.

    openai.com/api/pricing

  5. [05]

    GPT-4o was announced May 13, 2024 as OpenAI's first natively multimodal frontier model.

    openai.com/index/hello-gpt-4o

  6. [06]

    OpenAI o1 preview was released September 12, 2024 with full o1 reaching GA in December 2024.

    openai.com/index/learning-to-reason-with-llms

  7. [07]

    OpenAI o3 and o4-mini were released April 16, 2025.

    openai.com/index/introducing-o3-and-o4-mini

  8. [08]

    Google AI Studio and Vertex AI publish pricing for Gemini 1.5 Pro, Gemini 2.5 Pro, and Gemini 2.5 Flash with tiered pricing above 200k tokens.

    ai.google.dev/pricing

  9. [09]

    Gemini 2.5 Pro was announced in March 2025 as a thinking-by-default frontier model.

    blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025

  10. [10]

    xAI publishes Grok 2 and Grok 3 API pricing with stated context windows.

    x.ai/api

  11. [11]

    Llama 3.1 was released July 23, 2024 in 8B, 70B, and 405B sizes with 128k context.

    ai.meta.com/blog/meta-llama-3-1

  12. [12]

    Meta announced Llama 4 Scout and Maverick on April 5, 2025 with mixture-of-experts architecture.

    ai.meta.com/blog/llama-4-multimodal-intelligence

  13. [13]

    The Llama 3.1 Community License governs commercial use with a 700M monthly active user threshold.

    llama.meta.com/llama3_1/license

  14. [14]

    Mistral Large 2 was released July 24, 2024 at 123B parameters with 128k context.

    mistral.ai/news/mistral-large-2407

  15. [15]

    Codestral 25.01 was released January 2025 at 22B parameters with 256k context.

    mistral.ai/news/codestral-2501

  16. [16]

    DeepSeek-V3 was released December 26, 2024 as a 671B-total / 37B-active mixture-of-experts model.

    api-docs.deepseek.com/news/news1226

  17. [17]

    DeepSeek-R1 was released January 20, 2025 under the MIT license for the model weights.

    api-docs.deepseek.com/news/news250120

  18. [18]

    The DeepSeek-V3 technical report documents the architecture, training, and reported $5.576M compute cost.

    arxiv.org/abs/2412.19437

  19. [19]

    The DeepSeek-R1 technical report documents the RL-based reasoning training pipeline and benchmark results.

    arxiv.org/abs/2501.12948

  20. [20]

    Qwen2.5 was released September 18, 2024 across seven sizes from 0.5B to 72B parameters.

    qwenlm.github.io/blog/qwen2.5

  21. [21]

    QwQ-32B-Preview was released November 27, 2024 as an open-weight reasoning model under Apache 2.0.

    qwenlm.github.io/blog/qwq-32b-preview

  22. [22]

    Phi-4 was released December 12, 2024 at 14B parameters under the MIT license.

    microsoft.com/en-us/research/blog/phi-4-technical-report

  23. [23]

    Gemma 3 was released March 12, 2025 in 1B, 4B, 12B, and 27B variants.

    blog.google/technology/developers/gemma-3

  24. [24]

    OLMo 2 32B was released March 2025 as a fully-open release including weights, data, and training code under Apache 2.0.

    allenai.org/blog/olmo2-32B

  25. [25]

    LMSYS Chatbot Arena maintains a community-voted leaderboard ranking deployed models across labs.

    lmsys.org/blog/chatbot-arena

LAB · ATOMEONS · MARCO ISLAND FLÆONS RESEARCH · 12 PAPERS · CC-BY 4.0 · ORANGEBOX v1.0.0-beta · TURBO-OPTIMIZE CLAUDE · SHIPPED 2026-05-30 · B00KMAKR v3.2.0 · AI PUBLISHING COCKPIT · MAC + WINDOWS · FREE LAUNCH WEEK · ENDS JUNE 6 · §4A NO-SAAS LOCK · FOUNDER'S VIEW · NEXT BROADCAST IN ... · CITE THE WORK · FORWARD THE LINK · NO ALGORITHM