The shipping AI model tracker
Every frontier and open-weight model in production, as of June 2026 — honest about uncertainty.
How to read this tracker
A few conventions that apply to every table below. Skim these once, then the rows make sense at a glance.
- Params — Listed only where the lab has officially disclosed the figure. For Anthropic, OpenAI, and frontier Gemini, this column reads 'not disclosed' on purpose. We do not repeat third-party guesses.
- Context window — Maximum total tokens (input plus output) the API will accept for the model's standard tier. Some providers sell longer-context variants at higher prices; we note those inline.
- Modalities — Text in / text out is the baseline. We mark vision, audio, image generation, and tool use where the model supports them natively (not via a wrapper).
- Price per 1M tokens — Standard public API pricing in USD as of the page's June 2026 refresh. Input and output are listed separately because the gap matters for any agentic workload. Batch-API and cached-input discounts are footnoted, not in the main number.
- Release date — First public general-availability date from the provider's own announcement. Preview/beta dates are noted as such.
- Open weights — Yes means downloadable from Hugging Face (or the lab's own host) under a license that permits local inference. 'Yes, gated' means the weights ship but the license restricts commercial use or requires acceptance.
- License — The exact license name. The fine print on Llama, Gemma, and DeepSeek is non-trivial; we link to the actual license file, not a summary.
Frontier closed-weight models
These are the labs that do not ship weights. Pricing is API-only. Parameter counts are not disclosed by any of these providers and we do not estimate them. All dates and prices were verified against the provider's own documentation at the page's June 2026 refresh — re-check before quoting in a contract.
On 'GPT-5', 'Claude 4.5', 'Gemini 3', and other models that may or may not exist by the time you read this
As of this page's June 2026 refresh, none of the three frontier labs had publicly released a model under the names 'GPT-5,' 'Claude 4.5,' or 'Gemini 3' on their official pricing pages. There were leaks, demos, and 'preview' SKUs in various states of availability, and the labs change their naming conventions frequently — OpenAI in particular has signalled a possible unification of the GPT and o-series lines. If you have arrived at this page after such a release, this table is stale, and the right move is to open the provider's pricing page directly. We will not invent SKUs to look current. The pattern across 2024-2026: every frontier launch we have seen ships first as a 'preview' with API access for paying customers, then graduates to GA within 4-12 weeks, then gets a cheaper sibling within a quarter. Apply that template to whatever you find on the live pricing page.
Frontier open-weight models
These are the models you can actually download and run on your own hardware (or rent from a third-party host like Together, Fireworks, Groq, or Cerebras). Pricing is hosted-API pricing from the model's primary commercial host, where available — but the point of this tier is that you do not have to use any host at all. All have permissive-enough licenses for most builder use cases; read the fine print on the Llama and DeepSeek licenses before shipping in production.
Specialized and embedded models
These are models that are not trying to be a generalist chat assistant. They are tuned for one job — code completion, embeddings, image generation, speech, or on-device inference — and the right shopping question is 'best in class for this specific narrow task,' not 'best overall.'
Code-completion (IDE-grade)
Open + closed
Mistral Codestral 25.01 (22B, 256k context) is the open-weight reference; Qwen2.5-Coder 32B (Apache 2.0) is the strongest fully-permissive option. Closed-tier: Anthropic Claude Sonnet 4 and OpenAI GPT-4o lead the SWE-bench Verified leaderboard as of early 2026. DeepSeek-Coder-V2 (236B MoE, 21B active) remains competitive at a fraction of the cost.
Embeddings
Retrieval & semantic search
OpenAI text-embedding-3-large ($0.13/1M tokens, 3072 dims), Cohere Embed v3 ($0.10/1M, multilingual), Voyage-3 ($0.06/1M, retrieval-tuned), and the open BGE-M3 + Nomic Embed family. Pick by language coverage, dimension count, and whether you need binary/int8 quantization for vector-DB cost.
Image generation
Diffusion + autoregressive
Black Forest Labs FLUX.1 (dev/schnell open weights, pro API), Stability SD 3.5 Large (open), OpenAI DALL-E 3 (closed), Google Imagen 3 (closed), Midjourney v6.1 (closed, no API). FLUX.1 [dev] under FLUX.1 [dev] Non-Commercial License is the de facto open default.
Speech-to-text
ASR
OpenAI Whisper Large v3 remains the open-weight baseline (MIT, 1550M params). Deepgram Nova-2, AssemblyAI Universal-2, and ElevenLabs Scribe are the closed alternatives. NVIDIA Parakeet-TDT 1.1B is the leading open-weight ASR on the Open ASR Leaderboard.
Text-to-speech
TTS
ElevenLabs (closed, voice-cloning leader), OpenAI tts-1-hd ($30/1M chars), Google Chirp/Cloud TTS. Open: Coqui XTTS-v2 (CPML license), Bark by Suno (MIT, lower fidelity). For real-time / low-latency, OpenAI's Realtime API and Cartesia Sonic dominate.
On-device (laptop, phone)
≤8B params, ≤4GB quantized
Llama 3.2 1B/3B (mobile-targeted, 128k context), Phi-4 14B (MIT, runs quantized on 16GB Mac), Gemma 3 4B (vision-capable), Qwen2.5 3B / 7B, Mistral Small 3 24B (workstation tier). Inference via llama.cpp, MLX (Apple), or Ollama.
Reasoning (test-time compute)
Long chain-of-thought
Closed: OpenAI o1 / o3 / o4-mini, Anthropic Claude Sonnet 4 extended thinking, Google Gemini 2.5 Pro Deep Think (preview), DeepSeek-R1 (open). Pattern: pay more output tokens, get harder math / harder code / harder logic. Cost scaling is roughly linear with thinking depth.
Inference accelerators (not models, but related)
Where the model runs matters
Groq (LPU, sub-second Llama 3.1 70B), Cerebras (CS-3, 1800+ tokens/sec on Llama 3.1 70B), SambaNova RDU, Fireworks and Together on NVIDIA. For latency-critical agentic workloads, the host can matter as much as the model choice.
Open-source heroes — the labs that ship weights
Release timeline — the last 24 months
A compressed chronology of the releases that moved the market. Dates are first-public-availability, not paper-publication.
2024-04
Llama 3 8B / 70B; Mixtral 8x22B; Command R+
Meta ships the first Llama 3 generation. Mistral releases Mixtral 8x22B under Apache 2.0. Cohere ships Command R+ as the first open-weight RAG-tuned 100B-class model.
2024-05
GPT-4o
OpenAI ships its first natively multimodal frontier model — text, vision, and audio in one weight set. Cuts price-per-token by roughly half vs. GPT-4 Turbo.
2024-06
Claude 3.5 Sonnet (first cut); Gemma 2; Nemotron 340B
Anthropic ships 3.5 Sonnet, which would dominate enterprise coding benchmarks for the next nine months. Google releases Gemma 2; NVIDIA releases Nemotron-4 340B as its largest open model.
2024-07
Llama 3.1 405B; Mistral Large 2
Meta's 405B becomes the first openly-released frontier-tier weight set. Mistral Large 2 ships at 123B with 128k context.
2024-09
OpenAI o1-preview; Llama 3.2 (vision); Qwen2.5
o1 introduces test-time compute scaling at the frontier. Llama 3.2 adds vision to the open family. Qwen2.5 ships across seven sizes.
2024-10
Claude 3.5 Sonnet (new); Computer Use
Anthropic ships an upgraded 3.5 Sonnet and the Computer Use beta — first frontier-lab native agentic control.
2024-11
Claude 3.5 Haiku; Qwen2.5-Coder 32B; QwQ-32B
Lower-tier frontier models compress capability further. QwQ becomes the first open-weight reasoning model.
2024-12
OpenAI o1 GA, o3 announced; DeepSeek-V3; Phi-4; Llama 3.3 70B
December 2024 was the densest release month of the cycle. DeepSeek-V3 in particular reset cost-per-token expectations.
2025-01
DeepSeek-R1; o3-mini; Mistral Small 3
DeepSeek-R1 ships as the first open-weight o1-class reasoning model under MIT for the weights. o3-mini follows weeks later.
2025-02
Grok 3; Claude 3.7 Sonnet (with extended thinking)
xAI ships Grok 3 with advertised 1M context. Anthropic adds extended thinking to the Claude line.
2025-03
Gemini 2.5 Pro (preview); Gemma 3; OLMo 2 32B
Google's 2.5 Pro reasserts the long-context lead. Gemma 3 and OLMo 2 32B fill out the open tier.
2025-04
Llama 4 Scout / Maverick; OpenAI o3 GA, o4-mini, GPT-4.1; Gemini 2.5 Flash
Meta moves to MoE at the frontier. OpenAI's o3 reaches general availability. GPT-4.1 reframes the API line.
2025-05
Claude Opus 4 and Claude Sonnet 4
Anthropic's Claude 4 generation ships, with Sonnet 4 offering a 1M-token context beta.
2025-06-onward
Continued open-weight pressure; reasoning-on-everything; preview models that may or may not graduate
The pattern from mid-2025 into 2026: every frontier model gets a reasoning variant, every open lab compresses costs, and the gap between closed and open narrows on benchmarks but widens on agentic capability. For anything released after this page's June 2026 refresh, check the provider's pricing page directly.
Cost discipline — what to actually pay attention to
Model selection in 2026 is rarely 'which model is smartest.' For 80% of production workloads it is 'which model is good enough at the lowest sustainable unit cost.' Apply the following filters in order before you commit to a SKU.
- Output tokens cost 3-5x input tokens at every closed frontier lab. If your workload generates long completions, the output column is the one that matters — input pricing is misleading marketing.
- Cached input is real. Anthropic, OpenAI, and Google all offer prompt-caching discounts (often 50-90% off cached portions). If your system prompt is long and static, build for caching from day one.
- Batch API is 50% off at every lab that offers it (OpenAI, Anthropic, Google). For non-real-time workloads — overnight evals, document processing, content backfill — this is free money.
- Mixture-of-experts pricing is per-active-parameter, not per-total-parameter. DeepSeek-V3 is 671B total but bills like a 37B model. This is why open-weight MoE has eaten the cost frontier.
- Self-hosted breakeven on a frontier-open model lands somewhere around 50-200M monthly tokens depending on the model and the GPU rental rate. Below that, hosted APIs win on TCO; above it, your own infrastructure starts to pay back.
- Reasoning models are not always worth it. o1 and R1 charge for the hidden chain-of-thought tokens too. For tasks that do not need multi-step reasoning, a cheaper non-reasoning model gets you the same answer for a tenth of the cost.
- Context window pricing is non-linear. Gemini 2.5 Pro and Claude Sonnet 4 both have higher per-token rates above 200k. If you can summarize-and-cache instead of dumping the full corpus, you usually should.
What this page does not cover
Three deliberate omissions, named in the open. First — we do not list every regional or domain-specific model. Yi, ERNIE, Hunyuan, Baichuan, Sber GigaChat, Mistral's per-country variants, and the long tail of fine-tunes on Hugging Face are real, but cataloguing all 50,000+ public model checkpoints is not useful. We focus on the families with active maintenance, English-first documentation, and a clear license. Second — we do not benchmark. This page is a price-and-availability index. For capability comparisons, look at LMSYS Chatbot Arena, the Hugging Face Open LLM Leaderboard, SWE-bench Verified, AIME, GPQA Diamond, and Aider Polyglot — and prefer multiple benchmarks to any single number. Third — we do not predict. If a lab has not shipped a SKU under a name, that name is not in our tables. We will re-version the page when reality moves.
How to keep this current
Sources
- [01]
Anthropic publishes per-model token pricing for Claude 3.5 Sonnet, 3.5 Haiku, Opus 4, and Sonnet 4, including the 1M-token context beta tier.
anthropic.com/pricing
- [02]
Claude 3.5 Sonnet was released June 20, 2024, with the upgraded snapshot released October 22, 2024.
anthropic.com/news/claude-3-5-sonnet
- [03]
Anthropic announced Claude Opus 4 and Claude Sonnet 4 in May 2025 with extended thinking and 1M context beta for Sonnet 4.
anthropic.com/news/claude-4
- [04]
OpenAI publishes current per-model API pricing for GPT-4o, GPT-4o-mini, o1, o1-mini, o3, o3-mini, and o4-mini.
openai.com/api/pricing
- [05]
GPT-4o was announced May 13, 2024 as OpenAI's first natively multimodal frontier model.
openai.com/index/hello-gpt-4o
- [06]
OpenAI o1 preview was released September 12, 2024 with full o1 reaching GA in December 2024.
openai.com/index/learning-to-reason-with-llms
- [07]
OpenAI o3 and o4-mini were released April 16, 2025.
openai.com/index/introducing-o3-and-o4-mini
- [08]
Google AI Studio and Vertex AI publish pricing for Gemini 1.5 Pro, Gemini 2.5 Pro, and Gemini 2.5 Flash with tiered pricing above 200k tokens.
ai.google.dev/pricing
- [09]
Gemini 2.5 Pro was announced in March 2025 as a thinking-by-default frontier model.
blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025
- [10]
xAI publishes Grok 2 and Grok 3 API pricing with stated context windows.
x.ai/api
- [11]
Llama 3.1 was released July 23, 2024 in 8B, 70B, and 405B sizes with 128k context.
ai.meta.com/blog/meta-llama-3-1
- [12]
Meta announced Llama 4 Scout and Maverick on April 5, 2025 with mixture-of-experts architecture.
ai.meta.com/blog/llama-4-multimodal-intelligence
- [13]
The Llama 3.1 Community License governs commercial use with a 700M monthly active user threshold.
llama.meta.com/llama3_1/license
- [14]
Mistral Large 2 was released July 24, 2024 at 123B parameters with 128k context.
mistral.ai/news/mistral-large-2407
- [15]
Codestral 25.01 was released January 2025 at 22B parameters with 256k context.
mistral.ai/news/codestral-2501
- [16]
DeepSeek-V3 was released December 26, 2024 as a 671B-total / 37B-active mixture-of-experts model.
api-docs.deepseek.com/news/news1226
- [17]
DeepSeek-R1 was released January 20, 2025 under the MIT license for the model weights.
api-docs.deepseek.com/news/news250120
- [18]
The DeepSeek-V3 technical report documents the architecture, training, and reported $5.576M compute cost.
arxiv.org/abs/2412.19437
- [19]
The DeepSeek-R1 technical report documents the RL-based reasoning training pipeline and benchmark results.
arxiv.org/abs/2501.12948
- [20]
Qwen2.5 was released September 18, 2024 across seven sizes from 0.5B to 72B parameters.
qwenlm.github.io/blog/qwen2.5
- [21]
QwQ-32B-Preview was released November 27, 2024 as an open-weight reasoning model under Apache 2.0.
qwenlm.github.io/blog/qwq-32b-preview
- [22]
Phi-4 was released December 12, 2024 at 14B parameters under the MIT license.
microsoft.com/en-us/research/blog/phi-4-technical-report
- [23]
Gemma 3 was released March 12, 2025 in 1B, 4B, 12B, and 27B variants.
blog.google/technology/developers/gemma-3
- [24]
OLMo 2 32B was released March 2025 as a fully-open release including weights, data, and training code under Apache 2.0.
allenai.org/blog/olmo2-32B
- [25]
LMSYS Chatbot Arena maintains a community-voted leaderboard ranking deployed models across labs.
lmsys.org/blog/chatbot-arena