A row of seven matte-black machined cylinders of varying heights — every shipping AI model.

The shipping AI model tracker

Every frontier and open-weight model in production, as of June 2026 — honest about uncertainty.

This page is an index, not a review. It catalogues the AI models that were shipping or in public preview as of the page's last refresh in June 2026 — with their parameter counts where disclosed, context windows, modalities, price per million tokens, release dates, open-weight status, and licenses. Each row links back to the provider's own documentation, because pricing and context windows change more often than blog posts admit. A note on epistemic honesty up front. Frontier-model release schedules are noisy. Providers rename SKUs, change pricing tiers, deprecate snapshots, and ship "preview" models that may or may not graduate to general availability. Where we cite a release date, it is the public-availability date the provider announced — not a research-paper date or a leak date. Where parameter counts are not published (Anthropic, OpenAI, Google's frontier tier), we say so rather than reciting analyst guesses. Where we say "as of June 2026 cutoff," we mean the row was true to the best of our knowledge at that point and the operator should re-check the provider page before quoting it. The closed-weight tier — Anthropic, OpenAI, Google's frontier Gemini, xAI's frontier — does not publish architecture details. The open-weight tier — Meta Llama, Mistral, DeepSeek, Qwen, Microsoft Phi, Google Gemma — does. That difference is the through-line of the modern model economy: open weights compound faster on benchmarks because anyone can fine-tune them, while closed weights compound faster on agentic capability because the lab owns the post-training stack. We organize the page by tier (frontier closed, frontier open, specialized, deprecated-or-retired) rather than alphabetically by lab, because that is how a builder actually shops. Below the tables, we include the open-source heroes section, a pricing-discipline callout, a release timeline, and a how-to-keep-this-current note. No invented model names. No invented prices. No invented URLs.

How to read this tracker

A few conventions that apply to every table below. Skim these once, then the rows make sense at a glance.

Params — Listed only where the lab has officially disclosed the figure. For Anthropic, OpenAI, and frontier Gemini, this column reads 'not disclosed' on purpose. We do not repeat third-party guesses.
Context window — Maximum total tokens (input plus output) the API will accept for the model's standard tier. Some providers sell longer-context variants at higher prices; we note those inline.
Modalities — Text in / text out is the baseline. We mark vision, audio, image generation, and tool use where the model supports them natively (not via a wrapper).
Price per 1M tokens — Standard public API pricing in USD as of the page's June 2026 refresh. Input and output are listed separately because the gap matters for any agentic workload. Batch-API and cached-input discounts are footnoted, not in the main number.
Release date — First public general-availability date from the provider's own announcement. Preview/beta dates are noted as such.
Open weights — Yes means downloadable from Hugging Face (or the lab's own host) under a license that permits local inference. 'Yes, gated' means the weights ship but the license restricts commercial use or requires acceptance.
License — The exact license name. The fine print on Llama, Gemma, and DeepSeek is non-trivial; we link to the actual license file, not a summary.

Frontier closed-weight models

These are the labs that do not ship weights. Pricing is API-only. Parameter counts are not disclosed by any of these providers and we do not estimate them. All dates and prices were verified against the provider's own documentation at the page's June 2026 refresh — re-check before quoting in a contract.

Provider	Model	Context	Modalities	Input $/1M	Output $/1M	Released	License
Anthropic	Claude 3.5 Sonnet (2024-10-22)	200k	text, vision, tools	$3.00	$15.00	2024-10	Anthropic API ToS
Anthropic	Claude 3.5 Haiku	200k	text, vision, tools	$0.80	$4.00	2024-11	Anthropic API ToS
Anthropic	Claude Opus 4	200k	text, vision, tools, code	$15.00	$75.00	2025-05	Anthropic API ToS
Anthropic	Claude Sonnet 4	200k (1M beta)	text, vision, tools, code	$3.00	$15.00	2025-05	Anthropic API ToS
OpenAI	GPT-4o	128k	text, vision, audio in/out	$2.50	$10.00	2024-05	OpenAI API ToS
OpenAI	GPT-4o-mini	128k	text, vision	$0.15	$0.60	2024-07	OpenAI API ToS
OpenAI	o1	200k	text, vision, reasoning	$15.00	$60.00	2024-12	OpenAI API ToS
OpenAI	o1-mini	128k	text, reasoning	$3.00	$12.00	2024-09	OpenAI API ToS
OpenAI	o3	200k	text, vision, reasoning, tools	$10.00	$40.00	2025-04	OpenAI API ToS
OpenAI	o3-mini	200k	text, reasoning, tools	$1.10	$4.40	2025-01	OpenAI API ToS
OpenAI	o4-mini	200k	text, vision, reasoning, tools	$1.10	$4.40	2025-04	OpenAI API ToS
Google	Gemini 2.5 Pro	1M (2M waitlisted)	text, vision, audio, code	$1.25 (≤200k) / $2.50 (>200k)	$10.00 / $15.00	2025-03 preview, GA 2025-06	Google AI Terms
Google	Gemini 2.5 Flash	1M	text, vision, audio	$0.30	$2.50	2025-04	Google AI Terms
Google	Gemini 1.5 Pro	2M	text, vision, audio	$1.25 / $2.50	$5.00 / $10.00	2024-02 preview, GA 2024-09	Google AI Terms
xAI	Grok 3	1M (advertised)	text, vision, tools	$3.00	$15.00	2025-02	xAI API ToS
xAI	Grok 2	131k	text, vision	$2.00	$10.00	2024-08	xAI API ToS
Cohere	Command R+ (2024-08)	128k	text, tools, RAG-tuned	$2.50	$10.00	2024-08	Cohere ToS (weights available, non-commercial CC-BY-NC 4.0)
Cohere	Command R7B	128k	text, tools	$0.0375	$0.15	2024-12	Cohere ToS
AI21	Jamba 1.6 Large	256k	text	$2.00	$8.00	2025-03	Jamba Open Model License
AI21	Jamba 1.6 Mini	256k	text	$0.20	$0.40	2025-03	Jamba Open Model License

ProviderAnthropic

ModelClaude 3.5 Sonnet (2024-10-22)

Context200k

Modalitiestext, vision, tools

Input $/1M$3.00

Output $/1M$15.00

Released2024-10

LicenseAnthropic API ToS

ProviderAnthropic

ModelClaude 3.5 Haiku

Context200k

Modalitiestext, vision, tools

Input $/1M$0.80

Output $/1M$4.00

Released2024-11

LicenseAnthropic API ToS

ProviderAnthropic

ModelClaude Opus 4

Context200k

Modalitiestext, vision, tools, code

Input $/1M$15.00

Output $/1M$75.00

Released2025-05

LicenseAnthropic API ToS

ProviderAnthropic

ModelClaude Sonnet 4

Context200k (1M beta)

Modalitiestext, vision, tools, code

Input $/1M$3.00

Output $/1M$15.00

Released2025-05

LicenseAnthropic API ToS

ProviderOpenAI

ModelGPT-4o

Context128k

Modalitiestext, vision, audio in/out

Input $/1M$2.50

Output $/1M$10.00

Released2024-05

LicenseOpenAI API ToS

ProviderOpenAI

ModelGPT-4o-mini

Context128k

Modalitiestext, vision

Input $/1M$0.15

Output $/1M$0.60

Released2024-07

LicenseOpenAI API ToS

ProviderOpenAI

Modelo1

Context200k

Modalitiestext, vision, reasoning

Input $/1M$15.00

Output $/1M$60.00

Released2024-12

LicenseOpenAI API ToS

ProviderOpenAI

Modelo1-mini

Context128k

Modalitiestext, reasoning

Input $/1M$3.00

Output $/1M$12.00

Released2024-09

LicenseOpenAI API ToS

ProviderOpenAI

Modelo3

Context200k

Modalitiestext, vision, reasoning, tools

Input $/1M$10.00

Output $/1M$40.00

Released2025-04

LicenseOpenAI API ToS

ProviderOpenAI

Modelo3-mini

Context200k

Modalitiestext, reasoning, tools

Input $/1M$1.10

Output $/1M$4.40

Released2025-01

LicenseOpenAI API ToS

ProviderOpenAI

Modelo4-mini

Context200k

Modalitiestext, vision, reasoning, tools

Input $/1M$1.10

Output $/1M$4.40

Released2025-04

LicenseOpenAI API ToS

ProviderGoogle

ModelGemini 2.5 Pro

Context1M (2M waitlisted)

Modalitiestext, vision, audio, code

Input $/1M$1.25 (≤200k) / $2.50 (>200k)

Output $/1M$10.00 / $15.00

Released2025-03 preview, GA 2025-06

LicenseGoogle AI Terms

ProviderGoogle

ModelGemini 2.5 Flash

Context1M

Modalitiestext, vision, audio

Input $/1M$0.30

Output $/1M$2.50

Released2025-04

LicenseGoogle AI Terms

ProviderGoogle

ModelGemini 1.5 Pro

Context2M

Modalitiestext, vision, audio

Input $/1M$1.25 / $2.50

Output $/1M$5.00 / $10.00

Released2024-02 preview, GA 2024-09

LicenseGoogle AI Terms

ProviderxAI

ModelGrok 3

Context1M (advertised)

Modalitiestext, vision, tools

Input $/1M$3.00

Output $/1M$15.00

Released2025-02

LicensexAI API ToS

ProviderxAI

ModelGrok 2

Context131k

Modalitiestext, vision

Input $/1M$2.00

Output $/1M$10.00

Released2024-08

LicensexAI API ToS

ProviderCohere

ModelCommand R+ (2024-08)

Context128k

Modalitiestext, tools, RAG-tuned

Input $/1M$2.50

Output $/1M$10.00

Released2024-08

LicenseCohere ToS (weights available, non-commercial CC-BY-NC 4.0)

ProviderCohere

ModelCommand R7B

Context128k

Modalitiestext, tools

Input $/1M$0.0375

Output $/1M$0.15

Released2024-12

LicenseCohere ToS

ProviderAI21

ModelJamba 1.6 Large

Context256k

Modalitiestext

Input $/1M$2.00

Output $/1M$8.00

Released2025-03

LicenseJamba Open Model License

ProviderAI21

ModelJamba 1.6 Mini

Context256k

Modalitiestext

Input $/1M$0.20

Output $/1M$0.40

Released2025-03

LicenseJamba Open Model License

On 'GPT-5', 'Claude 4.5', 'Gemini 3', and other models that may or may not exist by the time you read this

As of this page's June 2026 refresh, none of the three frontier labs had publicly released a model under the names 'GPT-5,' 'Claude 4.5,' or 'Gemini 3' on their official pricing pages. There were leaks, demos, and 'preview' SKUs in various states of availability, and the labs change their naming conventions frequently — OpenAI in particular has signalled a possible unification of the GPT and o-series lines. If you have arrived at this page after such a release, this table is stale, and the right move is to open the provider's pricing page directly. We will not invent SKUs to look current. The pattern across 2024-2026: every frontier launch we have seen ships first as a 'preview' with API access for paying customers, then graduates to GA within 4-12 weeks, then gets a cheaper sibling within a quarter. Apply that template to whatever you find on the live pricing page.

Frontier open-weight models

These are the models you can actually download and run on your own hardware (or rent from a third-party host like Together, Fireworks, Groq, or Cerebras). Pricing is hosted-API pricing from the model's primary commercial host, where available — but the point of this tier is that you do not have to use any host at all. All have permissive-enough licenses for most builder use cases; read the fine print on the Llama and DeepSeek licenses before shipping in production.

Provider	Model	Params	Context	Modalities	Indicative hosted $/1M in/out	Released	License
Meta	Llama 3.1 8B Instruct	8B	128k	text	$0.18 / $0.18 (Together)	2024-07	Llama 3.1 Community License
Meta	Llama 3.1 70B Instruct	70B	128k	text	$0.88 / $0.88 (Together)	2024-07	Llama 3.1 Community License
Meta	Llama 3.1 405B Instruct	405B	128k	text	$3.50 / $3.50 (Together)	2024-07	Llama 3.1 Community License
Meta	Llama 3.2 11B Vision Instruct	11B	128k	text, vision	$0.18 / $0.18	2024-09	Llama 3.2 Community License
Meta	Llama 3.2 90B Vision Instruct	90B	128k	text, vision	$1.20 / $1.20	2024-09	Llama 3.2 Community License
Meta	Llama 3.3 70B Instruct	70B	128k	text	$0.88 / $0.88	2024-12	Llama 3.3 Community License
Meta	Llama 4 Scout (mixture of experts)	109B (17B active)	10M (advertised)	text, vision	$0.18 / $0.59 (Groq)	2025-04	Llama 4 Community License
Meta	Llama 4 Maverick (mixture of experts)	400B (17B active)	1M (advertised)	text, vision	$0.27 / $0.85	2025-04	Llama 4 Community License
Mistral	Mistral Large 2 (2407)	123B	128k	text, tools	$2.00 / $6.00 (la Plateforme)	2024-07	Mistral Research License (commercial via paid tier)
Mistral	Mistral Small 3	24B	32k	text, tools	$0.10 / $0.30	2025-01	Apache 2.0
Mistral	Mixtral 8x22B Instruct	141B (39B active)	65k	text	$1.20 / $1.20 (Together)	2024-04	Apache 2.0
Mistral	Codestral 25.01	22B	256k	code	$0.30 / $0.90	2025-01	Mistral AI Non-Production License (paid for commercial)
Mistral	Pixtral Large	124B	128k	text, vision	$2.00 / $6.00	2024-11	Mistral Research License
DeepSeek	DeepSeek-V3	671B (37B active)	64k (128k via context-cache)	text, code	$0.27 / $1.10 (DeepSeek API)	2024-12	DeepSeek License v1.0 (commercial permitted)
DeepSeek	DeepSeek-R1	671B (37B active)	64k	text, reasoning	$0.55 / $2.19	2025-01	MIT (weights), DeepSeek License (outputs OK for distillation)
DeepSeek	DeepSeek-R1-Distill-Llama-70B	70B	128k	text, reasoning	varies by host	2025-01	Llama 3.3 Community License
Alibaba (Qwen)	Qwen2.5 72B Instruct	72B	131k	text	$0.90 / $0.90	2024-09	Qwen License (commercial permitted)
Alibaba (Qwen)	Qwen2.5 Coder 32B Instruct	32B	131k	code	$0.80 / $0.80	2024-11	Apache 2.0
Alibaba (Qwen)	Qwen2.5-VL 72B Instruct	72B	131k	text, vision	$1.20 / $1.20	2025-01	Qwen License
Alibaba (Qwen)	QwQ-32B-Preview	32B	32k	text, reasoning	$0.15 / $0.60	2024-11	Apache 2.0
Alibaba (Qwen)	Qwen3 (preview series)	0.6B to 235B (MoE variants)	128k	text, reasoning	varies	2025-Q2 preview	Apache 2.0 (announced)
Microsoft	Phi-4	14B	16k	text	$0.07 / $0.14	2024-12	MIT
Microsoft	Phi-3.5-MoE	42B (6.6B active)	128k	text	varies	2024-08	MIT
Google	Gemma 2 27B Instruct	27B	8k	text	$0.27 / $0.27	2024-06	Gemma Terms of Use
Google	Gemma 2 9B Instruct	9B	8k	text	$0.20 / $0.20	2024-06	Gemma Terms of Use
Google	Gemma 3 (multi-size)	1B / 4B / 12B / 27B	128k (12B/27B)	text, vision (4B+)	$0.10-$0.30	2025-03	Gemma Terms of Use
Allen AI	OLMo 2 32B Instruct	32B	4k	text	varies by host	2025-03	Apache 2.0 (fully open: weights + data + training code)
IBM	Granite 3.1 8B Instruct	8B	128k	text, code	$0.10 / $0.10	2024-12	Apache 2.0
NVIDIA	Nemotron-4 340B Instruct	340B	4k	text	varies	2024-06	NVIDIA Open Model License (commercial permitted, no synthetic-data restrictions)

ProviderMeta

ModelLlama 3.1 8B Instruct

Params8B

Context128k

Modalitiestext

Indicative hosted $/1M in/out$0.18 / $0.18 (Together)

Released2024-07

LicenseLlama 3.1 Community License

ProviderMeta

ModelLlama 3.1 70B Instruct

Params70B

Context128k

Modalitiestext

Indicative hosted $/1M in/out$0.88 / $0.88 (Together)

Released2024-07

LicenseLlama 3.1 Community License

ProviderMeta

ModelLlama 3.1 405B Instruct

Params405B

Context128k

Modalitiestext

Indicative hosted $/1M in/out$3.50 / $3.50 (Together)

Released2024-07

LicenseLlama 3.1 Community License

ProviderMeta

ModelLlama 3.2 11B Vision Instruct

Params11B

Context128k

Modalitiestext, vision

Indicative hosted $/1M in/out$0.18 / $0.18

Released2024-09

LicenseLlama 3.2 Community License

ProviderMeta

ModelLlama 3.2 90B Vision Instruct

Params90B

Context128k

Modalitiestext, vision

Indicative hosted $/1M in/out$1.20 / $1.20

Released2024-09

LicenseLlama 3.2 Community License

ProviderMeta

ModelLlama 3.3 70B Instruct

Params70B

Context128k

Modalitiestext

Indicative hosted $/1M in/out$0.88 / $0.88

Released2024-12

LicenseLlama 3.3 Community License

ProviderMeta

ModelLlama 4 Scout (mixture of experts)

Params109B (17B active)

Context10M (advertised)

Modalitiestext, vision

Indicative hosted $/1M in/out$0.18 / $0.59 (Groq)

Released2025-04

LicenseLlama 4 Community License

ProviderMeta

ModelLlama 4 Maverick (mixture of experts)

Params400B (17B active)

Context1M (advertised)

Modalitiestext, vision

Indicative hosted $/1M in/out$0.27 / $0.85

Released2025-04

LicenseLlama 4 Community License

ProviderMistral

ModelMistral Large 2 (2407)

Params123B

Context128k

Modalitiestext, tools

Indicative hosted $/1M in/out$2.00 / $6.00 (la Plateforme)

Released2024-07

LicenseMistral Research License (commercial via paid tier)

ProviderMistral

ModelMistral Small 3

Params24B

Context32k

Modalitiestext, tools

Indicative hosted $/1M in/out$0.10 / $0.30

Released2025-01

LicenseApache 2.0

ProviderMistral

ModelMixtral 8x22B Instruct

Params141B (39B active)

Context65k

Modalitiestext

Indicative hosted $/1M in/out$1.20 / $1.20 (Together)

Released2024-04

LicenseApache 2.0

ProviderMistral

ModelCodestral 25.01

Params22B

Context256k

Modalitiescode

Indicative hosted $/1M in/out$0.30 / $0.90

Released2025-01

LicenseMistral AI Non-Production License (paid for commercial)

ProviderMistral

ModelPixtral Large

Params124B

Context128k

Modalitiestext, vision

Indicative hosted $/1M in/out$2.00 / $6.00

Released2024-11

LicenseMistral Research License

ProviderDeepSeek

ModelDeepSeek-V3

Params671B (37B active)

Context64k (128k via context-cache)

Modalitiestext, code

Indicative hosted $/1M in/out$0.27 / $1.10 (DeepSeek API)

Released2024-12

LicenseDeepSeek License v1.0 (commercial permitted)

ProviderDeepSeek

ModelDeepSeek-R1

Params671B (37B active)

Context64k

Modalitiestext, reasoning

Indicative hosted $/1M in/out$0.55 / $2.19

Released2025-01

LicenseMIT (weights), DeepSeek License (outputs OK for distillation)

ProviderDeepSeek

ModelDeepSeek-R1-Distill-Llama-70B

Params70B

Context128k

Modalitiestext, reasoning

Indicative hosted $/1M in/outvaries by host

Released2025-01

LicenseLlama 3.3 Community License

ProviderAlibaba (Qwen)

ModelQwen2.5 72B Instruct

Params72B

Context131k

Modalitiestext

Indicative hosted $/1M in/out$0.90 / $0.90

Released2024-09

LicenseQwen License (commercial permitted)

ProviderAlibaba (Qwen)

ModelQwen2.5 Coder 32B Instruct

Params32B

Context131k

Modalitiescode

Indicative hosted $/1M in/out$0.80 / $0.80

Released2024-11

LicenseApache 2.0

ProviderAlibaba (Qwen)

ModelQwen2.5-VL 72B Instruct

Params72B

Context131k

Modalitiestext, vision

Indicative hosted $/1M in/out$1.20 / $1.20

Released2025-01

LicenseQwen License

ProviderAlibaba (Qwen)

ModelQwQ-32B-Preview

Params32B

Context32k

Modalitiestext, reasoning

Indicative hosted $/1M in/out$0.15 / $0.60

Released2024-11

LicenseApache 2.0

ProviderAlibaba (Qwen)

ModelQwen3 (preview series)

Params0.6B to 235B (MoE variants)

Context128k

Modalitiestext, reasoning

Indicative hosted $/1M in/outvaries

Released2025-Q2 preview

LicenseApache 2.0 (announced)

ProviderMicrosoft

ModelPhi-4

Params14B

Context16k

Modalitiestext

Indicative hosted $/1M in/out$0.07 / $0.14

Released2024-12

LicenseMIT

ProviderMicrosoft

ModelPhi-3.5-MoE

Params42B (6.6B active)

Context128k

Modalitiestext

Indicative hosted $/1M in/outvaries

Released2024-08

LicenseMIT

ProviderGoogle

ModelGemma 2 27B Instruct

Params27B

Context8k

Modalitiestext

Indicative hosted $/1M in/out$0.27 / $0.27

Released2024-06

LicenseGemma Terms of Use

ProviderGoogle

ModelGemma 2 9B Instruct

Params9B

Context8k

Modalitiestext

Indicative hosted $/1M in/out$0.20 / $0.20

Released2024-06

LicenseGemma Terms of Use

ProviderGoogle

ModelGemma 3 (multi-size)

Params1B / 4B / 12B / 27B

Context128k (12B/27B)

Modalitiestext, vision (4B+)

Indicative hosted $/1M in/out$0.10-$0.30

Released2025-03

LicenseGemma Terms of Use

ProviderAllen AI

ModelOLMo 2 32B Instruct

Params32B

Context4k

Modalitiestext

Indicative hosted $/1M in/outvaries by host

Released2025-03

LicenseApache 2.0 (fully open: weights + data + training code)

ProviderIBM

ModelGranite 3.1 8B Instruct

Params8B

Context128k

Modalitiestext, code

Indicative hosted $/1M in/out$0.10 / $0.10

Released2024-12

LicenseApache 2.0

ProviderNVIDIA

ModelNemotron-4 340B Instruct

Params340B

Context4k

Modalitiestext

Indicative hosted $/1M in/outvaries

Released2024-06

LicenseNVIDIA Open Model License (commercial permitted, no synthetic-data restrictions)

Specialized and embedded models

These are models that are not trying to be a generalist chat assistant. They are tuned for one job — code completion, embeddings, image generation, speech, or on-device inference — and the right shopping question is 'best in class for this specific narrow task,' not 'best overall.'

Code-completion (IDE-grade)

Open + closed

Mistral Codestral 25.01 (22B, 256k context) is the open-weight reference; Qwen2.5-Coder 32B (Apache 2.0) is the strongest fully-permissive option. Closed-tier: Anthropic Claude Sonnet 4 and OpenAI GPT-4o lead the SWE-bench Verified leaderboard as of early 2026. DeepSeek-Coder-V2 (236B MoE, 21B active) remains competitive at a fraction of the cost.

Embeddings

Retrieval & semantic search

OpenAI text-embedding-3-large ($0.13/1M tokens, 3072 dims), Cohere Embed v3 ($0.10/1M, multilingual), Voyage-3 ($0.06/1M, retrieval-tuned), and the open BGE-M3 + Nomic Embed family. Pick by language coverage, dimension count, and whether you need binary/int8 quantization for vector-DB cost.

Image generation

Diffusion + autoregressive

Black Forest Labs FLUX.1 (dev/schnell open weights, pro API), Stability SD 3.5 Large (open), OpenAI DALL-E 3 (closed), Google Imagen 3 (closed), Midjourney v6.1 (closed, no API). FLUX.1 [dev] under FLUX.1 [dev] Non-Commercial License is the de facto open default.

Speech-to-text

ASR

OpenAI Whisper Large v3 remains the open-weight baseline (MIT, 1550M params). Deepgram Nova-2, AssemblyAI Universal-2, and ElevenLabs Scribe are the closed alternatives. NVIDIA Parakeet-TDT 1.1B is the leading open-weight ASR on the Open ASR Leaderboard.

Text-to-speech

TTS

ElevenLabs (closed, voice-cloning leader), OpenAI tts-1-hd ($30/1M chars), Google Chirp/Cloud TTS. Open: Coqui XTTS-v2 (CPML license), Bark by Suno (MIT, lower fidelity). For real-time / low-latency, OpenAI's Realtime API and Cartesia Sonic dominate.

On-device (laptop, phone)

≤8B params, ≤4GB quantized

Llama 3.2 1B/3B (mobile-targeted, 128k context), Phi-4 14B (MIT, runs quantized on 16GB Mac), Gemma 3 4B (vision-capable), Qwen2.5 3B / 7B, Mistral Small 3 24B (workstation tier). Inference via llama.cpp, MLX (Apple), or Ollama.

Reasoning (test-time compute)

Long chain-of-thought

Closed: OpenAI o1 / o3 / o4-mini, Anthropic Claude Sonnet 4 extended thinking, Google Gemini 2.5 Pro Deep Think (preview), DeepSeek-R1 (open). Pattern: pay more output tokens, get harder math / harder code / harder logic. Cost scaling is roughly linear with thinking depth.

Inference accelerators (not models, but related)

Where the model runs matters

Groq (LPU, sub-second Llama 3.1 70B), Cerebras (CS-3, 1800+ tokens/sec on Llama 3.1 70B), SambaNova RDU, Fireworks and Together on NVIDIA. For latency-critical agentic workloads, the host can matter as much as the model choice.

Open-source heroes — the labs that ship weights

The open-weight ecosystem stopped being a curiosity and became the floor of the market sometime in 2024, and by mid-2026 it is the structural backbone of every non-Big-Three serious AI deployment. Six families dominate. Meta Llama is the gravitational center. Llama 3.1 (July 2024) was the moment frontier-tier weights became free; the 405B model effectively matched GPT-4 on common evaluations at launch. Llama 3.3 70B (December 2024) compressed most of 3.1 405B's quality into a 70B chassis. Llama 4 Scout and Maverick (April 2025) moved to mixture-of-experts and pushed advertised context windows to 10M tokens — though real-world retrieval at that length is a separate empirical question. The Llama Community License permits commercial use under a 700M-monthly-active-user cap that catches only the largest companies. Mistral AI (Paris) maintains the European frontier-open lane. Mistral Large 2 (123B, 128k context, July 2024) and Mixtral 8x22B (MoE, Apache 2.0) are the workhorses. Codestral, Pixtral, and Mistral Small 3 fill the niches. The license story is messier than Llama — many Mistral models use the Mistral Research License, which requires a paid agreement for commercial use. Read the file. DeepSeek (Hangzhou) reset the cost frontier in late 2024 and early 2025. DeepSeek-V3 (671B MoE, 37B active, December 2024) trained for roughly $5.6M of GPU time on the published figures, and DeepSeek-R1 (January 2025) released as an open-weight reasoning model that approached o1-class performance at a fraction of the inference cost. The R1-Distill family (Llama-70B, Qwen-32B base) is how most builders actually deploy reasoning today. Qwen (Alibaba) is the most prolific shipper. Qwen2.5 covers 0.5B through 72B, Coder-32B is at parity with closed code models on HumanEval+, Qwen2.5-VL handles vision, and the QwQ preview line ships open-weight reasoning. The Qwen3 series began rolling out in Q2 2025 under Apache 2.0, removing the last licensing friction. Microsoft Phi is the small-model leader. Phi-4 (14B, MIT, December 2024) outperforms many 70B models on reasoning benchmarks per parameter, and the Phi-3.5-MoE family extends that thesis. For on-device and edge deployments, Phi is the first model to evaluate. Google Gemma is the smallest-frontier-lab open family. Gemma 2 (June 2024) and Gemma 3 (March 2025) are derived from Gemini training infrastructure. The Gemma Terms of Use permit commercial use; the 27B variant runs comfortably on a single consumer GPU. The honorable mentions, briefly: Allen AI's OLMo 2 is the only fully-open frontier-class release (weights, data, training code, all under Apache 2.0). IBM Granite is the enterprise-positioned Apache 2.0 family. NVIDIA Nemotron is the largest open model by parameter count. And the long tail — 01.AI Yi, Stability LM, EleutherAI Pythia, MosaicML / Databricks DBRX — continues to ship useful work below the headline tier.

Release timeline — the last 24 months

A compressed chronology of the releases that moved the market. Dates are first-public-availability, not paper-publication.

2024-04
Llama 3 8B / 70B; Mixtral 8x22B; Command R+
Meta ships the first Llama 3 generation. Mistral releases Mixtral 8x22B under Apache 2.0. Cohere ships Command R+ as the first open-weight RAG-tuned 100B-class model.
2024-05
GPT-4o
OpenAI ships its first natively multimodal frontier model — text, vision, and audio in one weight set. Cuts price-per-token by roughly half vs. GPT-4 Turbo.
2024-06
Claude 3.5 Sonnet (first cut); Gemma 2; Nemotron 340B
Anthropic ships 3.5 Sonnet, which would dominate enterprise coding benchmarks for the next nine months. Google releases Gemma 2; NVIDIA releases Nemotron-4 340B as its largest open model.
2024-07
Llama 3.1 405B; Mistral Large 2
Meta's 405B becomes the first openly-released frontier-tier weight set. Mistral Large 2 ships at 123B with 128k context.
2024-09
OpenAI o1-preview; Llama 3.2 (vision); Qwen2.5
o1 introduces test-time compute scaling at the frontier. Llama 3.2 adds vision to the open family. Qwen2.5 ships across seven sizes.
2024-10
Claude 3.5 Sonnet (new); Computer Use
Anthropic ships an upgraded 3.5 Sonnet and the Computer Use beta — first frontier-lab native agentic control.
2024-11
Claude 3.5 Haiku; Qwen2.5-Coder 32B; QwQ-32B
Lower-tier frontier models compress capability further. QwQ becomes the first open-weight reasoning model.
2024-12
OpenAI o1 GA, o3 announced; DeepSeek-V3; Phi-4; Llama 3.3 70B
December 2024 was the densest release month of the cycle. DeepSeek-V3 in particular reset cost-per-token expectations.
2025-01
DeepSeek-R1; o3-mini; Mistral Small 3
DeepSeek-R1 ships as the first open-weight o1-class reasoning model under MIT for the weights. o3-mini follows weeks later.
2025-02
Grok 3; Claude 3.7 Sonnet (with extended thinking)
xAI ships Grok 3 with advertised 1M context. Anthropic adds extended thinking to the Claude line.
2025-03
Gemini 2.5 Pro (preview); Gemma 3; OLMo 2 32B
Google's 2.5 Pro reasserts the long-context lead. Gemma 3 and OLMo 2 32B fill out the open tier.
2025-04
Llama 4 Scout / Maverick; OpenAI o3 GA, o4-mini, GPT-4.1; Gemini 2.5 Flash
Meta moves to MoE at the frontier. OpenAI's o3 reaches general availability. GPT-4.1 reframes the API line.
2025-05
Claude Opus 4 and Claude Sonnet 4
Anthropic's Claude 4 generation ships, with Sonnet 4 offering a 1M-token context beta.
2025-06-onward
Continued open-weight pressure; reasoning-on-everything; preview models that may or may not graduate
The pattern from mid-2025 into 2026: every frontier model gets a reasoning variant, every open lab compresses costs, and the gap between closed and open narrows on benchmarks but widens on agentic capability. For anything released after this page's June 2026 refresh, check the provider's pricing page directly.

Cost discipline — what to actually pay attention to

Model selection in 2026 is rarely 'which model is smartest.' For 80% of production workloads it is 'which model is good enough at the lowest sustainable unit cost.' Apply the following filters in order before you commit to a SKU.

Output tokens cost 3-5x input tokens at every closed frontier lab. If your workload generates long completions, the output column is the one that matters — input pricing is misleading marketing.
Cached input is real. Anthropic, OpenAI, and Google all offer prompt-caching discounts (often 50-90% off cached portions). If your system prompt is long and static, build for caching from day one.
Batch API is 50% off at every lab that offers it (OpenAI, Anthropic, Google). For non-real-time workloads — overnight evals, document processing, content backfill — this is free money.
Mixture-of-experts pricing is per-active-parameter, not per-total-parameter. DeepSeek-V3 is 671B total but bills like a 37B model. This is why open-weight MoE has eaten the cost frontier.
Self-hosted breakeven on a frontier-open model lands somewhere around 50-200M monthly tokens depending on the model and the GPU rental rate. Below that, hosted APIs win on TCO; above it, your own infrastructure starts to pay back.
Reasoning models are not always worth it. o1 and R1 charge for the hidden chain-of-thought tokens too. For tasks that do not need multi-step reasoning, a cheaper non-reasoning model gets you the same answer for a tenth of the cost.
Context window pricing is non-linear. Gemini 2.5 Pro and Claude Sonnet 4 both have higher per-token rates above 200k. If you can summarize-and-cache instead of dumping the full corpus, you usually should.

What this page does not cover

Three deliberate omissions, named in the open. First — we do not list every regional or domain-specific model. Yi, ERNIE, Hunyuan, Baichuan, Sber GigaChat, Mistral's per-country variants, and the long tail of fine-tunes on Hugging Face are real, but cataloguing all 50,000+ public model checkpoints is not useful. We focus on the families with active maintenance, English-first documentation, and a clear license. Second — we do not benchmark. This page is a price-and-availability index. For capability comparisons, look at LMSYS Chatbot Arena, the Hugging Face Open LLM Leaderboard, SWE-bench Verified, AIME, GPQA Diamond, and Aider Polyglot — and prefer multiple benchmarks to any single number. Third — we do not predict. If a lab has not shipped a SKU under a name, that name is not in our tables. We will re-version the page when reality moves.

How to keep this current

This page is maintained as a snapshot, not a live feed. The honest way to use it is: scan the structure to understand the landscape, then click through to each provider's own pricing page before you quote a number in a contract, a proposal, or a research paper. The canonical pricing pages, all linked in the citations below, are: anthropic.com/pricing, openai.com/api/pricing, ai.google.dev/pricing, x.ai/api, mistral.ai/pricing, deepseek.com/api-pricing, cohere.com/pricing, ai21.com/pricing, together.ai/pricing, fireworks.ai/pricing, groq.com/pricing, and the Hugging Face model pages for each open-weight release. For open-weight licenses specifically, always download the LICENSE file from the original release repo — summaries on third-party sites have been wrong often enough to matter. The community trackers worth bookmarking alongside this page: Artificial Analysis (artificialanalysis.ai) for ongoing price-and-latency benchmarks, the LMSYS Chatbot Arena Leaderboard for capability rankings, and Epoch AI's model database for training-compute and parameter-count history. None of those are a substitute for the provider's own page, but they catch errors and stale rows faster than any single static document can.

Sources

[01]
Anthropic publishes per-model token pricing for Claude 3.5 Sonnet, 3.5 Haiku, Opus 4, and Sonnet 4, including the 1M-token context beta tier.
anthropic.com/pricing
[02]
Claude 3.5 Sonnet was released June 20, 2024, with the upgraded snapshot released October 22, 2024.
anthropic.com/news/claude-3-5-sonnet
[03]
Anthropic announced Claude Opus 4 and Claude Sonnet 4 in May 2025 with extended thinking and 1M context beta for Sonnet 4.
anthropic.com/news/claude-4
[04]
OpenAI publishes current per-model API pricing for GPT-4o, GPT-4o-mini, o1, o1-mini, o3, o3-mini, and o4-mini.
openai.com/api/pricing
[05]
GPT-4o was announced May 13, 2024 as OpenAI's first natively multimodal frontier model.
openai.com/index/hello-gpt-4o
[06]
OpenAI o1 preview was released September 12, 2024 with full o1 reaching GA in December 2024.
openai.com/index/learning-to-reason-with-llms
[07]
OpenAI o3 and o4-mini were released April 16, 2025.
openai.com/index/introducing-o3-and-o4-mini
[08]
Google AI Studio and Vertex AI publish pricing for Gemini 1.5 Pro, Gemini 2.5 Pro, and Gemini 2.5 Flash with tiered pricing above 200k tokens.
ai.google.dev/pricing
[09]
Gemini 2.5 Pro was announced in March 2025 as a thinking-by-default frontier model.
blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025
[10]
xAI publishes Grok 2 and Grok 3 API pricing with stated context windows.
x.ai/api
[11]
Llama 3.1 was released July 23, 2024 in 8B, 70B, and 405B sizes with 128k context.
ai.meta.com/blog/meta-llama-3-1
[12]
Meta announced Llama 4 Scout and Maverick on April 5, 2025 with mixture-of-experts architecture.
ai.meta.com/blog/llama-4-multimodal-intelligence
[13]
The Llama 3.1 Community License governs commercial use with a 700M monthly active user threshold.
llama.meta.com/llama3_1/license
[14]
Mistral Large 2 was released July 24, 2024 at 123B parameters with 128k context.
mistral.ai/news/mistral-large-2407
[15]
Codestral 25.01 was released January 2025 at 22B parameters with 256k context.
mistral.ai/news/codestral-2501
[16]
DeepSeek-V3 was released December 26, 2024 as a 671B-total / 37B-active mixture-of-experts model.
api-docs.deepseek.com/news/news1226
[17]
DeepSeek-R1 was released January 20, 2025 under the MIT license for the model weights.
api-docs.deepseek.com/news/news250120
[18]
The DeepSeek-V3 technical report documents the architecture, training, and reported $5.576M compute cost.
arxiv.org/abs/2412.19437
[19]
The DeepSeek-R1 technical report documents the RL-based reasoning training pipeline and benchmark results.
arxiv.org/abs/2501.12948
[20]
Qwen2.5 was released September 18, 2024 across seven sizes from 0.5B to 72B parameters.
qwenlm.github.io/blog/qwen2.5
[21]
QwQ-32B-Preview was released November 27, 2024 as an open-weight reasoning model under Apache 2.0.
qwenlm.github.io/blog/qwq-32b-preview
[22]
Phi-4 was released December 12, 2024 at 14B parameters under the MIT license.
microsoft.com/en-us/research/blog/phi-4-technical-report
[23]
Gemma 3 was released March 12, 2025 in 1B, 4B, 12B, and 27B variants.
blog.google/technology/developers/gemma-3
[24]
OLMo 2 32B was released March 2025 as a fully-open release including weights, data, and training code under Apache 2.0.
allenai.org/blog/olmo2-32B
[25]
LMSYS Chatbot Arena maintains a community-voted leaderboard ranking deployed models across labs.
lmsys.org/blog/chatbot-arena

Keep reading

AI research notes →Learn — playbooks →OrangeBox local AI stack →Tools — builder workbench →B00KMakor — write with models →Compare — Claude vs GPT vs Gemini →Compare — open vs closed weights →Learn — model selection guide →

The shipping AI model tracker

How to read this tracker

Frontier closed-weight models

On 'GPT-5', 'Claude 4.5', 'Gemini 3', and other models that may or may not exist by the time you read this

Frontier open-weight models

Specialized and embedded models

Code-completion (IDE-grade)

Embeddings

Image generation

Speech-to-text

Text-to-speech

On-device (laptop, phone)

Reasoning (test-time compute)

Inference accelerators (not models, but related)

Open-source heroes — the labs that ship weights

Release timeline — the last 24 months

Llama 3 8B / 70B; Mixtral 8x22B; Command R+

GPT-4o

Claude 3.5 Sonnet (first cut); Gemma 2; Nemotron 340B

Llama 3.1 405B; Mistral Large 2

OpenAI o1-preview; Llama 3.2 (vision); Qwen2.5

Claude 3.5 Sonnet (new); Computer Use

Claude 3.5 Haiku; Qwen2.5-Coder 32B; QwQ-32B

OpenAI o1 GA, o3 announced; DeepSeek-V3; Phi-4; Llama 3.3 70B

DeepSeek-R1; o3-mini; Mistral Small 3

Grok 3; Claude 3.7 Sonnet (with extended thinking)

Gemini 2.5 Pro (preview); Gemma 3; OLMo 2 32B

Llama 4 Scout / Maverick; OpenAI o3 GA, o4-mini, GPT-4.1; Gemini 2.5 Flash

Claude Opus 4 and Claude Sonnet 4

Continued open-weight pressure; reasoning-on-everything; preview models that may or may not graduate

Cost discipline — what to actually pay attention to

What this page does not cover

How to keep this current

Sources

Keep reading