The open-weights model index
What you can actually download, fine-tune, and ship — license-honest, hype-free
The index — open-weights families worth knowing
What "open weights" actually means
License gotchas worth flagging
Three landmines catch teams repeatedly.
- The 700M MAU clause: the Llama Community License has a clause that triggers when a deploying entity has more than 700 million monthly active users. Above that line you must obtain a separate license from Meta. Below it, commercial use is permitted with attribution. Most teams are fine. Some are not, because the threshold counts users across the whole organization and its affiliates, not per product.
- Distillation: many community licenses restrict using the model's outputs to train a competing foundation model. Under the Llama 3 license, for example, generating a synthetic dataset with Llama 3 and using it to train a from-scratch competitor is contractually prohibited, even if the outputs themselves are not copyrightable. Llama 3.1 and later relax this clause and attach naming and attribution conditions instead, so read the license for the exact version you deploy.
- Acceptable use policies: Llama, Gemma, and Falcon all attach AUPs that prohibit certain applications (weapons of mass destruction, large-scale surveillance, content sexualizing minors, etc.). These are enforceable contract terms, not just guidelines. Apache 2.0 and MIT models have no such restrictions in the license itself, but you are still bound by applicable law.
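The three checks above can be sketched as a small pre-deployment lint. This is an illustration only, not legal advice: `LicenseProfile`, `license_flags`, and the two example profiles are hypothetical names, and the boolean fields are a deliberate simplification of what the real license texts say.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenseProfile:
    """Hypothetical, simplified summary of one model license."""
    name: str
    mau_threshold: Optional[int]     # None means the license has no MAU clause
    restricts_output_training: bool  # True if outputs may not train other models
    has_aup: bool                    # True if an acceptable use policy is attached


# Example profiles (assumptions for illustration; verify against the license text).
LLAMA_3 = LicenseProfile("Llama 3 Community License", 700_000_000, True, True)
APACHE_2 = LicenseProfile("Apache 2.0", None, False, False)


def license_flags(profile: LicenseProfile, org_mau: int,
                  trains_other_model_on_outputs: bool) -> List[str]:
    """Return one warning string per landmine that applies."""
    flags = []
    # Landmine 1: the MAU count is organization-wide, not per product.
    if profile.mau_threshold is not None and org_mau > profile.mau_threshold:
        flags.append("MAU threshold exceeded: separate license needed")
    # Landmine 2: using outputs to train another model may be restricted.
    if profile.restricts_output_training and trains_other_model_on_outputs:
        flags.append("output-training restriction applies")
    # Landmine 3: an AUP is a contract term, so it always needs a review.
    if profile.has_aup:
        flags.append("AUP attached: review prohibited uses")
    return flags


print(license_flags(LLAMA_3, org_mau=5_000_000, trains_other_model_on_outputs=True))
print(license_flags(APACHE_2, org_mau=900_000_000, trains_other_model_on_outputs=True))
```

A lint like this only surfaces questions for counsel; it cannot answer them. The point of encoding it is that the MAU figure is passed in per organization, which makes the per-organization reading hard to forget.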
Frontier open-weights families — the four to actually pay attention to
Llama (Meta)
Llama 3.1 · Llama 3.2 · Llama 3.3 70B
The most-deployed open-weights family. Strong English performance, a broad fine-tune ecosystem, 128k context on the 3.1/3.3 generation, and multimodal vision variants in 3.2. The license is custom (Llama Community License), not Apache, and the 700M MAU clause matters at scale. Llama 4 (Scout and Maverick) shipped public weights in April 2025 under its own Llama 4 Community License; check Meta's official announcements and the license text before treating it as a drop-in successor. If you are choosing a default open-weights base for an English-heavy product and you are below the MAU threshold, Llama 3.x is the boring correct answer.
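A 128k context window is a capability, not a free one: the key/value cache grows linearly with sequence length. The sketch below estimates that cost. The layer counts, KV-head counts, and head dimension are assumptions not stated on this page (they are the commonly published Llama 3.1 configurations), and `kv_cache_bytes` is a hypothetical helper, so check the model's own config file before sizing hardware.

```python
GIB = 1024 ** 3
CONTEXT_128K = 128 * 1024  # 131,072 tokens


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for ONE sequence."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed shapes: 8B = 32 layers, 70B = 80 layers; both 8 KV heads, head_dim 128.
cache_8b = kv_cache_bytes(32, 8, 128, CONTEXT_128K)
cache_70b = kv_cache_bytes(80, 8, 128, CONTEXT_128K)

print(f"8B  at 128k context, 16-bit cache: {cache_8b / GIB:.1f} GiB per sequence")
print(f"70B at 128k context, 16-bit cache: {cache_70b / GIB:.1f} GiB per sequence")
```

Under these assumptions a single full-length sequence costs 16 GiB of cache on the 8B model and 40 GiB on the 70B, on top of the weights. That is why long-context serving is usually sized per concurrent sequence rather than per model.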
DeepSeek (DeepSeek-AI)
DeepSeek-V3 · DeepSeek-R1 · DeepSeek-Coder-V2
V3 and R1 use a mixture-of-experts architecture with 671B total parameters and roughly 37B activated per token, which gives strong quality at a per-token inference cost closer to a dense ~70B model than to a dense 671B one. R1 was the first widely distributed open-weights reasoning model and was released under the MIT license, an unusually permissive choice. V3 ships under DeepSeek's own license, which permits commercial use subject to documented restrictions. Strong on code and math. Hosted inference is unusually cheap; self-hosting the full MoE requires serious GPU memory, because all 671B parameters must stay resident even though only about 37B are used per token.
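The gap between "cheap per token" and "expensive to host" is simple arithmetic. The sketch below works it through using the two parameter counts quoted above. The 80 GB accelerator size and the helper names are assumptions for illustration, and the GPU count is a floor that ignores KV cache, activations, and parallelism overhead.

```python
import math

TOTAL_PARAMS = 671_000_000_000   # total parameters, as stated above
ACTIVE_PARAMS = 37_000_000_000   # roughly activated per token, as stated above
GPU_MEMORY_GB = 80               # assumption: one 80 GB accelerator


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def gpus_for_weights(num_params: int, bytes_per_param: int) -> int:
    """Lower bound on GPU count: weights only, no cache or activations."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / GPU_MEMORY_GB)


active_share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"share of parameters used per token: {active_share:.1%}")

for label, bytes_per_param in (("16-bit", 2), ("8-bit", 1)):
    gb = weight_memory_gb(TOTAL_PARAMS, bytes_per_param)
    gpus = gpus_for_weights(TOTAL_PARAMS, bytes_per_param)
    print(f"{label} weights: {gb:.0f} GB, at least {gpus} x {GPU_MEMORY_GB} GB GPUs")
```

Only about one parameter in eighteen does work on any given token, but every parameter has to be loaded. Under these assumptions that means at least 17 such GPUs at 16-bit precision or 9 at 8-bit, before any serving overhead.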
Qwen (Alibaba)
Qwen 2.5 · Qwen 2.5-Coder · Qwen 2.5-VL
Most of the Qwen 2.5 lineup ships under Apache 2.0. The exceptions are the 3B variant, which uses the non-commercial Qwen Research License, and the 72B variant, which uses the Qwen License with commercial-use terms. Strong multilingual performance, especially Chinese-English, and the Coder variants are competitive with closed models on HumanEval and similar code benchmarks. Sizes range from 0.5B (on-device) through 72B. Good default for a multilingual product or a small-model experiment.
Mistral (Mistral AI)
Mixtral 8x7B / 8x22B · Mistral Large 2 · Codestral · Mistral 7B
Bifurcated portfolio: the older Mixtral models and Mistral 7B are Apache 2.0 and freely usable commercially. The newer Mistral Large 2 (Mistral Research License) and Codestral (Mistral Non-Production License) are downloadable from Hugging Face for research and evaluation, but commercial deployment requires a paid Mistral license. Mistral was the first major lab to popularize MoE in open weights. Strong European data coverage.
Small-model picks — under 14B parameters
How we got here — a short timeline
2022-07
BLOOM (BigScience)
176B-parameter multilingual model from the BigScience workshop coordinated by Hugging Face. First model of this scale released under a Responsible AI License with explicit use-case restrictions. Set a baseline that openness could include conditions.
2023-02
LLaMA 1 (Meta · research-only leak)
Meta releases LLaMA 1 under a research-only license. Weights leak via 4chan within a week. The leak forces the industry to confront the gap between intended access and actual access, and arguably accelerates the next year of open releases.
2023-07
Llama 2
Meta releases Llama 2 with a custom Community License permitting commercial use below 700M MAU. This is the moment open-weights becomes commercially viable at scale — the entire fine-tune ecosystem (Vicuna, WizardLM, Alpaca derivatives, etc.) coalesces around it.
2023-09
Mistral 7B
Mistral AI ships a 7B dense model under Apache 2.0. It outperforms Llama 2 13B on many benchmarks and kicks off the push for serious quality from small models, which is still running.
2023-12
Mixtral 8x7B
Mistral releases the first widely-adopted open MoE. Apache 2.0. Demonstrates that mixture-of-experts can be shipped openly, not just kept behind closed APIs.
2024-04
Llama 3 · Phi-3 · OLMo
Meta ships Llama 3 8B/70B with 8k context. Microsoft ships Phi-3, demonstrating small-model strength via curated data. AI2's OLMo, first released in February 2024, ships with full data and training-code disclosure. The 'open weights' field stratifies into permissive-but-closed-data, permissive-with-AUP, and fully-open-everything tiers.
2024-07
Llama 3.1 · 405B
Meta releases Llama 3.1 8B/70B/405B with 128k context. The 405B variant is the largest openly-released dense model to date and shifts the conversation about what 'frontier-class open weights' looks like.
2024-12
DeepSeek-V3 · R1 (early 2025)
DeepSeek releases V3 in December 2024, followed by R1 in January 2025 under MIT license. R1 is the first widely-distributed open-weights reasoning model and the release shifts both the cost-curve and the political conversation around open-weights releases.
2025-2026
Continued releases — best-effort tracking
Qwen, Mistral, Meta, DeepSeek, IBM, Microsoft, AI2, and 01.AI continue iterating. On Meta's side, Llama 3.3 70B (released late 2024) and the Llama 4 generation are the moving pieces; check the provider model cards for current state. This page reflects what is verifiably released as of mid-2026, on a best-effort basis.
Choosing a base — a short decision tree
- You need a maximally permissive license (Apache 2.0 or MIT) and English-heavy use: Mistral 7B, Mixtral 8x7B, Phi-3.5, Granite 3, OLMo 2, Qwen 2.5 (most sizes).
- You need a maximally permissive license and multilingual coverage (incl. Chinese): Qwen 2.5 (Apache 2.0 sizes) or Yi 1.5.
- You need 128k context and you are below 700M MAU: Llama 3.1 8B / 70B / 405B or Llama 3.3 70B.
- You need on-device inference under 4B parameters: Llama 3.2 1B/3B, Phi-3.5-mini, Gemma 2 2B, Qwen 2.5 0.5B / 1.5B / 3B (the Qwen 3B is under the non-commercial Qwen Research License, unlike the 0.5B and 1.5B).
- You need an open reasoning model with chain-of-thought: DeepSeek-R1 (MIT-licensed weights); distill to a smaller dense model for production.
- You need a code-specific model: Qwen 2.5-Coder, DeepSeek-Coder-V2, or Codestral (non-production license only; buy a Mistral license for prod).
- You need full reproducibility (data + code + weights): OLMo 2 or Pythia. Nothing else in the list publishes the data mix at OLMo's level.
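- You are a >700M MAU company and want a frontier open base: take a non-Llama path such as Mixtral 8x22B (Apache 2.0) or DeepSeek-V3 (its own commercial-permissive license), or negotiate directly with Meta. Qwen 2.5 72B is not an easy escape hatch here, because its Qwen License carries its own MAU clause (100 million), so at this scale it also means requesting a license from Alibaba.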
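The first seven bullets above can be encoded as an ordered rule list. This is a minimal sketch: the `Needs` flags, the `shortlist` function, and above all the precedence order (most specific need wins) are assumptions of this example, not part of the decision tree itself. The over-threshold case is deliberately left to a human, since it ends in a licensing negotiation.

```python
from dataclasses import dataclass
from typing import List

LLAMA_MAU_THRESHOLD = 700_000_000  # from the Llama Community License clause


@dataclass(frozen=True)
class Needs:
    """Hypothetical flags mirroring the bullets above."""
    full_reproducibility: bool = False
    reasoning: bool = False
    code: bool = False
    on_device: bool = False
    long_context: bool = False
    permissive_license: bool = False
    multilingual: bool = False
    org_mau: int = 0


def shortlist(needs: Needs) -> List[str]:
    """Return candidate bases; the rule order is this sketch's assumption."""
    if needs.full_reproducibility:
        return ["OLMo 2", "Pythia"]
    if needs.reasoning:
        return ["DeepSeek-R1"]
    if needs.code:
        return ["Qwen 2.5-Coder", "DeepSeek-Coder-V2", "Codestral"]
    if needs.on_device:
        return ["Llama 3.2 1B/3B", "Phi-3.5-mini", "Gemma 2 2B",
                "Qwen 2.5 0.5B/1.5B/3B"]
    # Llama is only suggested when the organization is under the MAU line.
    if needs.long_context and needs.org_mau <= LLAMA_MAU_THRESHOLD:
        return ["Llama 3.1 8B/70B/405B", "Llama 3.3 70B"]
    if needs.permissive_license and needs.multilingual:
        return ["Qwen 2.5 (Apache 2.0 sizes)", "Yi 1.5"]
    if needs.permissive_license:
        return ["Mistral 7B", "Mixtral 8x7B", "Phi-3.5", "Granite 3",
                "OLMo 2", "Qwen 2.5 (most sizes)"]
    return []  # no rule matched: needs a manual review


print(shortlist(Needs(permissive_license=True, multilingual=True)))
print(shortlist(Needs(long_context=True, org_mau=900_000_000)))
```

An empty list is a signal, not a failure: it marks the cases where the choice depends on license terms a lookup table cannot settle.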
Numbers we deliberately did not invent
We did not publish specific MMLU, GPQA, or HumanEval scores in this index. Those numbers move with each fine-tune, each eval harness version, and each prompt-formatting choice, so comparing 'official' numbers across providers that used different harnesses is misleading. For current benchmark performance, consult the model card on Hugging Face, the provider's release post, and an independent harness run (lm-evaluation-harness from EleutherAI is the most-cited). The original Hugging Face Open LLM Leaderboard was retired in mid-2024; successors exist but rotate. Treat any single leaderboard as one data point, not the verdict.
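One way to act on this is to store the evaluation settings next to every score and refuse to compare runs whose settings differ. The sketch below assumes a hypothetical `EvalRun` record and `score_gap` helper. The model names, version strings, and scores are made-up placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    """Hypothetical record of one benchmark result plus how it was produced."""
    model: str
    benchmark: str
    score: float
    harness: str
    harness_version: str
    prompt_format: str


def comparable(a: EvalRun, b: EvalRun) -> bool:
    """Two runs are comparable only if every evaluation setting matches."""
    return (a.benchmark, a.harness, a.harness_version, a.prompt_format) == (
        b.benchmark, b.harness, b.harness_version, b.prompt_format)


def score_gap(a: EvalRun, b: EvalRun) -> float:
    """Score difference, or an error if the runs were produced differently."""
    if not comparable(a, b):
        raise ValueError(f"{a.model} and {b.model} were evaluated differently")
    return a.score - b.score


# Placeholder data only: these are NOT real benchmark numbers.
run_a = EvalRun("model-a", "bench-x", 0.5, "lm-evaluation-harness", "v-placeholder-1", "5-shot")
run_b = EvalRun("model-b", "bench-x", 0.25, "lm-evaluation-harness", "v-placeholder-1", "5-shot")
run_c = EvalRun("model-c", "bench-x", 0.75, "vendor-harness", "v-placeholder-2", "0-shot")

print(score_gap(run_a, run_b))
try:
    score_gap(run_a, run_c)
except ValueError as err:
    print("refused:", err)
```

The guard is intentionally strict. A different harness version or shot count is treated as a different experiment, which matches the caution above about mixing numbers from different sources.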
Honest limits of this page
Sources
- [01]
Meta Llama 3.1 release post documents the 8B / 70B / 405B sizes and 128k context window.
ai.meta.com/blog/meta-llama-3-1/
- [02]
The Llama 3.1 Community License contains the 700 million monthly active users threshold for separate commercial licensing.
llama.meta.com/llama3_1/license/
- [03]
Meta Llama 3.2 release post documents the 1B / 3B text and 11B / 90B vision variants and the EU restriction for multimodal use.
ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
- [04]
Mistral AI announces Mistral Large 2 at 123B parameters with 128k context under the Mistral Research License.
mistral.ai/news/mistral-large-2407/
- [05]
Mistral AI announces Mixtral 8x7B under Apache 2.0 license as the first widely-adopted open MoE.
mistral.ai/news/mixtral-of-experts/
- [06]
Mistral AI ships Codestral 22B under the Mistral Non-Production License restricting commercial deployment.
mistral.ai/news/codestral/
- [07]
DeepSeek-V3 technical report documents the 671B-parameter MoE architecture with roughly 37B activated parameters per token.
arxiv.org/abs/2412.19437
- [08]
DeepSeek-R1 paper documents the open reasoning model release under MIT license.
arxiv.org/abs/2501.12948
- [09]
Alibaba's Qwen 2.5 release post documents the 0.5B through 72B size range and the Apache 2.0 license for most variants.
qwenlm.github.io/blog/qwen2.5/
- [10]
Gemma Terms of Use document the custom permissive license with prohibited-use clauses for Google's Gemma models.
ai.google.dev/gemma/terms
- [11]
Microsoft Phi-3 technical report documents the 3.8B / 7B / 14B Phi-3 family release under MIT license.
arxiv.org/abs/2404.14219
- [12]
Microsoft Phi-4 technical report documents the 14B dense reasoning-tuned model.
arxiv.org/abs/2412.08905
- [13]
IBM Granite 3 model family is released under Apache 2.0 with enterprise-targeted licensing terms.
ibm.com/granite
- [14]
01.AI's Yi paper documents the 6B / 9B / 34B Yi family and bilingual training corpus.
arxiv.org/abs/2403.04652
- [15]
TII's Falcon 2 release documents the 11B model under the TII Falcon License 2.0.
falconllm.tii.ae/falcon-2.html
- [16]
Stability AI's Stable LM 2 release documents the 1.6B and 12B variants under the Stability AI Community License.
stability.ai/news/introducing-stable-lm-2
- [17]
AI2 OLMo releases publish weights, training data (Dolma), training code, and recipe under Apache 2.0 for full reproducibility.
allenai.org/olmo
- [18]
EleutherAI Pythia paper documents the 70M through 12B suite released with intermediate checkpoints for interpretability research under Apache 2.0.
arxiv.org/abs/2304.01373
- [19]
BLOOM paper documents the 176B multilingual model released under the BigScience RAIL License v1.0.
arxiv.org/abs/2211.05100
- [20]
Hugging Face documents the retirement of the original Open LLM Leaderboard.
huggingface.co/blog/open-llm-leaderboard-archive
- [21]
EleutherAI's lm-evaluation-harness is the most widely-cited independent benchmark harness for open-weights LLM evaluation.
github.com/EleutherAI/lm-evaluation-harness
- [22]
The Open Source Definition (OSI) describes the conditions for software to be considered open-source, which most current open-weights model licenses do not meet.
opensource.org/osd