The open-weights model index

What you can actually download, fine-tune, and ship — license-honest, hype-free

Open-weights is not open-source. It is a narrower, more honest term: the trained parameters are downloadable, but the training data, training code, alignment recipe, and sometimes the right to use the outputs commercially all vary release by release. This page is a best-effort reference index of the notable open-weights families as of mid-2026. We list parameter counts, native context windows, license families, what each model is meant for, and where the license has teeth.

We do not rank by leaderboard. Leaderboards rotate. Licenses persist. A model that scores two points higher on MMLU is irrelevant if its license bars your use case, and a 7B model under Apache 2.0 will out-ship a 70B model under a custom community license for most teams who actually have to deploy something.

So the structure here is: index first (family, sizes, context, license, recommended use), then a plain-language section on what the licenses actually let you do, then a short timeline of how we got here. Where a fact is moving fast (pricing on hosted endpoints, exact benchmark numbers, which Llama 4 variants have shipped), we mark it as a best-effort snapshot and point you to the provider's official model card or license file. Treat those as the source of truth. Treat this page as the map, not the territory.

A note on what is deliberately excluded: closed-weights commercial APIs (GPT-4-class, Claude, Gemini Pro) are not on this list even when they are excellent, because you cannot download them. Models released only via gated waitlists with no actual weights download are also excluded. The bar for inclusion is: weights publicly downloadable from Hugging Face or an equivalent registry, under a license that permits at least research use, with a model card that names the parameter count.

The index — open-weights families worth knowing

Family	Sizes (params)	Native context	License family	Recommended use
Llama 3.1 (Meta)	8B · 70B · 405B	128k	Llama 3.1 Community License (custom · permissive with caveats)	General-purpose assistant, fine-tune base, English-heavy
Llama 3.2 (Meta)	1B · 3B (text) · 11B · 90B (vision)	128k	Llama 3.2 Community License (adds restrictions on EU vision use)	On-device (1B/3B), multimodal (11B/90B)
Mistral Large 2 (Mistral AI)	123B	128k	Mistral Research License (research only · commercial via Mistral)	Research baseline, evaluation, distillation source
Mixtral 8x7B / 8x22B (Mistral AI)	47B total (~13B active) / 141B total (~39B active), MoE	32k / 64k	Apache 2.0	Commercial deployment, mixture-of-experts research
Codestral (Mistral AI)	22B	32k	Mistral Non-Production License (research/eval only — commercial requires license)	Code-completion research; do not ship commercially without a Mistral commercial license
DeepSeek-V3 (DeepSeek)	671B total · ~37B active (MoE)	128k	DeepSeek License (permissive, commercial allowed with use restrictions)	Strong general + code; cost-efficient inference
DeepSeek-R1 (DeepSeek)	671B total · ~37B active (MoE)	128k	MIT License (model weights)	Reasoning research; the first widely-distributed open reasoning model
Qwen 2.5 (Alibaba)	0.5B · 1.5B · 3B · 7B · 14B · 32B · 72B	128k (some variants 32k)	Apache 2.0 (most sizes) · Qwen Research License (3B) · Qwen License (72B)	Multilingual (esp. Chinese-English), code variants strong
Gemma 2 (Google)	2B · 9B · 27B	8k	Gemma Terms of Use (custom permissive, prohibits certain use cases)	Efficient inference, research distillation
Phi-3 / Phi-3.5 (Microsoft)	3.8B (mini) · 7B (small) · 14B (medium) · MoE 41B	4k–128k by variant	MIT License	Small-model strength, on-device, fine-tune base
Phi-4 (Microsoft)	14B	16k	MIT License	Reasoning-tuned dense model, late-2024 release
Granite 3 (IBM)	2B · 8B (dense) · 1B · 3B MoE	4k–128k	Apache 2.0	Enterprise-friendly licensing, code variants, time-series variant
Yi 1.5 (01.AI)	6B · 9B · 34B	4k–200k by variant	Apache 2.0 (weights) · Yi License (some commercial terms)	Bilingual (Chinese-English), long-context variants
Falcon 2 / Falcon 3 (TII)	11B · 1B–10B (Falcon 3)	8k–32k	TII Falcon License 2.0 (permissive, includes acceptable-use policy)	UAE-hosted base model, multilingual
Stable LM 2 (Stability AI)	1.6B · 12B	4k	Stability AI Community License (membership tiering for commercial use)	Research, small-model experimentation
OLMo / OLMo 2 (AI2)	1B · 7B · 13B	4k	Apache 2.0 (weights, code, data, recipe all open)	Reproducible research — the most truly open of the batch
Pythia (EleutherAI)	70M · 160M · 410M · 1B · 1.4B · 2.8B · 6.9B · 12B	2k	Apache 2.0	Interpretability research, training-dynamics studies
BLOOM (BigScience)	560M · 1.1B · 1.7B · 3B · 7.1B · 176B	2k	BigScience RAIL License v1.0 (Responsible AI license with use-case restrictions)	Multilingual baseline (46 natural + 13 programming languages), historical reference

What "open weights" actually means

Open weights means you can download the trained model parameters. That is the minimum. Beyond that, every claim about "openness" has fine print, and the fine print is where you get burned. There are roughly four tiers of openness in practice.

First, fully open: weights, training code, training data, and the training recipe are all published under a permissive license. OLMo from AI2 is the canonical example: they publish the data mix (Dolma), the training code (OLMo-core), the checkpoints, and the evaluation suite. Pythia from EleutherAI similarly publishes intermediate checkpoints for interpretability work.

Second, open weights with open code: the model file is downloadable and the inference code is on GitHub, but the training data and recipe are not disclosed. Mixtral 8x7B, Qwen 2.5, and most Granite releases sit here.

Third, open weights under a custom community license: Meta's Llama 3.x family, Google's Gemma 2, Stability's models. The weights download, but the license adds use restrictions. Some are geographic (the Llama 3.2 license does not grant rights to the vision models to licensees based in the EU, although end users of products built on those models are not affected), some are user-count gated (Llama's >700M MAU clause requires a separate commercial license), and some are acceptable-use clauses that prohibit specific verticals.

Fourth, research-only weights: Mistral Large 2 and Codestral fall here. The weights are downloadable from Hugging Face, but the license says non-commercial / research-only, and commercial use requires a paid license from Mistral.

The practical lesson: read the LICENSE file. Not the marketing page. Not the README. The LICENSE file in the model repo. If it says "Llama 3.1 Community License," pull up Meta's PDF and search it for your use case. If it says "Mistral Research License," you cannot ship a paid product on it. If it says "Apache 2.0" or "MIT," you have the broadest latitude, but you still owe attribution and you should still read it.
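The four tiers can be written down as a small lookup, which is sometimes handy as a first-pass filter before anyone opens a LICENSE file. The sketch below is illustrative only: the `Tier` enum, the `TIER_BY_FAMILY` table, and `commercial_outlook` are hypothetical names invented here, and the tier assignments are simply copied from the examples in the text above. It is not a substitute for reading the license.

```python
from enum import Enum


class Tier(Enum):
    FULLY_OPEN = 1              # weights, code, data, and recipe published
    OPEN_WEIGHTS_OPEN_CODE = 2  # weights and inference code, data undisclosed
    COMMUNITY_LICENSE = 3       # weights under a custom license with use limits
    RESEARCH_ONLY = 4           # downloadable, but non-commercial


# Hypothetical table, filled in from the examples named in the text.
TIER_BY_FAMILY = {
    "OLMo": Tier.FULLY_OPEN,
    "Pythia": Tier.FULLY_OPEN,
    "Mixtral 8x7B": Tier.OPEN_WEIGHTS_OPEN_CODE,
    "Llama 3.1": Tier.COMMUNITY_LICENSE,
    "Gemma 2": Tier.COMMUNITY_LICENSE,
    "Mistral Large 2": Tier.RESEARCH_ONLY,
    "Codestral": Tier.RESEARCH_ONLY,
}


def commercial_outlook(family: str) -> str:
    """Return a coarse first-pass answer; the LICENSE file always wins."""
    tier = TIER_BY_FAMILY.get(family)
    if tier is None:
        return "unknown: read the LICENSE file in the model repo"
    if tier is Tier.RESEARCH_ONLY:
        return "no: commercial use needs a paid license from the provider"
    if tier is Tier.COMMUNITY_LICENSE:
        return "maybe: check MAU, geographic, and acceptable-use clauses"
    return "likely: permissive license, attribution still owed"


print(commercial_outlook("Codestral"))
print(commercial_outlook("Llama 3.1"))
```

Families missing from the table deliberately fall through to "unknown" instead of a guess, which mirrors the advice above: when in doubt, the LICENSE file is the answer.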

License gotchas worth flagging

Three landmines that catch teams repeatedly.

(1) The MAU clause. The Llama Community License has a clause that triggers when a deploying entity has more than 700 million monthly active users; at that point you need a separate license from Meta. Below that ceiling, commercial use is permitted with attribution. Most teams are fine. Some are not, and the threshold is per-organization, not per-product.

(2) Distillation. Many community licenses restrict using the model's outputs to train a competing foundation model. Under Meta's Llama 3 license, for example, generating a synthetic dataset with Llama 3 and using it to train a from-scratch competitor is contractually prohibited, even though the outputs themselves are not copyrighted. The wording differs between license versions, so check the one attached to the exact model you use.

(3) Acceptable use policies. Llama, Gemma, and Falcon all attach AUPs that prohibit certain applications (weapons of mass destruction, large-scale surveillance, content sexualizing minors, etc.). These are enforceable contract terms, not just guidelines. Apache 2.0 and MIT models have no such restrictions in the license itself, but you are still bound by applicable law.
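The three landmines translate into a short pre-flight check. The following is a minimal sketch under stated assumptions: `Deployment`, `license_flags`, and the flag strings are hypothetical names made up for illustration, the 700 million figure is the one quoted above, and the license matching is a naive string test. A real review still means reading the license text.

```python
from dataclasses import dataclass

# Threshold quoted in the text; it applies to the whole organization.
LLAMA_MAU_THRESHOLD = 700_000_000

# Licenses the text describes as carrying no use restrictions of their own.
NO_AUP_LICENSES = {"Apache 2.0", "MIT"}


@dataclass
class Deployment:
    license_name: str
    org_monthly_active_users: int        # organization-wide, not per product
    trains_other_model_on_outputs: bool  # synthetic data for another model?


def license_flags(d: Deployment) -> list[str]:
    """Return the landmines worth a closer look for this deployment."""
    flags = []
    is_llama = d.license_name.lower().startswith("llama")
    if is_llama and d.org_monthly_active_users > LLAMA_MAU_THRESHOLD:
        flags.append("mau: separate license from Meta required")
    if is_llama and d.trains_other_model_on_outputs:
        flags.append("distillation: check output-use terms for this version")
    if d.license_name not in NO_AUP_LICENSES:
        flags.append("aup: look for an attached acceptable-use policy")
    return flags


big = Deployment("Llama 3.1 Community License", 800_000_000, True)
for flag in license_flags(big):
    print(flag)
```

An empty list here means only that none of the three known landmines matched, not that the deployment is cleared.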

Frontier open-weights families — the four to actually pay attention to

Llama (Meta)

Llama 3.1 · Llama 3.2 · Llama 3.3 70B

The most-deployed open-weights family. Strong English performance, broad fine-tune ecosystem, 128k context on the 3.1/3.3 generation, multimodal vision variants in 3.2. License is custom (Llama Community License), not Apache, and the 700M MAU clause matters at scale. Llama 4 weights (Scout and Maverick) were published in April 2025 under their own Llama 4 Community License; they are not indexed on this page, so check Meta's official announcements and model cards for their terms. If you are choosing a default open-weights base for an English-heavy product and you are below the MAU threshold, Llama 3.x is the boring correct answer.

DeepSeek (DeepSeek-AI)

DeepSeek-V3 · DeepSeek-R1 · DeepSeek-Coder-V2

Mixture-of-experts architecture with 671B total parameters and roughly 37B activated per token, which gives strong quality at inference cost closer to a dense ~70B. R1 was the first widely-distributed open-weights reasoning model and was released under MIT license — an unusually permissive choice. V3 ships under DeepSeek's own license with commercial use permitted under documented restrictions. Strong on code and math. Hosted inference is unusually cheap; self-hosting the full MoE requires serious GPU memory.
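The gap between "cheap per token" and "expensive to self-host" is plain arithmetic: per-token compute scales with the roughly 37B active parameters, but every expert has to be resident in memory, so the weight footprint scales with all 671B. A rough sketch follows; the bytes-per-parameter figures are generic assumptions about common precisions, and the result ignores KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


TOTAL_PARAMS = 671e9   # all experts, must be loaded
ACTIVE_PARAMS = 37e9   # approximate parameters used per token

# Assumed storage precisions, not a statement about any official checkpoint.
for label, nbytes in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{label}: about {gb:,.0f} GB of weights")

share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active share per token: about {share:.1%}")
```

Even at 4-bit precision the weights alone run to hundreds of gigabytes, which is why the full model is a multi-GPU deployment while hosted endpoints can price it like a much smaller dense model.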

Qwen (Alibaba)

Qwen 2.5 · Qwen 2.5-Coder · Qwen 2.5-VL

Most of the Qwen 2.5 lineup ships under Apache 2.0; the exceptions are the 3B variant (Qwen Research License) and the 72B variant (Qwen License, with commercial-use terms). Strong multilingual performance, especially Chinese-English, and the Coder variants are competitive with closed models on HumanEval and similar code benchmarks. Sizes range from 0.5B (on-device) through 72B. Good default for a multilingual product or a small-model experiment.

Mistral (Mistral AI)

Mixtral 8x7B / 8x22B · Mistral Large 2 · Codestral · Mistral 7B

Bifurcated portfolio: the older Mixtral models and Mistral 7B are Apache 2.0 and freely usable commercially. The newer Mistral Large 2 and Codestral are research-license-only — downloadable from Hugging Face for evaluation, but commercial deployment requires a paid Mistral license. Mistral was the first major lab to popularize MoE in open weights. Strong European data coverage.

Small-model picks: 14B parameters and under

Model	Params	License	Notable property
Llama 3.2 1B / 3B	1B · 3B	Llama 3.2 Community	Tuned for on-device; 128k context retained
Phi-3.5-mini	3.8B	MIT	Punches well above weight on reasoning benchmarks for size
Phi-4	14B	MIT	Microsoft's late-2024 dense reasoning model
Gemma 2 2B / 9B	2B · 9B	Gemma Terms	Tight inference footprint, 8k context
Qwen 2.5 0.5B / 1.5B / 3B / 7B	0.5B–7B	Apache 2.0 (3B: Qwen Research License)	Multilingual, strong code variants
Granite 3 2B / 8B	2B · 8B	Apache 2.0	Enterprise-clean license, IBM-backed
OLMo 2 7B / 13B	7B · 13B	Apache 2.0 (fully open)	Reproducible research-grade release
Mistral 7B (v0.3)	7.3B	Apache 2.0	Battle-tested fine-tune base, broad ecosystem

How we got here — a short timeline

2022-07
BLOOM (BigScience)
176B-parameter multilingual model from the BigScience workshop coordinated by Hugging Face. First model of this scale released under a Responsible AI License with explicit use-case restrictions. Set a baseline that openness could include conditions.
2023-02
LLaMA 1 (Meta · research-only leak)
Meta releases LLaMA 1 under a research-only license. Weights leak via 4chan within a week. The leak forces the industry to confront the gap between intended access and actual access, and arguably accelerates the next year of open releases.
2023-07
Llama 2
Meta releases Llama 2 with a custom Community License permitting commercial use below 700M MAU. This is the moment open-weights becomes commercially viable at scale — the entire fine-tune ecosystem (Vicuna, WizardLM, Alpaca derivatives, etc.) coalesces around it.
2023-09
Mistral 7B
Mistral AI ships a 7B dense model under Apache 2.0. Outperforms Llama 2 13B on many benchmarks and ignites the small-model serious-quality thread that still runs.
2023-12
Mixtral 8x7B
Mistral releases the first widely-adopted open MoE. Apache 2.0. Demonstrates that mixture-of-experts can be shipped openly, not just kept behind closed APIs.
2024-04
Llama 3 · Phi-3 · OLMo
Meta ships Llama 3 8B/70B with 8k context. Microsoft ships Phi-3 demonstrating small-model strength via curated data. AI2 ships OLMo with full data and training-code disclosure. The 'open weights' field stratifies into permissive-but-closed-data, permissive-with-AUP, and fully-open-everything tiers.
2024-07
Llama 3.1 · 405B
Meta releases Llama 3.1 8B/70B/405B with 128k context. The 405B variant is the largest openly-released dense model to date and shifts the conversation about what 'frontier-class open weights' looks like.
2024-12
DeepSeek-V3 · R1 (early 2025)
DeepSeek releases V3 in December 2024, followed by R1 in January 2025 under MIT license. R1 is the first widely-distributed open-weights reasoning model and the release shifts both the cost-curve and the political conversation around open-weights releases.
2025-2026
Continued releases — best-effort tracking
Qwen, Mistral, Meta, DeepSeek, IBM, Microsoft, AI2, and 01.AI continue iterating. Llama 4 (first weights released in April 2025) and Llama 3.3 70B (released late 2024) are the moving pieces; check the provider model cards for current state. This page is a best-effort reflection of what is verifiably released as of mid-2026.

Choosing a base — a short decision tree

You need a maximally permissive license (Apache 2.0 or MIT) and English-heavy use: Mistral 7B, Mixtral 8x7B, Phi-3.5, Granite 3, OLMo 2, Qwen 2.5 (Apache 2.0 sizes).
You need a maximally permissive license and multilingual coverage (incl. Chinese): Qwen 2.5 (Apache 2.0 sizes) or Yi 1.5.
You need 128k context and you are below 700M MAU: Llama 3.1 8B / 70B / 405B or Llama 3.3 70B.
You need on-device inference under 4B parameters: Llama 3.2 1B/3B, Phi-3.5-mini, Gemma 2 2B, Qwen 2.5 0.5B / 1.5B / 3B.
You need an open reasoning model with chain-of-thought: DeepSeek-R1 (MIT-licensed weights); distill to a smaller dense model for production.
You need a code-specific model: Qwen 2.5-Coder, DeepSeek-Coder-V2, or Codestral (research-only; buy a Mistral license for prod).
You need full reproducibility (data + code + weights): OLMo 2 or Pythia. Nothing else in the list publishes the data mix at OLMo's level.
You are a >700M MAU company and want a frontier open base: either take a non-Llama path (Mixtral 8x22B under Apache 2.0, Qwen 2.5 72B under the Qwen License with its commercial terms, or DeepSeek-V3 under its own commercial-permissive license) or negotiate directly with Meta.
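The decision tree above is essentially a filter over the index, and can be sketched as one. Everything here is illustrative: `Family`, `CATALOG`, and `shortlist` are hypothetical names, the catalog holds only a handful of rows, and the values are copied from this page's tables, so they carry the same best-effort caveat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    name: str
    license: str
    smallest_b: float   # smallest listed size, billions of parameters
    context_k: int      # native context, thousands of tokens
    tags: frozenset = frozenset()


PERMISSIVE = {"Apache 2.0", "MIT"}

# Illustrative subset of the index; values taken from the tables above.
CATALOG = [
    Family("Llama 3.1", "Llama 3.1 Community License", 8, 128,
           frozenset({"english"})),
    Family("Mixtral 8x7B", "Apache 2.0", 47, 32, frozenset({"english"})),
    Family("Qwen 2.5 7B", "Apache 2.0", 7, 128,
           frozenset({"multilingual", "code"})),
    Family("Phi-3.5-mini", "MIT", 3.8, 128,
           frozenset({"english", "on-device"})),
    Family("OLMo 2", "Apache 2.0", 7, 4, frozenset({"reproducible"})),
    Family("DeepSeek-R1", "MIT", 671, 128, frozenset({"reasoning"})),
    Family("Codestral", "Mistral Non-Production License", 22, 32,
           frozenset({"code"})),
]


def shortlist(need_permissive=False, min_context_k=0,
              max_params_b=None, tag=None):
    """Return names of catalog entries that pass every requested filter."""
    names = []
    for fam in CATALOG:
        if need_permissive and fam.license not in PERMISSIVE:
            continue
        if fam.context_k < min_context_k:
            continue
        if max_params_b is not None and fam.smallest_b > max_params_b:
            continue
        if tag is not None and tag not in fam.tags:
            continue
        names.append(fam.name)
    return names


print(shortlist(need_permissive=True, tag="code"))
print(shortlist(min_context_k=128, tag="english"))
```

Filtering on the license first, before size or context, reflects the page's ordering of concerns: a model that fails the license test is out regardless of how well it fits otherwise.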

Numbers we deliberately did not invent

We did not publish specific MMLU, GPQA, or HumanEval scores in this index. Those numbers move with each fine-tune, each eval harness version, and each prompt-formatting choice, so comparing 'official' numbers across providers that use different harnesses is misleading. For current benchmark performance, consult the model card on Hugging Face, the provider's release post, and an independent harness run (lm-evaluation-harness from EleutherAI is the most-cited). The original Hugging Face Open LLM Leaderboard was archived in mid-2024; community-maintained successors exist but rotate. Treat any single leaderboard as one data point, not the verdict.

Honest limits of this page

This page is a snapshot. Open-weights releases happen monthly. By the time you read this, at least one of these families will have shipped a new variant, one license may have been updated (Meta has revised the Llama Community License between generations, from Llama 2 to Llama 3 to Llama 3.2), and at least one provider may have switched between research-only and commercial-permissive (or the reverse).

The structural claims (what 'open weights' means versus 'open source,' what license tiers exist, why distillation clauses matter) are durable. The specific row contents are best-effort as of mid-2026, and you should verify each license against the LICENSE file in the model's actual repo before you build on it. If a fact on this page contradicts a provider's official model card, the official model card wins. If a license clause on this page contradicts the LICENSE file in the repo, the LICENSE file in the repo wins.

If we missed a model family that should be on this list, that is on us. The bar is publicly downloadable weights with a real license, and we tried to cover the families that show up most in production deployments and academic citation as of the snapshot date.

Sources

[01] Meta Llama 3.1 release post documents the 8B / 70B / 405B sizes and 128k context window. ai.meta.com/blog/meta-llama-3-1/
[02] The Llama 3.1 Community License contains the 700 million monthly active users threshold for separate commercial licensing. llama.meta.com/llama3_1/license/
[03] Meta Llama 3.2 release post documents the 1B / 3B text and 11B / 90B vision variants and the EU restriction for multimodal use. ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
[04] Mistral AI announces Mistral Large 2 at 123B parameters with 128k context under the Mistral Research License. mistral.ai/news/mistral-large-2407/
[05] Mistral AI announces Mixtral 8x7B under Apache 2.0 license as the first widely-adopted open MoE. mistral.ai/news/mixtral-of-experts/
[06] Mistral AI ships Codestral 22B under the Mistral Non-Production License restricting commercial deployment. mistral.ai/news/codestral/
[07] DeepSeek-V3 technical report documents the 671B-parameter MoE architecture with roughly 37B activated parameters per token. arxiv.org/abs/2412.19437
[08] DeepSeek-R1 paper documents the open reasoning model release under MIT license. arxiv.org/abs/2501.12948
[09] Alibaba's Qwen 2.5 release post documents the 0.5B through 72B size range and the Apache 2.0 license for most variants. qwenlm.github.io/blog/qwen2.5/
[10] Gemma Terms of Use document the custom permissive license with prohibited-use clauses for Google's Gemma models. ai.google.dev/gemma/terms
[11] Microsoft Phi-3 technical report documents the 3.8B / 7B / 14B Phi-3 family release under MIT license. arxiv.org/abs/2404.14219
[12] Microsoft Phi-4 technical report documents the 14B dense reasoning-tuned model. arxiv.org/abs/2412.08905
[13] IBM Granite 3 model family is released under Apache 2.0 with enterprise-targeted licensing terms. ibm.com/granite
[14] 01.AI's Yi paper documents the 6B / 9B / 34B Yi family and bilingual training corpus. arxiv.org/abs/2403.04652
[15] TII's Falcon 2 release documents the 11B model under the TII Falcon License 2.0. falconllm.tii.ae/falcon-2.html
[16] Stability AI's Stable LM 2 release documents the 1.6B and 12B variants under the Stability AI Community License. stability.ai/news/introducing-stable-lm-2
[17] AI2 OLMo releases publish weights, training data (Dolma), training code, and recipe under Apache 2.0 for full reproducibility. allenai.org/olmo
[18] EleutherAI Pythia paper documents the 70M through 12B suite released with intermediate checkpoints for interpretability research under Apache 2.0. arxiv.org/abs/2304.01373
[19] BLOOM paper documents the 176B multilingual model released under the BigScience RAIL License v1.0. arxiv.org/abs/2211.05100
[20] Hugging Face documents the retirement of the original Open LLM Leaderboard. huggingface.co/blog/open-llm-leaderboard-archive
[21] EleutherAI's lm-evaluation-harness is the most widely-cited independent benchmark harness for open-weights LLM evaluation. github.com/EleutherAI/lm-evaluation-harness
[22] The Open Source Definition (OSI) describes the conditions for software to be considered open-source, which most current open-weights model licenses do not meet. opensource.org/osd
