S · supermodel
Gemini 3.1 Pro Preview · Thinking-High
House of Google
Hardest-eval champion. Cheapest at the top.
The lab's verdict
Gemini 3.1 Pro Preview on Thinking-High is the only model to clear 45% on Humanity's Last Exam at this cutoff (46.44%). It is also the cheapest frontier model on the leaderboard at $1.74 per million tokens, and it runs at 138 tokens/sec. No other model on the leaderboard pairs the top hard-eval score with the lowest frontier price.
Real-user sentiment, filtered
Researchers running their own brutal evals describe it as the model that 'actually thinks' instead of pattern-matching. Critics note that its safety filters still over-fire on technical prompts and that it has the weakest voice and personality of the top three: it sounds like a textbook. Nobody who has tested it on reasoning rates it second.
Best for
Hard math · physics derivations · long-context retrieval · cost-sensitive reasoning
Where it loses
Open-ended writing voice · empathetic tone · creative latitude under safety filters
Receipts
Cutoff 2026-06-03