Thinking models are smarter but slower; when speed matters, non-thinking models shine. Google announced that in the Gemini 2.5 family, both Flash and Pro are thinking models, while only Flash-Lite remains a non-thinking model. Let's see which is the better choice in terms of performance benchmarks and pricing.
1. Performance Benchmarks (Non-Thinking Model)
| Capability | Benchmark | Gemini 2.5 Flash-Lite (Non-Thinking) | Gemini 2.0 Flash | Winner |
|---|---|---|---|---|
| General Reasoning | MMLU-Pro | 71.6% | 77.6% | 2.0 Flash |
| Scientific QA | GPQA Diamond | 64.6% | 60.1% | 2.5 Lite |
| Math | AIME 2025 | 49.8% | 63.5% (HiddenMath) | 2.0 Flash |
| Code (Python) | LiveCodeBench | 33.7% | 34.5% | 2.0 Flash |
| Code Editing | Aider Polyglot | 26.7% | ~25% (est.) | 2.5 Lite |
| Agentic Coding | SWE-bench Verified | 42.6% | ~34.5% (est.) | 2.5 Lite |
| Factual QA (Simple) | SimpleQA | 10.7% | 29.9% | 2.0 Flash |
| Factual QA (Grounded) | FACTS Grounding | 84.1% | 84.6% | 2.0 Flash |
| Multilingual QA | Global MMLU (Lite) | 81.1% | 83.4% | 2.0 Flash |
| Image Reasoning | MMMU | 72.9% | 71.7% | 2.5 Lite |
| Long-Context Memory | MRCR (1M) | 4.1% | 70.5% | 2.0 Flash |
Sources: Gemini 2.0 Benchmark, Gemini 2.5 Benchmark
- Gemini 2.0 Flash wins 7 out of 11 benchmarks, excelling in general reasoning, math, Python coding, factual QA (simple and grounded), multilingual understanding, and long-context memory.
- Gemini 2.5 Flash-Lite wins 4 out of 11 benchmarks, leading in scientific QA, code editing, agentic coding, and image reasoning.
2. Pricing
| Model | Input (1M tokens) | Output (1M tokens) |
|---|---|---|
| Gemini 2.0 Flash | $0.15 | $0.60 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 |
Source: Gemini Pricing
2.5 Flash-Lite is ~33% cheaper than 2.0 Flash on both input and output tokens, which is good for high-volume users who do not need long context.
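To see how the ~33% saving plays out in practice, here is a minimal cost-estimation sketch using the per-million-token prices from the table above; the workload volume (10M input / 2M output tokens) is an illustrative assumption, not an official figure.

```python
# USD per 1M tokens: (input, output), taken from the pricing table above.
PRICES = {
    "gemini-2.0-flash": (0.15, 0.60),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Estimate USD cost for a given token volume on a given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Hypothetical workload: 10M input tokens, 2M output tokens.
cost_20 = estimate_cost("gemini-2.0-flash", 10e6, 2e6)      # $1.50 + $1.20 = $2.70
cost_25 = estimate_cost("gemini-2.5-flash-lite", 10e6, 2e6) # $1.00 + $0.80 = $1.80
print(f"2.0 Flash: ${cost_20:.2f}, 2.5 Flash-Lite: ${cost_25:.2f}")
print(f"Savings: {1 - cost_25 / cost_20:.0%}")  # 33% cheaper at this mix
```

Because both input and output prices drop by the same fraction, the saving stays ~33% regardless of the input/output mix.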
3. Conclusion
- 2.5 Flash-Lite is cheaper and best suited for short-form, single-shot tasks.
- 2.0 Flash remains the most balanced non-thinking model for comprehensive performance across a variety of domains.