There’s no end in sight for the current LLM release cycle. Within the last 30 days, we’ve seen the launches of Google’s Gemini 3 Pro, Anthropic’s Opus 4.5, and OpenAI’s GPT-5.2. That’s in addition to models from A2AI, DeepSeek, Grok, Mistral, Nvidia and others. Today it’s Google’s turn again, with the launch of the smaller and faster version of Gemini 3: Gemini 3 Flash.
As we’ve seen with many of the smaller models from Google and other frontier model builders, Gemini 3 Flash isn’t far behind its Pro sibling in terms of capabilities. With its thinking mode on, Gemini 3 Flash comes close to Gemini 3 Pro, Anthropic’s Sonnet 4.5 and OpenAI’s GPT-5.2 in most benchmarks, and sometimes even beats them. Like its predecessor, it also offers a 1 million token context window.
To put Gemini 3 Flash’s performance into perspective: just a few weeks ago, this model would’ve sat at the top of most frontier model benchmarks.
“For too long, AI forced a choice: big models that were slow and expensive, or high-speed models that were less capable. Gemini 3 Flash ends this compromise. Gemini 3 Flash delivers smarts and speed,” Google writes in today’s announcement.
Compared to the last Flash model (Gemini 2.5 Flash), Gemini 3 Flash represents a significant step up, which is especially important for developers, as Flash has long been recognized as the model with the best price-to-performance ratio.
One area where Google has long been class-leading is multimodal reasoning, with its models able to reason over text, images, audio files and video. More recently, the Gemini models have also become quite capable at building their own visualizations on the fly, something Google again highlights for this new model. Indeed, Gemini 3 Flash even beats Gemini 3 Pro on the multimodal MMMU-Pro benchmark, though only by 0.2%.
Another area where Google’s models have recently made some advances is coding. On the SWE-Bench Verified benchmark, Gemini 3 Flash also beats Gemini 3 Pro and is even ahead of Sonnet 4.5 (though GPT-5.2 remains the top performer here).
“Gemini 3 Flash remains the best fit for Warp’s Suggested Code Diffs, where low latency and cost efficiency are hard constraints,” said Zach Lloyd, the founder and CEO of Warp. “With this release, it resolves a broader set of common command-line errors while staying fast and economical. In our internal evaluations, we’ve seen an 8% lift in fix accuracy.”
One trend we’ve recently seen is that even these smaller models have been getting more expensive for developers to use through the API, with Gemini 3 Flash now costing $0.50/$3 per million input/output tokens, up from $0.30/$2.50. That’s still much cheaper than Anthropic’s Claude Sonnet ($3/$5) or even the smaller and less capable Claude Haiku ($1/$5) model.
On average, though, Gemini 3 Flash uses 30% fewer tokens to generate its answers than Gemini 2.5 Flash did, Google says, all while also being faster. For speed, however, Google only compared the new model to the older Gemini 2.5 Pro, which it outpaces by 3x.
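Taken together, the higher per-token rates and the 30% reduction in generated tokens determine what a typical request actually costs. Here is a minimal sketch of that arithmetic; the token counts are illustrative assumptions, not figures from Google’s announcement:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_rate: float, out_rate: float) -> float:
    """Cost of a single request, given per-million-token rates in USD."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Illustrative request: 10,000 input tokens, 2,000 output tokens.
# Gemini 2.5 Flash rates: $0.30 input / $2.50 output per million tokens.
old_cost = request_cost_usd(10_000, 2_000, in_rate=0.30, out_rate=2.50)

# Gemini 3 Flash rates: $0.50 / $3.00, but per Google it needs
# roughly 30% fewer output tokens to produce the same answer.
new_cost = request_cost_usd(10_000, int(2_000 * 0.7), in_rate=0.50, out_rate=3.00)

print(f"2.5 Flash: ${old_cost:.4f}  3 Flash: ${new_cost:.4f}")
```

Note that at this input-heavy mix, the higher rates outweigh the token savings; the break-even point depends on a request’s input/output ratio, since the effective output rate drops (from $2.50 to roughly $2.10 per million “old-equivalent” tokens) while the input rate rises.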
The new model is now available in the API through Google AI Studio and Vertex AI, as well as in the company’s new AI coding tools Antigravity, Gemini CLI and Android Studio. Google’s partners will also build it into their own tools, of course.
For consumers, Gemini 3 Flash will now power Google Search’s AI Mode as well as the “Fast” and “Thinking” modes in the Gemini app, with the Pro model remaining available as an option in both.
The post Google’s New Gemini 3 Flash Rivals Frontier Models at a Fraction of the Cost appeared first on The New Stack.