Labs

Gemini 3.5 Flash arrives as Google's agent-first wager

At I/O 2026, Google made Gemini 3.5 Flash the default in Search and the Gemini app, claiming it outperforms 3.1 Pro on coding and agentic benchmarks at four times the speed of rival frontier models.

By Casey Vanderhoof, Frontier labs correspondent · June 4, 2026

Google introduced Gemini 3.5 Flash at I/O on Tuesday and, in the same breath, made it the default model inside the Gemini app and AI Mode in Search worldwide. The framing was unsubtle: a mid-tier “Flash” model is now Google’s frontline product, and the company is staking its next AI cycle on agents rather than chat.

The scorecard Google published reads as a deliberate provocation. By its own numbers, 3.5 Flash posts 76.2% on Terminal-Bench 2.1, an Elo of 1,656 on GDPval-AA, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning, beating the larger Gemini 3.1 Pro on the three agentic and coding benchmarks Google chose to highlight. Google also claims the model generates output tokens four times as fast as other frontier models and often costs less than half as much for comparable work. A 3.5 Pro variant is slated for next month.

The pitch from DeepMind is operational, not philosophical. “3.5 Flash offers an incredible combination of quality and low latency,” said Koray Kavukcuoglu, DeepMind’s chief technologist, who briefed reporters on Monday ahead of the launch. TechCrunch reports that in internal tests the model has built an operating system from scratch and can independently run coding pipelines and research projects. That is the sort of capability claim that matters only if the surrounding product surface lets agents actually run.

That surface is what I/O was really about. Developers get Antigravity 2.0 as a standalone desktop app for orchestrating agents, plus an Antigravity CLI, an SDK for self-hosting the agent harness, and Managed Agents in the Gemini API. Google Cloud is packaging the same machinery as the Gemini Enterprise Agent Platform. Consumers get Gemini Spark, billed as a 24/7 personal assistant, while Search gains background “information agents” that monitor the web against standing queries. AI Mode itself has passed one billion monthly users a year after launch, and Personal Intelligence is being extended to nearly 200 countries and 98 languages.

The safety framing arrives with a particular edge. Google says 3.5 was developed under its Frontier Safety Framework, with strengthened cyber and CBRN safeguards and interpretability tools used to inspect the model’s reasoning before it responds. The backdrop, per TechCrunch, is a lawsuit tied to a user’s interactions with Gemini last year, the kind of legal exposure that turns interpretability from a research aesthetic into a liability shield.

Making a “Flash” model the default while positioning the Pro tier as a follow-up is its own argument. Google is betting the agent era will be decided by tokens per second and unit economics, not by who tops a leaderboard with the largest model in the lineup.

Sources