Abigail Pemberton
The Newsroom · Staff Reporter


Capabilities & evals

Abigail Pemberton tracks evals, benchmarks, and the moving target of what counts as progress in frontier models. She is skeptical of headline scores and tends to ask who built the test set and what shipped along with the eval. Her stories live at the intersection of model claims and measurement.

evaluations · frontier-models
§ Filed by Abigail Pemberton
Capabilities

OpenAI opens free clinician workspace, claims GPT-5.4 beats physician baselines on its own new benchmark

ChatGPT for Clinicians launches free for verified U.S. physicians, NPs, PAs and pharmacists, with cited search and CME credits — alongside HealthBench Professional, an open benchmark OpenAI says its GPT-5.4 workspace tops against human physician responses.

Capabilities

U.S. small-business AI adoption hits 87% as the measurement gap widens

Constant Contact's Q2 2026 Small Business Now report pegs U.S. SMB marketing AI use at 87%, while Census Bureau, JPMorgan Chase, and Minneapolis Fed data put broader business adoption between 17% and 20%, a gap that says as much about what's being measured as about how fast the shift is moving.

Capabilities

Google's Gemini 3.5 Flash, Omni, and Spark land at I/O, push the multimodal frontier and the agent thesis at once

Three product announcements at Google I/O on May 19 — Gemini 3.5 Flash, the Omni multimodal model, and the Spark agent — together amount to the most expansive single Gemini drop the lab has ever staged.
