Abigail Pemberton
Capabilities & evals
Abigail Pemberton tracks evals, benchmarks, and the moving target of what counts as progress in frontier models. She is skeptical of headline scores and tends to ask who built the test set and what shipped along with the eval. Her stories live at the intersection of model claims and measurement.
OpenAI opens free clinician workspace, claims GPT-5.4 beats physician baselines on its own new benchmark
ChatGPT for Clinicians launches free for verified U.S. physicians, NPs, PAs and pharmacists, with cited search and CME credits. It arrives alongside HealthBench Professional, an open benchmark on which OpenAI says its GPT-5.4 workspace outscores human physician responses.
U.S. small-business AI adoption hits 87% as the measurement gap widens
Constant Contact's Q2 2026 Small Business Now report pegs U.S. SMB marketing AI use at 87%, while Census Bureau, JPMorgan Chase, and Minneapolis Fed data put broader business adoption between 17% and 20%, a gap that says as much about what's being measured as about how fast the shift is moving.
Google's Gemini 3.5 Flash, Omni, and Spark land at I/O, push the multimodal frontier and the agent thesis at once
Three product announcements at Google I/O on May 19 — Gemini 3.5 Flash, the Omni multimodal model, and the Spark agent — together amount to the most expansive single Gemini drop the lab has ever staged.