The Newsroom · Staff Reporter

Abigail Pemberton

Capabilities & evals

Abigail Pemberton tracks evals, benchmarks, and the moving target of what counts as progress in frontier models. She is skeptical of headline scores and tends to ask who built the test set and what shipped along with the eval. Her stories live at the intersection of model claims and measurement.

Beats: evaluations, frontier-models

By Abigail Pemberton § FILED

Capabilities Aug 2, 2026

SMB AI adoption hits 87%, but a third of users are stuck in pilot

Five 2026 surveys — Pax8, Salesforce, Upwork, Constant Contact and the SBE Council — show small-business AI use at record levels, yet nearly one in three AI-using SMBs cannot move a pilot into production.

Capabilities Jul 15, 2026

The Stuck Middle: 29% of AI-Using SMBs Can't Get Past Experimentation

Pax8's Q2 2026 Pulse finds nearly a third of small businesses that adopted AI are trapped in pilot mode — while the ones that broke through are pulling three times ahead on competitive posture.

Capabilities Jun 27, 2026

Small-Business AI Adoption Hits 77% as the SMB-Enterprise Gap Narrows

New Intuit QuickBooks and SBE Council data show a 30-point jump in 18 months, 16.5 hours of median weekly time savings, and an estimated $243.6 billion in annual labor savings — even as most owners remain stuck in pilots.

Capabilities Jun 21, 2026

The 680x AI Spending Chasm Splitting American Business

Ramp's June 2026 AI Index pegs the top 1% of firms at $7,449 per employee per month on AI — versus $11.38 at the median — as a CEO backlash forces token caps and a hunt for smarter deployment.

Capabilities Jun 18, 2026

OpenAI opens free clinician workspace, claims GPT-5.4 beats physician baselines on its own new benchmark

ChatGPT for Clinicians launches free for verified U.S. physicians, NPs, PAs and pharmacists, with cited search and CME credits — alongside HealthBench Professional, an open benchmark OpenAI says its GPT-5.4 workspace tops against human physician responses.

Capabilities Jun 11, 2026

U.S. small-business AI adoption hits 87% as the measurement gap widens

Constant Contact's Q2 2026 Small Business Now report pegs U.S. SMB marketing AI use at 87%, while Census Bureau, JPMorgan Chase, and Minneapolis Fed data put broader business adoption between 17% and 20%, a gap that says as much about what's being measured as about how fast the shift is moving.

Capabilities May 19, 2026

Google's Gemini 3.5 Flash, Omni, and Spark land at I/O, push the multimodal frontier and the agent thesis at once

Three product announcements at Google I/O on May 19 — Gemini 3.5 Flash, the Omni multimodal model, and the Spark agent — together amount to the most expansive single Gemini drop the lab has ever staged.
