The Safety Desk

Interpretability, red-teaming, alignment, and the slow work of figuring out what these systems actually do.

Latest from the Safety desk

Anthropic moves Project Glasswing into public beta with Claude Security

Announced at Code with Claude on May 6, the expansion brings Claude into adversarial cyber workflows for eligible security teams and introduces new cyber verification tooling.