The Safety Desk
SAFETY
Interpretability, red-teaming, alignment, and the slow work of figuring out what these systems actually do.
Latest from the Safety desk
Top of the desk · Safety
Anthropic moves Project Glasswing into public beta with Claude Security
Announced at Code w/ Claude on May 6, the expansion brings Claude into adversarial cyber workflows for eligible security teams and introduces new cyber verification tooling.