Notes

29 January 2026

Frontier AI safety regulations: A reference for lab staff

A summary of key provisions from California's SB 53, the EU Code of Practice, and New York's RAISE Act covering frontier AI developers.

22 January 2026

Clarifying limitations of time horizon

Thomas Kwa responds to some misinterpretations of our time horizon work and explains its limitations and core finding.

3 October 2025

Early Results on Monitorability in QA Settings

Research on how AI agents can hide secondary task-solving from monitors, finding that harder tasks are more detectable and that small models can learn to evade larger monitors.

22 August 2025

Claude, GPT, and Gemini All Struggle to Evade Monitors

A replication of a Google DeepMind paper on chain-of-thought monitoring, showing evidence that monitoring works on other companies' models.

Rough/unpolished research updates and speculation