Blogs

2026

IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Z. Cheng^✱, Y. Xie^✱, Y. Qu^✱, ^✱, S. Hao, V. Pimpalkhute, T. Liang, F. Yao, Z. Liu, E. Xing, V. Smith, R. Salakhutdinov, Z. Hu, T. Killian, A. Kumar
[CMU MLD Blog] (Jan 2026)

How to Explore to Scale RL Training of LLMs on Hard Problems?
Y. Qu^✱, ^✱, V. Smith, R. Salakhutdinov, A. Kumar
[Blog] (Dec 2025)

Sharpening or Discovery, RL or Meta RL?: How RL Improves LLM Reasoning
, A. Kumar
[Blog] (June 2025)

Optimizing LLM test-time compute involves solving a meta RL problem
, Y. Qu, M. Yang, L. Zhang, V. Smith, A. Kumar
[CMU MLD Blog] (Jan 2025)