Blogs
2026
- IsoCompute Playbook: Optimally Scaling Sampling Compute for RL Training of LLMs
Z. Cheng✱, Y. Xie✱, Y. Qu✱, ✱, S. Hao, V. Pimpalkhute, T. Liang, F. Yao, Z. Liu, E. Xing, V. Smith, R. Salakhutdinov, Z. Hu, T. Killian, A. Kumar
[CMU MLD Blog] (Jan 2026)
2025
- How to Explore to Scale RL Training of LLMs on Hard Problems?
Y. Qu✱, ✱, V. Smith, R. Salakhutdinov, A. Kumar
[Blog] (Dec 2025)
- Sharpening or Discovery, RL or Meta RL?: How RL Improves LLM Reasoning
, A. Kumar
[Blog] (June 2025)
- Optimizing LLM test-time compute involves solving a meta RL problem
, Y. Qu, M. Yang, L. Zhang, V. Smith, A. Kumar
[CMU MLD Blog] (Jan 2025)