The research, conducted by the non-profit Model Evaluation & Threat Research (METR), set out to measure the real-world impact of advanced AI tools on software development. Over...
I think these are interesting findings. They definitely match my experience, but I'd always want more data and larger sample sizes.
One point in particular stood out to me: the lack of LLM context. I have a new person on my team (junior level) who uses AI for everything, and it's obvious the LLM keeps getting tripped up by missing context. Not all the required information lives in a single repository; the model would also need documentation, architecture decisions that were communicated verbally and never written down, the history of why we do things a certain way, and even just our style guides. The LLM context window simply isn't large enough right now to make it a truly effective programmer on large, complex projects.
Copying from the X post:
Research paper here: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf