Measuring the Effective Task Horizon of AI

Participants believed AI tools had made them 20% faster, but in reality they took 19% longer when using them. Although the data is limited, this suggests that AI tools do not always improve productivity.
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research. Vivaria is a web application with which users can interact using a web UI and a command-line interface.
https://vivaria.metr.org/
METR: Measuring AI Ability to Complete Long Tasks — LessWrong
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently e…
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks

Seonglae Cho