METR

Creator

Seonglae Cho

Created

2025 Apr 5 12:31

Editor

Seonglae Cho

Edited

2025 Aug 1 23:11

Refs

Measuring the Effective Task Horizon of AI

In METR's randomized controlled trial of experienced open-source developers, participants believed AI tools made them about 20% faster, but their tasks actually took 19% longer when they used AI. Although the data is limited, this suggests that AI tools do not always improve productivity, and that self-reported speedups can be unreliable.

https://arxiv.org/pdf/2507.09089

https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
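To make the gap between perceived and measured speed concrete, here is a minimal arithmetic sketch. The 60-minute baseline is a hypothetical task length, not a figure from the study, and reading "20% faster" as "20% less time" is an assumption made for illustration.

```python
def time_with_change(baseline_minutes: float, percent_change: float) -> float:
    """Return the task time after a percent change in completion time.

    A positive percent_change means the task takes longer.
    """
    return baseline_minutes * (1 + percent_change / 100)


# Hypothetical task that takes 60 minutes without AI (not from the study).
baseline = 60.0

# Measured effect reported in the abstract: tasks take 19% longer with AI.
measured = time_with_change(baseline, 19)

# Assumption: "felt 20% faster" is read as "believed it took 20% less time".
perceived = time_with_change(baseline, -20)

# Difference between what actually happened and what participants believed.
gap = measured - perceived

print(f"without AI: {baseline:.1f} min")
print(f"with AI (measured): {measured:.1f} min")
print(f"with AI (perceived): {perceived:.1f} min")
print(f"perception gap: {gap:.1f} min")
```

Under these assumptions, the perceived and measured times fall on opposite sides of the baseline, which is why self-reports alone would have pointed to a speedup.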

Vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research. Vivaria is a web application with which users can interact using a web UI and a command-line interface.

https://vivaria.metr.org/

METR: Measuring AI Ability to Complete Long Tasks — LessWrong

Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently e…

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
