METR: Measuring AI Ability to Complete Long Tasks — LessWrong
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently e…
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks