METR

Creator: Seonglae Cho
Created: 2025 Apr 5 12:31
Editor: Seonglae Cho
Edited: 2025 Apr 27 17:2

Measuring the Effective Task Horizon of AI

Vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research. It is a web application that users can interact with through a web UI and a command-line interface.
https://vivaria.metr.org/

METR: Measuring AI Ability to Complete Long Tasks (LessWrong)
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently e…
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
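
The summary defines the metric only in words. One way to reduce it to a single number, and the approach we understand METR's write-up to describe, is to fit a logistic curve of agent success against the logarithm of how long each task takes a human, then read off the task length at which predicted success equals 50%. The following is a minimal sketch under that assumption, not METR's code: `fit_logistic` and `time_horizon` are hypothetical helpers, the Newton-Raphson fit is a plain textbook choice, and the task data are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=100, tol=1e-12):
    """Fit P(success | x) = sigmoid(a + b * x) by Newton-Raphson maximum likelihood.

    Assumes the data are not perfectly separable; otherwise the MLE does not exist.
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # Fisher information (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (da, db) = H^{-1} g for the 2x2 information matrix H
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


def time_horizon(minutes, successes, target=0.5):
    """Human task length (minutes) at which the fitted success probability equals `target`."""
    xs = [math.log2(m) for m in minutes]  # regress on log2 of human task time
    a, b = fit_logistic(xs, successes)
    # Solve sigmoid(a + b * x) = target for x, then map back from log2 minutes.
    logit = math.log(target / (1.0 - target))
    return 2.0 ** ((logit - a) / b)


if __name__ == "__main__":
    # Hypothetical toy data: human minutes per task, and whether the agent succeeded (1) or not (0).
    minutes = [1, 1, 2, 2, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64]
    successes = [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
    print(f"50% horizon: {time_horizon(minutes, successes):.2f} min")
    print(f"80% horizon: {time_horizon(minutes, successes, target=0.8):.2f} min")
```

The toy data are symmetric around 8 minutes (each success at length t is mirrored by a failure at 64/t), so the fitted 50% horizon comes out at 8 minutes. Asking for a stricter target such as 0.8 gives a shorter horizon, because the fitted success probability falls as tasks get longer.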
 
 

Copyright Seonglae Cho