Tau Bench

Creator: Seonglae Cho
Created: 2025 Feb 27 14:52
Editor: Seonglae Cho
Edited: 2026 Feb 12 11:48
Refs
tau-bench
sierra-research • Updated 2026 Feb 19 3:31
arxiv.org
https://arxiv.org/abs/2406.12045

τ²-bench

Evaluates agents in a dual-control environment, where both the agent and the user use tools to modify a shared world state.
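A minimal sketch of what "dual control" can mean in practice, assuming a toy shared state with one agent-side tool and one user-side tool. Every name here (`SharedWorld`, `agent_activate_line`, `user_disable_airplane_mode`, `task_solved`) is hypothetical and chosen for illustration; none of it is taken from the benchmark's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SharedWorld:
    """Toy shared state that both the agent and the user can modify."""
    state: Dict[str, bool] = field(
        default_factory=lambda: {"airplane_mode": True, "line_active": False}
    )
    # Records (actor, tool_name) for every tool call, in order.
    log: List[Tuple[str, str]] = field(default_factory=list)


def agent_activate_line(world: SharedWorld) -> None:
    # Hypothetical agent-side tool: changes backend state.
    world.state["line_active"] = True
    world.log.append(("agent", "activate_line"))


def user_disable_airplane_mode(world: SharedWorld) -> None:
    # Hypothetical user-side tool: changes state only the user can reach.
    world.state["airplane_mode"] = False
    world.log.append(("user", "disable_airplane_mode"))


def task_solved(world: SharedWorld) -> bool:
    # In this sketch, success is judged on the final shared state.
    return world.state["line_active"] and not world.state["airplane_mode"]


world = SharedWorld()
agent_activate_line(world)
solved_by_agent_alone = task_solved(world)   # the agent cannot finish alone
user_disable_airplane_mode(world)
solved_after_both = task_solved(world)       # both parties had to act
print(solved_by_agent_alone, solved_after_both)
```

In this toy setup the task stays unsolved until the user also acts, which is the property that separates a dual-control setting from one where only the agent holds tools.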
τ-bench: A Benchmark for...
Existing benchmarks for language agents do not set them up to interact with human users or follow domain-specific rules, both of which are vital to safe and realistic deployment. We propose...
https://openreview.net/forum?id=roNSXZpUDN
 

Copyright Seonglae Cho