Tau Bench

Creator

Seonglae Cho

Created

2025 Feb 27 14:52

Editor

Seonglae Cho

Edited

2026 Jun 4 14:30

Refs

sierra-research • Updated 2026 Jun 4 10:32

https://arxiv.org/abs/2406.12045

τ-bench

Evaluates agents in a dual-control environment, where both the agent and the user use tools to modify a shared world state.
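To make the dual-control idea concrete, here is a minimal toy sketch in which the agent and the user each hold a different tool and the task succeeds only after both have acted on the same state. Every name here (`SharedWorld`, `agent_resume_line`, `user_toggle_airplane_mode`) and the phone-service scenario are hypothetical illustrations, not the benchmark's actual API or task data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedWorld:
    """Toy world state that both parties can modify (hypothetical)."""
    state: Dict[str, bool] = field(default_factory=lambda: {
        "airplane_mode": True,   # device-side flag, only the user can change it
        "line_suspended": True,  # account-side flag, only the agent can change it
    })

    def service_works(self) -> bool:
        # The goal is reached only when both flags have been cleared.
        return not self.state["airplane_mode"] and not self.state["line_suspended"]


def agent_resume_line(world: SharedWorld) -> None:
    """Agent-side tool (assumed): lifts the suspension on the account."""
    world.state["line_suspended"] = False


def user_toggle_airplane_mode(world: SharedWorld) -> None:
    """User-side tool (assumed): flips airplane mode on the user's device."""
    world.state["airplane_mode"] = not world.state["airplane_mode"]


if __name__ == "__main__":
    world = SharedWorld()
    agent_resume_line(world)          # agent acts on its half of the state
    print(world.service_works())      # still False: the user has not acted yet
    user_toggle_airplane_mode(world)  # user acts after being guided by the agent
    print(world.service_works())      # True once both sides have acted
```

In this sketch neither party can finish the task alone, which is the property a dual-control setup is meant to test: the agent has to guide the user's actions as well as take its own.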

τ-bench: A Benchmark for...

Existing benchmarks for language agents do not set them up to interact with human users or follow domain-specific rules, both of which are vital to safe and realistic deployment. We propose...

https://openreview.net/forum?id=roNSXZpUDN
