PlanBench: An Extensible Benchmark for Evaluating Large Language...
Generating plans of action and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning...
https://arxiv.org/abs/2206.10498