PlanBench: An Extensible Benchmark for Evaluating Large Language...
Generating plans of action, and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning...
https://arxiv.org/abs/2206.10498