Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning....

https://arxiv.org/abs/2409.08264