OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
https://os-world.github.io/

Pokémon Red

Claude's extended thinking
Discussing Claude's new thought process
https://www.anthropic.com/research/visible-extended-thinking