OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
https://os-world.github.io/
Pokémon Red
Claude's extended thinking
Discussing Claude's new thought process
https://www.anthropic.com/research/visible-extended-thinking


Seonglae Cho