Screen Parsing (the simplest "eye" for an AI agent)
OmniParser V2
OmniParser V2: Turning Any LLM into a Computer Use Agent - Microsoft Research
Yadong Lu, Senior Researcher; Thomas Dhome-Casanova, Software Engineer; Jianwei Yang, Principal Researcher; Ahmed Awadallah, Partner Research Manager
Graphical user interface (GUI) automation requires agents that can understand and interact with user screens. However, using general-purpose LLMs as GUI agents faces several challenges: 1) reliably identifying interactable icons within the […]
https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/

paper
https://microsoft.github.io/OmniParser/
model
microsoft/OmniParser · Hugging Face
https://huggingface.co/microsoft/OmniParser
web demo
https://scrimba.com/s08johf0et/head


Seonglae Cho