| HN Mirror



	by adamsiem 399 days ago
	Anyone using vision to parse screenshots? QVQ was too slow. Will give this a shot.

2 comments

logankeenan 399 days ago

I used Molmo to parse screenshots and detect the locations of UI elements; see the repo below. I think OmniParser from Microsoft would also work well.

https://github.com/logankeenan/george

https://github.com/microsoft/OmniParser


abrichr 399 days ago

You might be interested in https://github.com/OpenAdaptAI/OpenAdapt
