Hacker News
by logankeenan 396 days ago
I used Molmo to parse screenshots and detect the locations of UI elements. See the repo below. I think Microsoft's OmniParser would also work well.

https://github.com/logankeenan/george

https://github.com/microsoft/OmniParser
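Below is a minimal sketch of turning Molmo's pointing output into pixel coordinates. It assumes Molmo answers a pointing prompt with tags like `<point x="50.0" y="25.0" alt="Submit button">Submit button</point>`, where `x` and `y` are percentages of the image width and height. The regex and the `parse_points` helper are hypothetical and not taken from the george repo. Multi-point `<points>` tags and other attribute orders are not handled.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for the assumed Molmo format:
# <point x=".." y=".." alt="..">label</point>, with x/y as 0-100 percentages.
POINT_RE = re.compile(
    r'<point\s+x="(?P<x>[\d.]+)"\s+y="(?P<y>[\d.]+)"[^>]*>(?P<label>.*?)</point>'
)


def parse_points(text: str, width: int, height: int) -> List[Tuple[str, int, int]]:
    """Convert single <point> tags into (label, pixel_x, pixel_y) tuples."""
    results = []
    for m in POINT_RE.finditer(text):
        # Scale the percentage coordinates to the screenshot's pixel size.
        px = round(float(m.group("x")) / 100 * width)
        py = round(float(m.group("y")) / 100 * height)
        results.append((m.group("label"), px, py))
    return results


if __name__ == "__main__":
    reply = '<point x="50.0" y="25.0" alt="Submit button">Submit button</point>'
    print(parse_points(reply, 1920, 1080))  # [('Submit button', 960, 270)]
```

The resulting pixel coordinates could then be passed to whatever tool drives the mouse or browser.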