by logankeenan, 396 days ago
I used Molmo to parse screenshots and detect the locations of UI elements. See the repo below. I think OmniParser from Microsoft would also work well.
https://github.com/logankeenan/george
https://github.com/microsoft/OmniParser
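
For anyone wondering what the "detect locations" step can look like in practice, here is a rough sketch. It is not taken from the george repo. It assumes the model answers a pointing prompt with tags like `<point x="50.0" y="25.0" alt="Submit">Submit</point>`, where x and y are percentages of the image width and height; that format, the `parse_points` helper, and the regex are all assumptions for illustration, so check them against real model output.

```python
import re

# Assumed tag shape: <point x="50.0" y="25.0" alt="Submit">Submit</point>
# x and y are assumed to be percentages (0-100) of the screenshot size.
POINT_RE = re.compile(
    r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"[^>]*>(.*?)</point>', re.S
)


def parse_points(model_output, width, height):
    """Hypothetical helper: turn point tags into (label, x_px, y_px) tuples."""
    results = []
    for x_pct, y_pct, label in POINT_RE.findall(model_output):
        # Scale the percentage coordinates to pixel coordinates.
        x_px = round(float(x_pct) / 100 * width)
        y_px = round(float(y_pct) / 100 * height)
        results.append((label.strip(), x_px, y_px))
    return results


if __name__ == "__main__":
    sample = '<point x="50.0" y="25.0" alt="Submit">Submit</point>'
    print(parse_points(sample, width=1280, height=800))
```

The pixel coordinates can then be handed to whatever clicks or asserts on the element. Output with no matching tags simply yields an empty list.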