Hacker News
by
adamsiem
399 days ago
Is anyone using vision models to parse screenshots? QVQ was too slow. Will give this a shot.
2 comments
logankeenan
399 days ago
I used Molmo to parse screenshots and detect the locations of UI elements. See the repo below. I think OmniParser from Microsoft would also work well.
https://github.com/logankeenan/george
https://github.com/microsoft/OmniParser
abrichr
399 days ago
You might be interested in OpenAdapt:
https://github.com/OpenAdaptAI/OpenAdapt