by iiJDSii 494 days ago
Such as? Are they able to recognize arbitrary GUI elements from various desktop programs, web browsers, etc?

Qwen2.5-VL seems to be the best right now in our tests.

UI-TARS by ByteDance also has a good amount of pretraining.

Molmo is also very good at outputting coordinates.