Hacker News
by withinrafael 42 days ago
I've had a lot of success generating coordinates and answering questions with the UI-TARS model (https://github.com/bytedance/UI-TARS).
1 comment
theturtletalks 42 days ago
I’d also check out Midscene. You can set which model it uses: UI-TARS works, and so do the Qwen vision models.