by jerryliu12 431 days ago
Have you guys done any benchmarking to see which LLMs perform best?

No formal benchmarks yet, but in our own tests OpenAI's computer use model has generally done a better job than Anthropic's, especially at locating the right click targets and coordinates. We're definitely planning a more thorough comparison soon, though! Curious whether anyone else has noticed differences between these computer use models. Would love to swap notes! :)