Hacker News
by esafak 102 days ago
https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderb...
1 comments

Hm, I can't see Opus 4.6 on there.
I tweeted at OSUNLP, and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs and instructions for running it locally: https://github.com/theredsix/abp-online-mind2web-results