Hacker News
by
Retr0id
102 days ago
Hm, I can't see Opus 4.6 on there.
theredsix
102 days ago
I tweeted at OSUNLP, and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs, plus instructions for running it locally.
https://github.com/theredsix/abp-online-mind2web-results