by vladimir_gor 87 days ago
I'm a big fan of benchmarks, and now we finally have one to evaluate models on physical tasks. It will be interesting to see how fast this gap narrows.