by andy12_
36 days ago
I disagree. Even frontier models still score way worse than the human baseline on VendingBench. As long as models can't optimally manage something as simple as a vending machine, they have no hope of managing a McDonald's.