by gloosx
171 days ago
>Create a test for intelligence that we can pass better than AI

Easy? The best LLMs score 40% on Butter-Bench [1], while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding.

[1] https://arxiv.org/pdf/2510.21860v1

Still, it may be a lasting limitation if robotics doesn't catch up to AI anytime soon.
I don't know what to make of the Safety Risks test: they threaten to power down the AI in order to manipulate it, and most models act like we would and comply. Fascinating.