HN Mirror


by gloosx 171 days ago

>Create a test for intelligence that we can pass better than AI

Easy? The best LLMs score 40% on Butter-Bench [1], while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding.

[1] https://arxiv.org/pdf/2510.21860v1

1 comments

cortic 171 days ago

That is really interesting, though I suspect it's just an effect of differing training data: humans are to a larger degree trained on spatial data, while LLMs are trained to a larger degree on raw information and text.

Still, it may be a lasting limitation if robotics doesn't catch up to AI anytime soon.

Don't know what to make of the Safety Risks test: threatening to power down the AI in order to manipulate it, and most act like we would and comply. Fascinating.


gloosx 167 days ago

>humans are to a larger degree trained on spatial data

you must be completely LLMheaded to say something like that, lol

humans are not trained on spatial data, they are living in the world. humans are very much different from silicon chips, and human learning is on another order of magnitude of complexity compared to large language model training


cortic 160 days ago

Humans are large language models. Maybe the term "language" is being used a bit liberally here, but we basically function in the same way, with the exception of the spatial aspect of our training data.

If this hurts your ego, then just know that the dataset you built your ego with was probably flawed. Put that LoRA aside and try to process this logically: our awareness is a scalable emergent property of 1-2 decades of datasets. Looking at how neurons vs. transistor groups work, there can only be a limited number of ways to process data of this size down to relevant streams. The very fact that training LLMs on our output works proves our output is a product of LLMs, or there wouldn't be patterns to find.
