by sigmoid10
373 days ago
At the end of the day, I fully expect large-n Hanoi and similar puzzles to end up as yet another benchmark, like the needle-in-a-haystack or spelling tests that people used to show shortcomings of LLMs. Those were actually just technical implementation artefacts and got solved pretty fast once that kind of problem was integrated into training. LLMs will always have to use a slightly different approach to reasoning than humans because of these technical aspects, but that doesn't mean they are fundamentally inferior. It only means we can't rely on human training data forever and have to look more towards stuff like RL.