| HN Mirror



	by rthnbgrredf 500 days ago
	I'm still not convinced that this isn't a tokenizer issue. Were you able to find a substantial number of questions that do not fall into the letter counting or word shuffling domain - problems that are clearly unrelated to the fundamental tokenizer issue of modern LLMs? Otherwise, I would argue that your paper simply proves that the issue still exists.

1 comment

enum 499 days ago

It’s not that the benchmark is hard, but that the reasoning models do so much better than the non-reasoning models. That suggests it is testing a capability that reasoning models have that non-reasoning models do not.

Getting to 100% may require tokenization innovation, sure.
