| HN Mirror



	by riku_iki 935 days ago
	And how can you tell they reason rather than parrot text from the training data? There are papers that test LLMs on generated reasoning problems, and the models usually fail.

1 comments

nuancebydefault 935 days ago

>Usually

That implies they sometimes don't fail, which would prove at least some reasoning capability.


riku_iki 935 days ago

In this case I used 'usually' because I don't remember all the details and didn't want to generalize by saying 'always'. The training/benchmarking protocol can also be flawed: for example, an LLM can still solve a shallow reasoning problem by memorizing a pattern.
