by riku_iki 935 days ago
And how can you tell they reason rather than parrot some text from the training data?

There are papers that test LLMs on generated reasoning problems, and the models usually fail.

1 comments

>Usually

That implies that sometimes they don't fail, which would prove at least some reasoning capability.

In this case I used 'usually' because I don't remember all the details and didn't want to generalize by saying 'always'. The training/benchmarking protocol can also be flawed: for example, an LLM can still solve a shallow reasoning problem by memorizing a pattern.