Hacker News
by matt123456789 84 days ago
Such low dimensionality of the LoRA vector must surely result in a close-to-linear modification to the KV calculation. This seems to me to imply that what we call "reasoning" is latent within the model. Pretty clear I didn't read the paper; I'm sure the authors address this.
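To make the "close-to-linear" point concrete, here is a rough sketch and not anything taken from the paper. It assumes the LoRA update is applied to a key projection as W_k + B·A with rank r. The toy sizes and the names `lora_key_delta`, `A`, and `B` are hypothetical. It only checks that the change to the key vector is linear in the input and passes through an r-number bottleneck. The softmax that follows in attention is nonlinear, so this says nothing about the end-to-end effect.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng):
    """Small random integer matrix, so all arithmetic below is exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]

def lora_key_delta(A, B, x):
    """Change in the key vector caused by a LoRA update W_k + B @ A.

    A is (r x d_model), B is (d_k x r), x is a d_model input vector.
    The input is squeezed down to r numbers, then expanded to d_k.
    """
    bottleneck = matvec(A, x)      # r numbers
    return matvec(B, bottleneck)   # d_k numbers

rng = random.Random(0)
d_model, d_k, r = 8, 8, 2          # hypothetical toy sizes, not from the paper
A = rand_matrix(r, d_model, rng)
B = rand_matrix(d_k, r, rng)

x = [rng.randint(-3, 3) for _ in range(d_model)]
y = [rng.randint(-3, 3) for _ in range(d_model)]

# Linearity check: delta(2x + 3y) should equal 2*delta(x) + 3*delta(y).
combo = [2 * xi + 3 * yi for xi, yi in zip(x, y)]
lhs = lora_key_delta(A, B, combo)
dx = lora_key_delta(A, B, x)
dy = lora_key_delta(A, B, y)
rhs = [2 * a + 3 * b for a, b in zip(dx, dy)]

is_linear = lhs == rhs
print(is_linear)  # True
```

Integers are used instead of floats so the equality check is exact rather than approximate.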

Yes - some degree of reasoning appears to be latent in the structure of language itself. But models trained explicitly on reasoning-focused data still perform better than models trained only on general corpora.*

*At least up to 300B parameters, based on the models we’ve tested.

I wonder what the relationships are between the grammar of a language, what it can compute, and how it encodes, and what the minimal parameters/structure for reasoning look like...
Natural language may provide part of the scaffolding for reasoning, but the capability itself seems to depend more on learned transformations over internal representations than on language alone.

refs: https://arxiv.org/abs/2412.17819 https://arxiv.org/abs/2412.06769