by matt123456789
84 days ago
Such low dimensionality of the LoRA vector must surely result in a close-to-linear modification to the KV calculation. This seems to me to imply that what we call "reasoning" is already latent within the model. It's pretty clear I didn't read the paper; I'm sure the authors address this.
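To make the linearity intuition concrete, here is a minimal sketch. It assumes a rank-1 adapter applied only to the key projection, with no scaling factor, which may not match the paper's actual setup. All names (`lora_key`, `delta`, `W`, `a`, `b`) are hypothetical and chosen for illustration. Under those assumptions the shift in the keys is exactly linear in the input and always points along one fixed direction `b`; any nonlinearity only enters later, through the softmax over attention scores.

```python
import random

def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_key(W, a, b, x):
    """Key projection with a rank-1 update: (W + b a^T) x. Illustrative only."""
    base = matvec(W, x)
    scale = sum(ai * xi for ai, xi in zip(a, x))  # a . x is a single scalar
    return [k + scale * bi for k, bi in zip(base, b)]

def delta(W, a, b, x):
    """How much the adapter moves the key for input x."""
    return [k1 - k0 for k1, k0 in zip(lora_key(W, a, b, x), matvec(W, x))]

random.seed(0)
d = 4  # toy hidden size, an arbitrary choice
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
a = [random.gauss(0, 1) for _ in range(d)]
b = [random.gauss(0, 1) for _ in range(d)]
x = [random.gauss(0, 1) for _ in range(d)]
y = [random.gauss(0, 1) for _ in range(d)]

dx = delta(W, a, b, x)
dy = delta(W, a, b, y)
dxy = delta(W, a, b, [xi + yi for xi, yi in zip(x, y)])

# Additivity: the shift for x + y equals the sum of the individual shifts.
additive = all(abs(s - (p + q)) < 1e-9 for s, p, q in zip(dxy, dx, dy))

# Every shift is a multiple of b: all 2x2 cross terms with b vanish.
parallel = all(
    abs(dx[i] * b[j] - dx[j] * b[i]) < 1e-9
    for i in range(d) for j in range(d)
)
print(additive, parallel)
```

A higher-rank adapter behaves the same way, except that the shift lives in an r-dimensional subspace instead of along a single direction.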
*At least up to 300B parameters, based on the models we’ve tested.