by famouswaffles
654 days ago
>the model will learn what it can to minimize the error over the specific provided loss function, and no more. Change the loss function and you change what the model learns.

You clearly do not really understand what it means to predict internet-scale text with increasing accuracy. No more than that? Fantastic.

LLMs do not just learn surface statistics. So many papers have thoroughly refuted this that I'm just not going to bother. This is just straight-up denial. This has been clearly shown in chess as well.

https://arxiv.org/abs/2403.15498v2

You have no idea what you are talking about. You've probably never even played with 3.5-turbo-instruct. That's how you can say this nonsense. You have your conclusion and keep working backwards to find a justification.

>It's interesting that LLMs can reach the ELO level that they do (says more about chess than about LLMs)

When you say this about everything LLMs can do, it just becomes a meaningless cope statement.
|
However, you seem to be engaged in magical thinking and believe these models are learning things beyond their architectural limits. You appear to be starstruck by what these models can do, and blind to what one can deduce - and SEE - that they are unable to do.