by jksk61 1158 days ago
I'm not saying LLMs are modeling linguistics in any way lol. I only meant that there's some kind of phenomenon related to scaling+attention that produces good enough results for most "human language stuff", which is kind of unexpected (I mean everyone knows that if you build a large enough model you can teach it any function, but c'mon, it is architecture+scaling that made it possible, not scaling alone). Moreover, the architectures used, from attention layers to LSTMs for that matter, are not completely understood; they are being used because "they work", just as in the old days of electromagnetism the empirical laws "just worked" for their purposes.

btw, in other languages I guess it's decent, at least GPT-4, although it depends on which language.