by floobertoober 1109 days ago
> Of course, large language models are (by definition) currently going the other direction ...

How so? Aren't the networks' weights orders of magnitude smaller than the training data?

3 comments

I read that statement as saying the current practice is to make LLMs larger and larger (so they effectively memorize more and more data) in order to make them more powerful. From an information-theory perspective, though, if models really "understood" the data, they could stay the same size and still become more capable as they got better at compressing the available information. I am not sure this is the interpretation that was intended.
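To make the compression framing concrete, here is a toy sketch of the idea that a better predictor is a better compressor. It is only an illustration under my own assumptions: the string `abracadabra`, the two tiny "models", and the helper `code_length_bits` are all made up for this example and have nothing to do with how an actual LLM is built. Both models have the same size (one probability per symbol), and the only difference is how well they predict the text.

```python
import math
from collections import Counter


def code_length_bits(text, probs):
    """Ideal (Shannon) code length of `text` in bits under a fixed symbol distribution."""
    return sum(-math.log2(probs[ch]) for ch in text)


# Hypothetical toy data, chosen only for illustration.
text = "abracadabra"
alphabet = sorted(set(text))

# Model A: knows nothing, so it gives every symbol the same probability.
uniform = {ch: 1 / len(alphabet) for ch in alphabet}

# Model B: same number of parameters, but fitted to the symbol frequencies.
counts = Counter(text)
fitted = {ch: counts[ch] / len(text) for ch in alphabet}

uniform_bits = code_length_bits(text, uniform)
fitted_bits = code_length_bits(text, fitted)

print(f"uniform model: {uniform_bits:.1f} bits")
print(f"fitted model:  {fitted_bits:.1f} bits")
```

The fitted model needs fewer bits for the same text without getting any bigger, which is the "same size, better compression" direction. Scaling up the parameter count instead is the other direction the quoted statement seems to be pointing at.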
I believe the parent poster's point is: LLMs are more effective when they use more memory, meaning the less they are forced to compress the training data, the better they perform.
But they don't losslessly recreate the training data.