I interpreted that statement as follows: the current practice is to make LLMs larger and larger (so they effectively memorize more and more data) in order to make them more powerful. From an information-theoretic perspective, however, if models were truly powerful and "understood" their data, they could stay the same size and still keep getting more powerful as they became better at compressing the available information. I'm not sure this interpretation is what was meant, though.
I believe the parent poster's point is that LLMs are more effective when they have more memory to work with: the less they are forced to compress the training data, the better they perform.