> How much do language models memorize?
— https://arxiv.org/abs/2505.24832
— https://news.ycombinator.com/item?id=44171363
It shows that GPT-style models have a limited capacity for memorisation (roughly 3.6 bits per parameter). Once the training data exceeds that capacity, unintended memorisation decreases and the model starts to generalise instead.
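To get a feel for the number, here is a back-of-envelope sketch that turns the per-parameter figure into a total capacity. It compares that capacity with the information in a dataset of uniformly random tokens, where the information content is known exactly. It assumes the ~3.6 bits/parameter estimate applies uniformly to any model size. The helper names are hypothetical and do not come from the paper.

```python
import math

# Assumption: the ~3.6 bits/parameter estimate applies uniformly across sizes.
BITS_PER_PARAM = 3.6

def capacity_bytes(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Hypothetical helper: estimated total memorisation capacity in bytes."""
    return n_params * bits_per_param / 8

def random_dataset_bits(n_sequences: int, seq_len: int, vocab_size: int) -> float:
    """Information content of uniformly random tokens: log2(vocab) bits per token."""
    return n_sequences * seq_len * math.log2(vocab_size)

def capacity_exceeded(n_params: int, dataset_bits: float) -> bool:
    """True when the dataset holds more information than the estimated capacity."""
    return dataset_bits > n_params * BITS_PER_PARAM

for n in (125_000_000, 1_000_000_000, 8_000_000_000):
    print(f"{n:>13,} params -> ~{capacity_bytes(n) / 1e6:,.0f} MB")
```

By this estimate a 1B-parameter model can hold about 450 MB of memorised information. Natural text is far more compressible than random tokens, so its raw size overstates how much information it actually contains.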