| HN Mirror



	by bconsta 253 days ago
	There is a study that gives a rule of thumb of ~2 bits per param for a model's memorization capacity: https://arxiv.org/abs/2404.05405

3 comments

dart_pink 253 days ago

Seems they have replicated Gardner's work without mentioning it: "Maximum Storage Capacity in Neural Networks" (1987), which established that the storage capacity of a neural network is about 2N, i.e. 2 bits per parameter.


selimthegrim 253 days ago

Elizabeth Gardner for those looking.


bconsta 253 days ago

I had no idea about this. Thanks for sharing


adsharma 253 days ago

Recent: 3.6 bits per param

https://arxiv.org/abs/2505.24832


dart_pink 253 days ago

You're both right. The classical capacity measure (Gardner's capacity limit) is defined as the maximum number of patterns that can be remembered with zero errors. This remains 2 bits per parameter, proven mathematically.

The capacity definition in this recent paper is completely different - it is based on the Kolmogorov complexity of predicting a memorized sequence, or in layman's terms: how easy it is to compress known sequences. This allows for some bit "errors", i.e. some symbols with a bad compression ratio; only the total compression ratio of the sequence is measured.

This is somewhat parallel to the classical ECC limits (strict Hamming distance constraints) vs modern probabilistic ECC limits.

TL;DR: when you allow a small number of errors, the capacity increases from 2 bits to 3.6 bits per parameter.
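
For a sense of scale, here is a rough back-of-the-envelope sketch of what the two figures would imply in total. The 1-billion-parameter count is a made-up example and `capacity_megabytes` is just an illustrative helper name; neither comes from the papers.

```python
def capacity_megabytes(n_params: int, bits_per_param: float) -> float:
    """Total memorization capacity in megabytes (1 MB = 10**6 bytes)."""
    total_bits = n_params * bits_per_param
    return total_bits / 8 / 1e6


# Hypothetical model size, chosen only for illustration.
N_PARAMS = 1_000_000_000

for label, bits in [("zero-error (Gardner)", 2.0), ("compression-based", 3.6)]:
    print(f"{label}: {capacity_megabytes(N_PARAMS, bits):.0f} MB")
```

Under those assumptions the gap is about 250 MB versus 450 MB of memorized content, regardless of the precision the weights are stored in.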


adsharma 253 days ago

2 bits out of FP8 would be 25%; 2 bits out of FP16 would be 12.5%.

I've seen recent work that claimed 70% of the params are used for memorization.
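
The percentages above are just the memorized bits divided by the storage width of one parameter. A minimal sketch of that arithmetic, also applied to the 3.6-bit figure from the thread (`fraction_of_width` is an illustrative name, not from any paper):

```python
def fraction_of_width(memorized_bits: float, width_bits: int) -> float:
    """Share of a parameter's storage bits that holds memorized information."""
    return memorized_bits / width_bits


for fmt, width in [("FP8", 8), ("FP16", 16)]:
    for memorized in (2.0, 3.6):
        share = fraction_of_width(memorized, width)
        print(f"{memorized} bits out of {fmt}: {share:.1%}")
```

Note that this measures a share of bits, which may not be the same quantity as a claimed share of parameters used for memorization.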
