by adsharma 253 days ago
Recent: 3.6 bits per param

https://arxiv.org/abs/2505.24832

1 comment

You're both right. The classical capacity measure (the Gardner capacity) is defined as the maximum number of patterns that can be stored with zero errors. It remains 2 bits per parameter, and that limit is proven mathematically.
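To make the zero-error limit concrete, here is a rough sketch using Cover's function-counting formula for a single linear threshold unit. This is a related classical result that I am bringing in for illustration, not something taken from the paper or from Gardner's own derivation. It assumes patterns in general position, and the names `separable_fraction`, `num_patterns` and `num_weights` are made up for the example.

```python
from math import comb

def separable_fraction(num_patterns, num_weights):
    """Fraction of the 2**P binary labelings of P patterns that a single
    linear threshold unit with N weights can realize with zero errors
    (Cover's counting formula, patterns assumed in general position)."""
    count = 2 * sum(comb(num_patterns - 1, k) for k in range(num_weights))
    return count / 2 ** num_patterns

n = 50                                    # hypothetical number of weights
below = separable_fraction(n, n)          # P = N: every labeling is storable
at_limit = separable_fraction(2 * n, n)   # P = 2N: exactly half are storable
above = separable_fraction(4 * n, n)      # P = 4N: almost none are storable

print(below, at_limit, above)
```

The fraction drops sharply around P = 2N, which is where the "2 patterns (one bit each) per weight" figure comes from in this simplified setting.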

The capacity definition in this recent paper is completely different: it is based on the Kolmogorov complexity of predicting a memorized sequence, or in layman's terms, how well known sequences can be compressed. This allows for some bit "errors", i.e. some symbols with a bad compression ratio, because only the total compression ratio of the sequence is measured.

This is somewhat parallel to the classical ECC limits (strict Hamming distance constraints) vs. modern probabilistic ECC limits.

TL;DR: when you allow a small number of errors, the capacity increases from 2 bits to 3.6 bits per parameter.