by jamwise 5 days ago
Reminds me of when I tried to use the Library of Babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.

The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.
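
To make that concrete, here is a minimal sketch that treats a text as a base-29 number, which is one way to think of a "shelf address". The 29-symbol alphabet (lowercase letters, space, comma, period) and the function names are assumptions for illustration, not how any real Library of Babel site computes its addresses.

```python
import math

# Assumed alphabet: 26 lowercase letters plus space, comma and period.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."

def text_to_address(text: str) -> int:
    """Read the text as a base-29 number; that number is its address."""
    address = 0
    for ch in text:
        address = address * len(ALPHABET) + ALPHABET.index(ch)
    return address

def address_to_text(address: int, length: int) -> str:
    """Invert text_to_address; the length is needed to restore leading 'a's."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

message = "the address is as long as the text."
address = text_to_address(message)
assert address_to_text(address, len(message)) == message

# Bits needed for the address vs. bits of information in the text itself.
text_bits = len(message) * math.log2(len(ALPHABET))
print(address.bit_length(), round(text_bits))
```

The two printed numbers come out essentially equal: the address is just the text written in another base, so nothing is saved.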

The cool part of this in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at. Although it is lossy, and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.

8 comments

You'll find this an interesting watch:

Reinventing Entropy Compression is Intelligence Part 1

3blue1brown https://youtu.be/l6DKRf-fAAM?is=ne73FCJ7ErXhzZ-v

You, and the HN users, `lojban`, `klingon`, `ido`, `brithenig`, `solresol`, `babm`, and `tokipona`, may want to start a club. Amusingly, nobody seems to have registered the `esperanto`, `volapuk`, `interslavic`, `balaibalan`, and `dothraki` usernames.
What can I say other than thank you for the inspiration.
I feel like I am having a stroke reading this comment
The user names all describe conlangs[0]. Though I'd suggest nz join as well, considering only a true conlang connoisseur would actually notice.

[0]: https://en.wikipedia.org/wiki/Constructed_language

I don’t see users with ‘khuzdul’, ‘sindarin’, or ‘quenya’ either.
Also this article by Ted Chiang as a literary explanation of the connection between intelligence and compression: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...
In some sense, science is the most extreme form of compression - Newtonian mechanics explains an incredible number of phenomena in a few lines of text.
It does, but only vaguely unless you already know how it works and can work backwards to Newton's laws. E.g., Newtonian mechanics can explain how flying works, but if you don't already know then it's hard to go from Newton's 3 laws to a functional explanation of why planes don't fall out of the sky.

Some of that is also the domain. It's less that science is an extreme form of compression, and more that natural phenomena are highly compressible. They're a small number of kinds of interactions repeated a bajillion times. How many equations does it take to explain electricity (ignoring equations that are derivatives of ones already included)? I think it's fewer than 5.

On some level, you could probably reduce all of the Standard Model down to models of atoms, their motion, and the basic subatomic particles (the non-quantum ones). That would explain almost everything that happens on Earth in a very short form, though few people would be able to go from that to explaining how lightning works.

I agree it's an oversimplification. The example I think of is something like Newton's law of gravitation vs Ptolemaic epicycles: one simple explanation replaced many layers of tweaks.

It's also a relevant example for AI - one paper tested the ability of Transformers to model planetary orbits: unlike Newton's Law, the implicit forces they learn are nonsense.

https://arxiv.org/pdf/2507.06952

Yes. But /lossy/ compression: (scientific, philosophical etc.) laws compress an abstract narration of events into that tiny, hard, fundamental, predictive detail.

(Then it depends on your concern: "Aagh, the aunt fell!" // "Oh yes, that'd be Newton")

> "Aagh, the aunt fell!" // "Oh yes, that'd be Newton"

This is totally lost on me.

> This is totally lost on me.

Appears to be lossy then ;)

(Sorry, you have to admit that was too easy to not say)

Compression minimizes the representation of information.

Laws (scientific, philosophical etc.) as compression represent the common side of classes of events - an abstraction of said events, stripping the irrelevant - irrelevant to some perspective, or irrelevant in a potential Procrustean bed. So laws are compression, but such an extremely lossy compression that the loss can be relevant.

Brutally, "there may be more to the story of the fall of an elderly person than just gravitation" - also in the sense that there are details behind the event.

Laws are compression - yes, with caveats.

On a more scientific, epistemological side: Einstein extended Newton, covering more exceptions (reducing the abstraction - reducing the loss).

3Blue1Brown just released a video about this Intelligence-Compression connection.

https://youtu.be/l6DKRf-fAAM

The idea was fresh in my mind because I watched this yesterday. Great video, the illustrations and intuition-building around the compressibility of information were so good! I'm so grateful for 3Blue1Brown.
That conclusion is similar to the concept of 'unconditional security' especially WRT one-time pads. The key must be at least as long as the message itself.

Other forms of encryption are based on assumptions and conditions being true (e.g. factoring is a hard problem, etc.) that may or may not be true. We don't know.
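
For anyone who hasn't seen a one-time pad, a minimal sketch of the "key as long as the message" requirement is below. The function names are made up for illustration, and this is a toy to show the length constraint, not something to use for real secrets.

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with a key byte; the key is never reused."""
    if len(key) < len(message):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))

# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # as many random bytes as the message
ciphertext = otp_encrypt(message, key)
assert otp_decrypt(ciphertext, key) == message
```

Same shape as the Babel result: the thing you need in order to recover the data (key or address) is as big as the data.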

The level of compression is pretty impressive when you think about it. I wrote a comment a while back which is still true (although bytes should be bits, so in that sense it’s still wrong): https://news.ycombinator.com/item?id=39559969

Back-of-the-envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 GB for all 10 billion. There are LLMs 100x smaller which can write coherent prose.
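
Spelling out that arithmetic as a quick sketch: the 17 GB figure corresponds to 14 bits per stored 4-gram, and charging 14 bits for each of the four words (my assumption about what a naive table would need) gives a larger number. Variable names are only for illustration.

```python
n_grams = 10_000_000_000   # 10 billion valid 4-grams, as in the comment
bits_per_word = 14         # 2**14 = 16,384 distinct word ids
words_per_gram = 4

# Reading 1: the calculation exactly as written, 14 bits per 4-gram.
as_stated_gb = n_grams * bits_per_word / 8 / 1e9

# Reading 2 (assumption): 14 bits for each of the 4 words in every 4-gram.
per_word_gb = n_grams * words_per_gram * bits_per_word / 8 / 1e9

print(as_stated_gb)  # 17.5
print(per_word_gb)   # 70.0
```

Either way the point stands: a plain lookup table of 4-grams is far bigger than a small model that writes coherent prose.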

If you combine the LLM probability distribution with arithmetic coding you can actually use them to compress text losslessly. When people report 'bits per byte', it is actually the compression rate for text.

GPT-2 for instance achieves roughly 1 bit per byte, so it can be used to compress (English) text 8-fold. Modern models are likely much better.
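
A rough sketch of where 'bits per byte' comes from: an arithmetic coder driven by a predictive model emits about -log2 p bits for each symbol the model assigned probability p. Below, a tiny adaptive byte-frequency model stands in for the LLM (that substitution is an assumption purely to keep the example self-contained), and the function name is made up.

```python
import math
from collections import Counter

def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte | bytes seen so far) under an adaptive order-0 model.

    An arithmetic coder using the same model would emit roughly this many bits.
    """
    counts = Counter()
    total_bits = 0.0
    for i, b in enumerate(data):
        # Laplace smoothing: every one of the 256 byte values starts with count 1.
        p = (counts[b] + 1) / (i + 256)
        total_bits -= math.log2(p)
        counts[b] += 1
    return total_bits

text = ("the quick brown fox jumps over the lazy dog " * 50).encode("ascii")
bits_per_byte = ideal_code_length_bits(text) / len(text)
compression_ratio = 8 / bits_per_byte  # raw text costs 8 bits per byte

print(f"{bits_per_byte:.2f} bits/byte, about {compression_ratio:.1f}x smaller")
```

Swap the frequency counter for a model that predicts the next byte much better and bits per byte drops; at 1 bit per byte the ratio is 8 / 1 = 8, which is the 8-fold figure above.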

LLMs seem to be the weird, interesting outcome of applying lossy (de)compression concepts to text instead of the audio/image/video domains where they have traditionally been used.
If you set temperature to 0.0 you almost have a key-value store, but finding the right key for your value might take some effort.
https://github.com/philipl/inferencefs/ by the same author in case you missed it
I did miss it, thank you!
> you basically need the same amount of data to represent the address of your data as the data itself

Almost like the other Borges work where “the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire”.