Hacker News
by mjburgess 1482 days ago
A modern "AI" model has c. 200bn parameters, say. At 32 bits/param that's c. 6.4 Tbit, i.e. c. 0.8 TB. At 6 bytes/word, that's c. 130bn words, or roughly a million books' worth of text at c. 100k words/book.
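The sizing is easy to check in a few lines. This is only a sketch under the assumptions stated in the post (200bn parameters, 32 bits each, 6 bytes per word); the constant names are mine. The step to watch is bits versus bytes: 200bn x 32 bits is about 6.4 terabits, which is 0.8 terabytes.

```python
# Back-of-envelope sizing; every figure is an assumption taken from the post.
PARAMS = 200 * 10**9      # c. 200bn parameters
BITS_PER_PARAM = 32       # 32-bit floats
BYTES_PER_WORD = 6        # rough average, including a separator

total_bits = PARAMS * BITS_PER_PARAM        # 6.4e12 bits
total_bytes = total_bits // 8               # 8 bits per byte -> 0.8e12 bytes
words = total_bytes // BYTES_PER_WORD       # about 1.33e11 words

print(f"{total_bits / 1e12:.1f} Tbit, {total_bytes / 1e12:.1f} TB, {words / 1e9:.0f}bn words")
```

So the raw capacity is on the order of 10^11 words, not 10^12.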

NNs, and models of this kind, are just search engines. They store a compression of everything ever written, and prediction is just googling through it.

Models whose performance comes from exponential growth in parameter count should just be ignored by research. This category of performance is already well established: more compute and more stored historical data isn't an interesting research result.

1 comments

The deep connections between compression and prediction are not always obvious to those not in the field.

To illustrate just how much they are the same, here is (at one point SOTA) lossless text compression with GPT-2:

https://bellard.org/libnc/gpt2tc.html
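For anyone who wants the link spelled out, here is a minimal sketch. It is not how gpt2tc is implemented. It only shows the general idea that an arithmetic coder driven by a predictive model spends about -log2 p(symbol) bits per symbol, so better prediction means a smaller file. A toy adaptive byte-frequency model stands in for the language model, and the function name is hypothetical.

```python
import math
from collections import Counter

def ideal_code_length_bits(text: str, alphabet_size: int = 256) -> float:
    """Bits an ideal arithmetic coder would need, given an adaptive
    order-0 byte model with add-one smoothing (a stand-in for a real LM)."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in text.encode("utf-8"):
        # Predict the next byte from the counts seen so far.
        p = (counts[byte] + 1) / (seen + alphabet_size)
        # Cost of coding a symbol the model assigned probability p.
        total_bits += -math.log2(p)
        # Update the model only after coding, so a decoder can mirror it.
        counts[byte] += 1
        seen += 1
    return total_bits

repetitive = "ab" * 500
uniform_bits = 8 * len(repetitive)                # no model: 8 bits per byte
model_bits = ideal_code_length_bits(repetitive)   # model-driven coding cost
print(uniform_bits, round(model_bits))
```

Swap the frequency counter for a model that predicts the next token well and the same accounting gives a much shorter encoding.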

Well, I think they're overstated, because the current wave of AI basically has only naïve compression as its tool.

Is the concept `addition` a compression of the space `(Int, Int, Int)`?

If you want to say it is, OK, for some definition of compression. But that compression isn't "mere" in the modern AI sense, it's "exponentially dense".

That is, my concept `addition` can generate arbitrarily large amounts of that decompressed space, which is infinite in size.
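A rough sketch of the contrast, with hypothetical names: storing addition as a table of `(a, b, a + b)` triples for n-bit integers takes 2^(2n) entries and says nothing outside its range, while the rule itself is constant-size and covers every pair.

```python
def build_addition_table(bits: int) -> dict:
    """Memorise every (a, b) -> a + b for non-negative ints below 2**bits."""
    limit = 1 << bits
    return {(a, b): a + b for a in range(limit) for b in range(limit)}

def add(a: int, b: int) -> int:
    """The concept itself: a fixed-size rule that covers the whole space."""
    return a + b

table = build_addition_table(4)     # 4-bit operands
print(len(table))                   # entries grow as 2**(2 * bits)
print((20, 30) in table)            # outside the memorised range
print(add(20, 30))                  # the rule still answers
```

Each extra bit of operand width quadruples the table, whereas `add` never grows.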

There's a kind of trick played in the marketing here: since NNs compress, and since learning "can be seen as compression", NNs learn. But no: NNs aren't "exponentially dense", they're "exponentially large", which I'd claim is the opposite of learning!