by mjburgess
1482 days ago
A modern "AI" model has c. 200bn parameters, say. At 32 bits/param that's c. 6.4 Tbit, or c. 800 GB. At 6 bytes/word, that's room for c. 130bn words, i.e. a large library's worth of text. NNs, and models of this kind, are just search engines. They store a compression of everything ever written, and prediction is just googling through it. Performance gains that are bought with exponential growth in parameter count should simply be ignored by research. This category of performance is already established; more compute and more historical data stored isn't an interesting research result.

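Spelling out the back-of-envelope arithmetic, as a rough sketch. The three inputs (200bn parameters, 32 bits each, 6 bytes per word) are the assumptions from the comment; the function name is just made up for illustration.

```python
def model_storage(params: int, bits_per_param: int, bytes_per_word: int):
    """Return (total bits, total bytes, equivalent word count) for a model."""
    total_bits = params * bits_per_param      # raw size of the weights in bits
    total_bytes = total_bits // 8             # 8 bits per byte
    words = total_bytes / bytes_per_word      # how many "words" that much storage holds
    return total_bits, total_bytes, words


# Assumed inputs: 200bn params, 32 bits/param, 6 bytes/word.
bits, size_bytes, words = model_storage(200_000_000_000, 32, 6)
print(f"{bits / 1e12:.1f} Tbit, {size_bytes / 1e9:.0f} GB, {words / 1e9:.0f}bn words")
# prints: 6.4 Tbit, 800 GB, 133bn words
```

Halving the precision to 16 bits/param halves every figure, so the result is only as good as those three guesses.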
To illustrate just how close the two are, here is lossless text compression with GPT-2 (SOTA at one point):
https://bellard.org/libnc/gpt2tc.html
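For anyone who wants to see why a predictor doubles as a compressor, here is a toy sketch. It is not how the linked tool works internally; it swaps GPT-2 for a trivial adaptive byte-frequency model (my assumption, purely for illustration) and sums the ideal code length of -log2(p) bits per symbol, which is roughly what an arithmetic coder driven by the same predictions would achieve.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Bits needed to encode `data` if each byte costs -log2(p), where p is
    the probability an adaptive order-0 model assigned to it beforehand."""
    counts = [1] * 256   # Laplace smoothing: every byte value starts with count 1
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)  # cost of this byte under the current prediction
        counts[b] += 1                         # then update the model with what was seen
        total += 1
    return bits


text = b"the cat sat on the mat. " * 40
bits = ideal_code_length_bits(text)
print(f"{bits / len(text):.2f} bits/byte vs 8.00 raw")
```

The better the model predicts the next symbol, the fewer bits it costs to encode, so swapping the frequency table for a large language model is the same trick with a much stronger predictor.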