|
|
|
|
|
by goombacloud
1066 days ago
|
|
Specially compression algos that use arithmetic coding with interval weights adjusted based on the prediction of what is likely coming next are very similar.
They adjust the arithmetic coding (https://en.wikipedia.org/wiki/Arithmetic_coding) based on the context of the byte/bit to predict, so the more accurate the predicted continuation is, the more efficient is the encoding. The task is very similar to that of the transformers like GPT. A perfect prediction will almost have no additional storage cost because the arithmetic interval doesn't get smaller, and thus no bit gets stored - but anyway, you have to count the size of the decompressor to get a fair benchmark. |
|
There’s been a lot of study going the other direction - using neural networks to aid the entropy prediction in classical compression algorithms - but I’m not seeing the conceptual link between how transformer/attention models work internally and how gzip works internally beyond “similar words are easy to compress”
I’m not seeing it because GPT representations are just vectors of fixed, not varying, size