by gcr
1067 days ago
How is it similar? There’s been a lot of study going the other direction - using neural networks to aid the entropy prediction in classical compression algorithms - but I’m not seeing the conceptual link between how transformer/attention models work internally and how gzip works internally, beyond “similar words are easy to compress”. I’m not seeing it because GPT representations are just vectors of fixed, not varying, size.
|
One way to measure the accuracy of the model, as in its "intelligence", is to use its predictions to turn the input into just the differences from those predictions; if it's good at predicting, there will be fewer differences and the input will compress better.
So seeing how well your model can compress some really big chunk of text is a very good objective measure of its strength, and a way to compare it to the strength of others.
So a competition is born! :)
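To make the "better prediction means smaller output" point concrete, here's a rough sketch in Python. It is not how gzip or a transformer works internally; it just swaps in a toy adaptive n-gram byte model and adds up the ideal cost of coding each byte, -log2(p), which is roughly what an arithmetic coder driven by that model would spend. The name `compressed_bits`, the smoothing constant and the sample text are all made up for illustration.

```python
import math
from collections import defaultdict

ALPHABET = 256            # assumption: the model predicts one byte at a time
PSEUDO = 1.0 / ALPHABET   # assumption: small additive smoothing so no byte gets p = 0


def compressed_bits(text: str, order: int) -> float:
    """Ideal coded size of `text`, in bits, under an adaptive order-`order` byte model.

    The model predicts each byte from the `order` bytes before it, pays
    -log2(p) bits for the byte that actually occurs, then updates its counts.
    A decoder could replay the same updates, so no model needs to be shipped.
    """
    data = text.encode("utf-8")
    counts = defaultdict(lambda: defaultdict(int))  # context -> byte -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]
        p = (counts[context][symbol] + PSEUDO) / (totals[context] + PSEUDO * ALPHABET)
        bits += -math.log2(p)        # cost of coding this byte given the prediction
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


# Hypothetical sample input: highly repetitive, so context helps a lot.
sample = "the cat sat on the mat. " * 200

if __name__ == "__main__":
    for k in (0, 1, 2):
        print(f"order {k}: {compressed_bits(sample, k):.0f} bits")
```

On text like this, the model that sees more context should need fewer bits than the one that only knows byte frequencies. Swapping in a stronger predictor and comparing totals on the same big file is the whole competition.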