


	by karel-3d 314 days ago
	I... don't understand how AI is related to video codecs. Maybe because I don't understand either video codecs or AI on a deeper level.

4 comments

tdullien 314 days ago

Every predictor is a compressor, every compressor is a predictor.

If you're interested in this, it's a good idea to read about the Hutter prize (https://en.wikipedia.org/wiki/Hutter_Prize) and go from there.

In general, lossless compression works by predicting the next (letter/token/frame) and then encoding the difference from the prediction in the data stream succinctly. The better you predict, the less you need to encode, the better you compress.
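
To make the "better prediction means fewer bits" point concrete, here is a minimal sketch in Python. It is not how any particular codec or the Hutter prize entries work; it only measures the ideal code length (the sum of -log2 p over the data) that an arithmetic coder could approach under two toy adaptive models. The function name `code_length_bits`, the add-one smoothing, and the sample text are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length in bits of `data` under an adaptive byte model.

    The model predicts each byte from the previous `order` bytes. Coding a
    byte that was given probability p costs -log2(p) bits, so sharper
    predictions mean a shorter encoding.
    """
    counts = defaultdict(Counter)  # context -> counts of bytes seen after it
    total_bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]
        seen = counts[context]
        # Add-one smoothing over all 256 byte values (an illustrative choice),
        # so a byte never seen in this context still has nonzero probability.
        p = (seen[symbol] + 1) / (sum(seen.values()) + 256)
        total_bits += -math.log2(p)
        # Update only after coding, so a decoder could mirror the same model.
        seen[symbol] += 1
    return total_bits


text = b"the better you predict, the less you need to encode. " * 200
raw_bits = 8 * len(text)                        # no prediction at all
order0_bits = code_length_bits(text, order=0)   # predicts from byte frequencies
order2_bits = code_length_bits(text, order=2)   # predicts from the last 2 bytes
print(f"raw: {raw_bits}  order-0: {order0_bits:.0f}  order-2: {order2_bits:.0f}")
```

On this repetitive sample the order-2 model, which predicts better, needs fewer bits than the order-0 model, and both beat the raw 8 bits per byte. Swapping in a stronger predictor is the same idea taken further.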

The flip side of this is that all fields of compression have a lot to gain from progress in AI.


rahimnathwani 314 days ago

Also check out this contest: https://www.mattmahoney.net/dc/text.html

Fabrice Bellard's nncp (mentioned in a different comment) leads.


jl6 314 days ago

It has long been recognised that the state of the art in data compression has much in common with the state of the art in AI, for example:

http://prize.hutter1.net/

https://bellard.org/nncp/


ddtaylor 314 days ago

Some view these as so interconnected that they will say LLMs are "just" compression.


pjc50 314 days ago

Which is an interesting view when applied to IP. I think it's relatively uncontroversial that an MP4 file which "predicts" a Disney movie which it was "trained on" is a derived work. Suppose you have an LLM which was trained on a fairly small set of movies and you could produce any one on demand; would that be treated as a derived work?

If you have a predictor/compressor LLM which was trained on all the movies in the world, would that not also be infringement?


mr_toad 314 days ago

MP4s are compressed data, not a compression algorithm. An MP4 (or any compressed data) is not a “prediction”, it is the difference between what was predicted and what you’re trying to compress.

An LLM is (or can be used as) a compression algorithm, but it is not compressed data. It is possible to have an overfit algorithm exactly predict (or reproduce) an output, but it’s not possible for one to reproduce all the outputs due to the pigeonhole principle.

To reiterate - LLMs are not compressed data.
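
One way to read the pigeonhole point is as a plain counting argument, sketched below in Python. The helper name `count_shorter_bitstrings` and the choice of n = 16 are assumptions for illustration. There are 2**n distinct bit strings of length n but only 2**n - 1 strings that are strictly shorter, so no lossless scheme (and no fixed-size model) can map every input to a distinct shorter representation.

```python
def count_shorter_bitstrings(n: int) -> int:
    """Number of distinct bit strings with length 0, 1, ..., n - 1."""
    return sum(2**k for k in range(n))  # geometric sum, equals 2**n - 1


n = 16
inputs = 2**n                                  # every possible n-bit input
shorter_outputs = count_shorter_bitstrings(n)  # every possible shorter output
print(f"{inputs} inputs, {shorter_outputs} shorter outputs")

# At least this many inputs cannot be given their own shorter encoding.
unrepresentable = inputs - shorter_outputs
```

The gap is small for a single length, but the same count applies at every n, which is why a compressor can only win on inputs its predictor happens to model well.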


bjoli 314 days ago

It is like upscaling. If you could train AI to "upscale" your audio or video you could get away with sending a lot less data. It is already being done with quite amazing results for audio.


Retr0id 314 days ago

AI and data compression are the same problem, rephrased.


oblio 314 days ago

Which makes Silicon Valley, the TV show, even funnier.


chisleu 314 days ago

holy shit it does. The scene with him inventing the new compression algorithm basically foreshadowed the gooning to follow local LLM availability.
