

	by whazor 521 days ago
	You could consider an LLM a very lossy compression artifact: terabytes of input data go in, and a model under 100 gigabytes comes out. It is quite remarkable what such a model can do, even fabricating new output that was not in the input data. However, in my naïvety, I wonder whether vastly simpler algorithms could be used to end up with similar results. Regular compression techniques work at speeds up to 700MB/s.
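For a rough sense of what "regular" lossless compression does, here is a minimal sketch using Python's standard `zlib` module. The `measure` helper and the `sample` text are made up for illustration, and the numbers depend heavily on the data and the machine, so treat it as a way to poke at ratio and throughput rather than a benchmark.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> tuple[float, float]:
    """Return (compression ratio, throughput in MB/s) for zlib at `level`."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless: the round trip must reproduce the input exactly.
    assert zlib.decompress(compressed) == data
    ratio = len(data) / len(compressed)
    throughput = len(data) / 1e6 / max(elapsed, 1e-9)
    return ratio, throughput


# Hypothetical, highly repetitive sample; real text compresses far less.
sample = b"the quick brown fox jumps over the lazy dog. " * 20000
ratio, mb_per_s = measure(sample)
print(f"ratio {ratio:.1f}x at {mb_per_s:.0f} MB/s")
```

Unlike an LLM, this round trip is exact: nothing that was not in the input can come back out.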

5 comments

red75prime 521 days ago

The remarkable thing about this compression method is that stochastic gradient descent for some reason creates algorithms in the network. Not Turing-complete algorithms, of course, but algorithms nevertheless.

An LLM trained on addition and multiplication data develops circuits for addition and multiplication[1].

It stands to reason that LLMs trained on human-produced data develop algorithms that try to approximate the data production process (within their computational limits).

[1] https://arxiv.org/abs/2308.01154


whazor 521 days ago

Interesting. I am not sure whether there are any 'normal' compression techniques that actually create algorithms. That might be an interesting approach to normally compress data as well.


godelski 521 days ago

  > However, in my naïvety, I wonder whether vastly simpler algorithms could be used to end up with similar results.

Almost certainly. Distillation demonstrates this. The difficulty is training: it's harder to train a smaller network and harder to train with less data. But look at humans: they ingest far less data, and certainly less diverse data. We are extremely computationally efficient. I guess you have to be when you run on meat.


purplethinking 521 days ago

> they ingest far less data

True in terms of text, but not if you include video, audio, touch etc. Sure, one could argue that video carries much less information than its raw byte count suggests, but even so, we spend many years building a world model as we play with tools, exist in the world and go to school. I don't deny humans are more efficient learners, but people tend to forget this. Also, children are taught things in ascending order of difficulty, while with LLMs we just throw random pieces of text at them. There is sure to be a lot of progress in curriculum learning for AI models.


ghxst 521 days ago

I'm not sure how accurate it is, but my gut feeling is that the level of meaningful compression is somehow correlated with the level of intelligence behind a model. I wouldn't be surprised if it ends up being a major focus in general intelligence.
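One way to make the link between compression and prediction concrete: an ideal entropy coder spends about -log2(p) bits on a symbol the model gave probability p, so a better predictor means a shorter encoding. Below is a small sketch under my own assumptions: `adaptive_bits` is a hypothetical helper, the model is a simple adaptive character-count model with add-one smoothing over an assumed 256-symbol alphabet, and the alternating `text` is a toy input chosen so that one character of context helps a lot.

```python
import math
from collections import defaultdict


def adaptive_bits(text: str, order: int) -> float:
    """Ideal code length in bits when each character is coded with a
    probability estimated from the counts seen so far in its context."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, ch in enumerate(text):
        # Context is the previous `order` characters (shorter at the start).
        ctx = text[max(0, i - order):i]
        # Add-one smoothing over an assumed alphabet of 256 symbols.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Update after coding, so a decoder could keep the same counts.
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


text = "ab" * 2000  # toy input: fully predictable given one character of context
no_context = adaptive_bits(text, order=0)
one_char_context = adaptive_bits(text, order=1)
print(f"order 0: {no_context:.0f} bits, order 1: {one_char_context:.0f} bits")
```

The model that looks at the previous character ends up with the shorter code here, which is the sense in which "understanding" the data and compressing it are the same job. It says nothing by itself about intelligence in general.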


trash_cat 521 days ago

This is the whole premise behind transformers and ChatGPT models and has been discussed by Ilya[0].

[0] https://the-decoder.com/openai-co-founder-explains-the-secre...


briandear 521 days ago

If they could get to a 5.2 Weissman compression score it would probably make a substantial difference.


topspin 521 days ago

You could consider the human mind to be a very lossy compression artifact.
