Hacker News new | ask | show | jobs
by red75prime 521 days ago
The remarkable thing about this compression method is that stochastic gradient descent for some reason creates algorithms in the network. Not Turing-complete algorithms, of course, but algorithms nevertheless.

An LLM trained on the addition and multiplication data develops circuits for addition and multiplication[1].

It stands to reason that LLM trained on human-produced data develop algorithms that try to approximate the data production process (within their computational limits).

[1] https://arxiv.org/abs/2308.01154

1 comments

Interesting. I am not sure whether there are any 'normal' compression techniques that actually create algorithms. That might be an interesting approach to normally compress data as well.