by whazor
521 days ago
You could consider an LLM a very lossy compression artifact: terabytes of input data go in, and a model under 100 gigabytes comes out. It is quite remarkable what such a model can do, even fabricating new output that was not in the input data. However, in my naïvety, I wonder whether vastly simpler algorithms could be used to end up with similar results. Regular compression techniques work at speeds up to 700MB/s.
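For a rough sense of what "regular compression" looks like in practice, here is a minimal sketch using Python's standard zlib module. The sample text and the helper name measure_compression are made up for illustration, and the measured throughput depends heavily on the machine, the codec, and the compression level, so it will not necessarily come close to the 700MB/s figure above. Note that this kind of compression is lossless: the original bytes come back exactly.

```python
import time
import zlib


def measure_compression(data: bytes, level: int = 6):
    """Compress data with zlib; return (round_trip_ok, ratio, throughput in MB/s)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start

    # Lossless check: decompressing must reproduce the input byte for byte.
    restored = zlib.decompress(compressed)

    ratio = len(data) / len(compressed)
    throughput_mb_s = (len(data) / 1e6) / elapsed if elapsed > 0 else float("inf")
    return restored == data, ratio, throughput_mb_s


# Hypothetical, highly repetitive sample input; real text compresses far less well.
sample = b"the quick brown fox jumps over the lazy dog. " * 20000

lossless, ratio, speed = measure_compression(sample)
print(f"lossless round trip: {lossless}")
print(f"compression ratio:   {ratio:.1f}x")
print(f"throughput:          {speed:.0f} MB/s")
```

The ratio on repetitive input like this is flattering; the point is only to show the two quantities being compared, ratio and speed, and that nothing is lost on the way back.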
An LLM trained on addition and multiplication data develops circuits for addition and multiplication[1].
It stands to reason that LLMs trained on human-produced data develop algorithms that try to approximate the process that produced the data (within their computational limits).
[1] https://arxiv.org/abs/2308.01154