|
|
|
|
|
by bitpush
105 days ago
|
|
I think there's also laws of physics based on the current architecture. Its like saying looking at a 10GB video file and saying - it has to compress to 500MB right? I mean, it has to - right? Unless we invent a completely NEW way of doing videos, there's no way you can get that kind of efficiency. If tomorrow we're using quantum pixels (or something), sure 500MB is good enough but not from existing. In other words, you cannot compress a 100GB gguf file into .. 5GB. |
|
100GB to 5GB would be 20x. Video has seen an improvement of that magnitude in the days since MPEG-1.
It's interesting to consider that improvements in video codecs have come from both research and massively increased computing power, basically trading space for computation. LLMs are mostly constrained by memory bandwidth, so if there was some equivalent technique to trade space for computation in LLM inference, that would be a nice win.