by nl 813 days ago
Saying "we want more efficiency" is great, but there is a trade-off between size and accuracy here.

It's possible that compressing and using all of human knowledge simply takes a lot of memory, and in some cases accuracy matters more than reducing memory usage.

For example, [1] shows how Gemma 2B using AVX512 instructions could solve problems it couldn't solve using AVX2, because of rounding issues with the lower-memory instructions. It's likely that most quantization (and other memory-reduction schemes) has similar problems.
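
To make the quantization point concrete, here's a toy sketch. It is not the AVX2/AVX512 case from [1]; it just assumes a simple symmetric int8 scheme (the helper names are made up for illustration) and shows that storing weights in fewer bits shifts the result of a dot product:

```python
import random

def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range (hypothetical helper)."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(ints, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in ints]

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]
inputs = [rng.gauss(0.0, 1.0) for _ in range(1024)]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Same dot product, once with the original weights and once with the 8-bit ones.
exact = sum(w * x for w, x in zip(weights, inputs))
approx = sum(w * x for w, x in zip(restored, inputs))
max_weight_error = max(abs(a - b) for a, b in zip(weights, restored))

print("largest per-weight rounding error:", max_weight_error)
print("dot product, full precision:", exact)
print("dot product, int8 weights:  ", approx)
```

Each weight is off by at most half a quantization step, which looks harmless, but those errors accumulate across every layer of a model. Whether that flips an answer depends on the model and the task.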

As we develop more multi-modal models that can do things like understand 3D video in better than real time, it's likely that memory requirements will increase, not decrease.

[1] https://github.com/google/gemma.cpp/issues/23