by nl
813 days ago
Saying "we want more efficiency" is great, but there is a trade-off between size and accuracy here. It is possible that compressing and using all of human knowledge simply takes a lot of memory, and in some cases accuracy matters more than reducing memory usage.

For example, [1] shows how Gemma 2B using AVX512 instructions could solve problems it couldn't solve using AVX2, because of rounding issues with the lower-memory instructions. It's likely that most quantization (and other memory-reduction) schemes have similar problems.

As we develop more multi-modal models that can do things like understand 3D video in better than real time, it's likely that memory requirements will increase, not decrease.

[1] https://github.com/google/gemma.cpp/issues/23
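To make the size/accuracy trade-off concrete, here is a rough sketch of how round-trip error grows as you cut the bit width. This is a toy symmetric uniform quantizer, not what gemma.cpp or any particular library does; the function names and the evenly spaced "weights" are made up for illustration.

```python
def quantize_roundtrip(values, bits):
    """Quantize to a signed `bits`-bit grid with one shared scale, then dequantize."""
    levels = 2 ** (bits - 1) - 1               # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) * scale for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between the original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical "weights": 1000 evenly spaced values in [-1, 1].
weights = [-1 + 2 * i / 999 for i in range(1000)]

err8 = max_abs_error(weights, 8)
err4 = max_abs_error(weights, 4)
print(f"8-bit max error: {err8:.5f}")
print(f"4-bit max error: {err4:.5f}")
```

The worst-case error per value is bounded by half the grid step, so halving the bits from 8 to 4 makes the step (and the bound) roughly 18x larger here. Whether that matters for a given model is an empirical question, which is the point.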