|
|
|
|
|
by tveita
850 days ago
|
|
Are there any cool highlights you can give us about gemma.cpp? Does it have any technical advantages over llama.cpp? It looks like it introduces its own quantization format, is there a speed or accuracy gain over llama.cpp's 8-bit quantization? |
|
We do not yet have full evals because the harness was added very recently, but observe that the non-uniform '4-bit' (plus tables, so 4.5) has twice the SNR of size-matched int4 with per-block scales.
One advantage that gemma.cpp offers is that the code is quite compact due to C++ and the single portable SIMD implementation (as opposed to SSE4, AVX2, NEON). We were able to integrate the new quantization quite easily, and further improvements are planned.