by the__prestige
1054 days ago
The question is whether anyone has actually tried quantization (the int8 / GGML / GPTQ approaches) and whether the "flattening" of the distribution caused by the larger denominator leads to better quantization behavior. You'd have to run quantization with and without the +1 to measure the advantage. OP argues that the advantage could be significant.
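To make the "with and without the +1" comparison concrete, here is a rough toy sketch in plain Python. It is not a real experiment: the function names (`softmax_plus_one`, `quantize_int8`, `roundtrip_error`) are made up for illustration, the quantizer is a simple symmetric per-tensor int8 scheme rather than what GGML or GPTQ actually do, and it only looks at one vector of attention scores. Settling the question would presumably require training models both ways and quantizing their weights and activations.

```python
import math

def softmax(xs):
    # Standard softmax: exp(x_i) / sum_j exp(x_j), shifted by max for stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(xs):
    # Variant with +1 in the denominator: exp(x_i) / (1 + sum_j exp(x_j)).
    # Shifting by m = max(max(xs), 0) turns the "+1" into exp(-m).
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

def quantize_int8(values):
    # Assumed scheme: symmetric per-tensor quantization to [-127, 127].
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return [0] * len(values), 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def roundtrip_error(values):
    # Largest absolute error after quantizing and dequantizing.
    q, scale = quantize_int8(values)
    return max(abs(v - qi * scale) for v, qi in zip(values, q))

if __name__ == "__main__":
    logits = [-4.0, -3.5, -5.0, -4.2]  # a head with "nothing to say"
    for name, fn in (("softmax", softmax), ("softmax+1", softmax_plus_one)):
        probs = fn(logits)
        print(name, [round(p, 4) for p in probs], roundtrip_error(probs))
```

With all-negative logits, the plain softmax is still forced to sum to 1, while the +1 variant lets every weight shrink toward zero. Whether that translates into fewer outliers and smaller quantization error in a trained model is exactly the open question.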