|
|
|
|
|
by kadushka
54 days ago
|
|
Most quant papers I've seen usually report non-trivial degradation on standard benchmarks, like 1-10% degradation (compared to FP16/BF16). Especially when using 4 bits or lower. For example, I just opened a random paper: https://arxiv.org/pdf/2410.09426 see Table 1. p.s. dense vs MoE: both are being released because they offer different trade-offs: at the same level of quality, MoE will use less compute, but more memory. |
|