by GaggiX
16 days ago
> That's technically encoding

Isn't that just projecting the patches into the d_model-sized vectors that the model takes?

> I am assuming that involves quantization

A 12B model in 16GB seems very reasonable to me; int8 is top quality for running models.
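For reference, "projecting the patches" in ViT-style encoders usually means cutting the image into non-overlapping square tiles, flattening each tile, and applying one learned linear layer that maps it to a d_model-sized vector. The sketch below assumes exactly that; the image size, patch size, d_model and the helper names `patchify` / `embed_patches` are hypothetical and not taken from any particular model.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles,
    returning one flattened row of length patch*patch*C per tile."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)

def embed_patches(image: np.ndarray, patch: int,
                  weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical patch embedding: a single linear projection per patch,
    mapping each flattened tile to a d_model-sized vector."""
    return patchify(image, patch) @ weight + bias

# Hypothetical sizes: 224x224 RGB image, 16x16 patches, d_model = 64.
rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
weight = rng.standard_normal((16 * 16 * 3, 64))  # learned in a real model
bias = np.zeros(64)
tokens = embed_patches(image, 16, weight, bias)
print(tokens.shape)  # (196, 64): 14 x 14 patches, each a 64-dim vector
```

Whether one calls this step "encoding" is mostly terminology: on its own it is just a linear map, and any deeper encoding happens in the layers that consume these vectors.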
It sounds like marketing spin where the performance claims are based on BF16 and the “runs in 16GB” claim is on a totally different quantized version.
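Both points can be checked with back-of-the-envelope arithmetic: weights take parameters × bits per parameter / 8 bytes. The sketch below assumes "12B" means exactly 12 × 10^9 parameters and counts weights only (no KV cache, activations or runtime overhead); `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes).
    Ignores KV cache, activations and framework overhead."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 12e9  # "12B", assumed to be exactly 12 billion parameters
VRAM_GB = 16     # the "runs in 16GB" budget

for name, bits in [("bf16", 16), ("int8", 8)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{name}: {gb:.1f} GB of weights, fits in {VRAM_GB} GB: {gb < VRAM_GB}")
# bf16: 24.0 GB of weights, fits in 16 GB: False
# int8: 12.0 GB of weights, fits in 16 GB: True
```

So BF16 weights alone (about 24 GB) cannot fit in 16 GB, while int8 (about 12 GB, roughly 11.2 GiB) fits with a few GB left for the KV cache and activations; any "runs in 16GB" claim for a 12B model therefore implies some form of quantization.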