by rahimnathwani
501 days ago
> we already do FP8 for inference
Yes, but for a given model size, DeepSeek claims that a model trained in FP8 will perform better than one quantized to FP8 after training. If that's true, then for a given quality a native FP8 model will be smaller and cheaper to run at inference.
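For a rough feel of what "quantized to FP8" does to existing weights, here is a minimal sketch. It assumes the E4M3 flavor of FP8 (4 exponent bits, 3 mantissa bits, largest finite value 448) and a made-up helper name, `round_to_e4m3`; it is not DeepSeek's method, just plain rounding in pure Python. The rounding error it prints is what post-training quantization adds after the fact; the claim is that a model trained with FP8 has already adapted to it.

```python
import math

def round_to_e4m3(x: float) -> float:
    """Hypothetical helper: round x to the nearest FP8 E4M3 value.

    Assumes E4M3 with bias 7, saturating at +/-448; NaN handling is omitted.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)      # saturate at the largest finite magnitude
    _, e = math.frexp(a)        # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)        # below 2**-6 the format is subnormal
    step = 2.0 ** (exp - 3)     # 3 mantissa bits give 8 steps per power of two
    return sign * round(a / step) * step

# Illustrative values only, not real model weights.
weights = [0.3, -0.07, 1.0, 1000.0]
quantized = [round_to_e4m3(w) for w in weights]

for w, q in zip(weights, quantized):
    print(f"{w:>8} -> {q:<10} (error {abs(w - q):.6f})")
```

Either way each weight takes one byte instead of two for BF16, so the size saving is the same; the open question is only how much quality survives the rounding.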