by rahimnathwani 501 days ago

  we already do FP8 for inference
Yes, but DeepSeek claims that, for a given model size, a model trained natively in FP8 works better than one trained at higher precision and then quantized to FP8. If that's true, then for a given quality a native FP8 model can be smaller and therefore cheaper to run at inference.
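To put rough numbers on that argument, here is a back-of-the-envelope sketch. The parameter counts are hypothetical and chosen only for illustration (they are not DeepSeek's figures), and `weight_bytes` is a helper made up for this example. It counts weight storage only and ignores activations and the KV cache.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone (no activations, no KV cache)."""
    return num_params * bits_per_param // 8


# Hypothetical parameter counts, assumed purely for illustration.
quantized_params = 70_000_000_000  # trained at higher precision, then quantized to FP8
native_params = 60_000_000_000     # assumed smaller native-FP8 model of equal quality

quantized_gb = weight_bytes(quantized_params, 8) / 1e9
native_gb = weight_bytes(native_params, 8) / 1e9

print(f"quantized-to-FP8 weights: {quantized_gb:.0f} GB")
print(f"native FP8 weights:       {native_gb:.0f} GB")
```

Both models store 8 bits per weight, so any saving comes entirely from needing fewer parameters to reach the same quality, which is the part that depends on DeepSeek's claim holding up.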