| HN Mirror


by rahimnathwani 549 days ago

  we already do FP8 for inference

Yes, but DeepSeek claims that, for a given model size, a model trained in FP8 will work better than one trained in higher precision and then quantized to FP8. If that's true, then for a given quality a native FP8 model will be smaller and cheaper to run inference on.
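To make the "quantized to FP8" half of that concrete, here is a rough sketch of what post-training quantization does to a single weight: it snaps a value learned in higher precision onto a coarse FP8 grid. Everything in it is an assumption for illustration, not something from the DeepSeek report: it assumes an E4M3-style layout (4 exponent bits, 3 mantissa bits, bias 7, largest finite value 448, saturating on overflow), it ignores the per-tensor or per-block scaling that real pipelines apply, and `round_to_e4m3` is a made-up name.

```python
import math

E4M3_MAX = 448.0        # assumed largest finite magnitude
E4M3_MIN_EXP = -6       # assumed smallest normal exponent (bias 7)
MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on an assumed E4M3 grid, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    _, e = math.frexp(a)
    exp = max(e - 1, E4M3_MIN_EXP)   # below the min exponent we are in the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between neighbouring grid values
    # Python's round() rounds halves to even, like IEEE round-to-nearest-even.
    return sign * min(round(a / step) * step, E4M3_MAX)


def relative_error(x: float) -> float:
    """Relative error introduced by snapping x to the grid."""
    return abs(round_to_e4m3(x) - x) / abs(x)


if __name__ == "__main__":
    for w in (0.3, -0.3, 1.0, 0.07, 1000.0):
        print(w, "->", round_to_e4m3(w), "rel. err", relative_error(w))
```

With only 3 mantissa bits, the rounding error on a weight can reach several percent. A model trained in higher precision never saw that error during training, whereas a model trained in FP8 has had the chance to adapt to it, which is one plausible reading of why the native FP8 model might come out ahead at the same size.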