by bildung 509 days ago
Well, there's the practical reason that the model is natively fp8 ;) That's apparently one of the innovative ideas that make it so much less compute-intensive.