by astrange 245 days ago
It's partly because floating-point math is not associative, and GPU inference doesn't guarantee that all the steps will be done in the same order on every run.
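A small Python sketch (an illustration, not something from the comment) shows the effect with ordinary IEEE 754 doubles. `sequential_sum` and `pairwise_sum` are hypothetical helpers that stand in for two different reduction orders a parallel kernel might use. Real GPU kernels are far more involved, but the underlying arithmetic issue is the same:

```python
import math

def sequential_sum(values):
    """Left-to-right accumulation, like one thread summing in order."""
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values):
    """Tree-style reduction: repeatedly add adjacent pairs.
    A rough stand-in for how a parallel reduction combines partial sums."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # carry an unpaired last element forward
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

# Same three numbers, different grouping, different result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6

# Same four numbers, different reduction order, three different answers.
# At 1e16 the spacing between doubles is 2, so adding 1.0 can be rounded away.
xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))  # 1.0
print(pairwise_sum(xs))    # 0.0
print(math.fsum(xs))       # 2.0 (exactly rounded sum)
```

Neither ordering is "wrong"; each rounds correctly at every step. They simply round at different points, so any change in how work is split across threads can change the low bits of the result.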