by Kamshak 449 days ago
There is also unintentional randomness due to the parallelism in inference (e.g. partial results of parallel matmuls being added together on the GPU). Since it's all floating-point math, every operation rounds its result, and that rounding error accumulates differently depending on the order of operations, which parallel execution doesn't pin down. So even at temperature 0 you're not guaranteed deterministic outputs.
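A quick way to see the order dependence, as a toy sketch on the CPU with Python doubles rather than anything resembling a real GPU kernel. The helper names `sequential_sum` and `chunked_sum` are made up for illustration; `chunked_sum` only loosely imitates a parallel reduction that adds up partial sums. The loops are written by hand because the built-in `sum` may use compensated summation for floats in newer Python versions, which would hide the effect.

```python
def sequential_sum(values):
    """Add values strictly left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each chunk separately, then add the partial sums.

    Loosely imitates a parallel reduction (hypothetical helper, not a GPU API).
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Same three numbers, different grouping.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same four numbers, different reduction order. The exact answer is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))   # 1.0
print(chunked_sum(values, 2))   # 0.0
```

Neither order gives the exact answer here, and they disagree with each other. In a real model the differences are far smaller per operation, but the point is the same: if the reduction order varies between runs, the results can vary too.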
1 comment

Because addition and multiplication are not associative with floats?