by dist-epoch
409 days ago
As someone who tried really hard to get deterministic output out of them: they really are not deterministic. Operations within layers can be computed in slightly different orders (due to parallelism) or on different GPU models, and because floating-point arithmetic is not associative, this causes small numerical differences that compound through auto-regression.
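To make the ordering point concrete, here is a minimal toy sketch in plain Python (no GPU involved). The `accumulate` and `pick_token` helpers and the 0.6 "competing logit" are made up for illustration; a real inference stack is far more complex, but the underlying effect is the same: summing the same floats in a different order can change the last bits, and a near-tie in a greedy pick can then go the other way.

```python
def accumulate(values):
    """Sum floats strictly left to right, i.e. one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total


def pick_token(logit_a, logit_b):
    """Hypothetical greedy choice between two candidate tokens."""
    return "A" if logit_a > logit_b else "B"


values = [0.1, 0.2, 0.3]

# Same numbers, two accumulation orders.
forward = accumulate(values)                    # (0.1 + 0.2) + 0.3
backward = accumulate(list(reversed(values)))   # (0.3 + 0.2) + 0.1

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False

# A near-tie against a competing logit of 0.6 flips with the order.
print(pick_token(forward, 0.6))   # A
print(pick_token(backward, 0.6))  # B
```

Once one token differs, every later step is conditioned on a different prefix, so the two generations can drift apart completely from that point on.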