by nextaccountic
800 days ago
> The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads. This key point ensures that a compressed file can be decompressed using a different hardware or software configuration.

How is this possible? Does it use floating point and concurrency?

Cross-platform floating-point determinism is seriously difficult. The Rapier physics engine manages it [0], but at the expense of disabling SIMD and multithreading. It also works only on platforms that strictly comply with IEEE 754-2008, and I think GPUs usually don't qualify (regarding subnormal numbers, etc.).

Another possible source of trouble is fused multiply-add (FMA), which rounds only once and so may give a more precise result than doing the multiplication and addition separately. I think some platforms don't have FMA in hardware.

For example, it seems that TSAC currently runs on CPUs and NVIDIA GPUs. Could porting it to AMD GPUs affect determinism?

[0] https://rapier.rs/docs/user_guides/rust/determinism/
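To make the two concerns concrete, here is a minimal Python sketch. It is not taken from TSAC or Rapier; the helper names `separate_multiply_add` and `fused_multiply_add` are made up for illustration, and the fused version is only emulated with exact rational arithmetic rather than a hardware FMA instruction. It shows that regrouping a sum (as a different thread count or SIMD width might) and fusing a multiply-add can each change the result.

```python
from fractions import Fraction

# 1) Floating-point addition is not associative, so the order in which
#    partial sums are combined (threads, SIMD lanes) can change the result.
sequential = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)
print(sequential == regrouped, sequential, regrouped)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA (hypothetical helper): compute a*b + c exactly,
    then round once when converting back to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# 2) Inputs chosen so that the exact product 1 - 2**-54 is lost when it is
#    rounded to a double before the addition.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
print(separate_multiply_add(a, b, c), fused_multiply_add(a, b, c))
```

If one platform compiles `a * b + c` to an FMA and another does not, the two can disagree in the last bits, and in a compressor that drives an entropy coder from model outputs even a tiny disagreement could presumably break decoding.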