Hacker News
by nextaccountic 800 days ago
> The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads. This key point ensures that a compressed file can be decompressed using a different hardware or software configuration.

How is this possible? Does it use floating point and concurrency?

Cross-platform floating-point determinism is seriously difficult. The Rapier physics engine can do it [0], but only at the expense of disabling SIMD and multithreading. It also works only on platforms that strictly comply with IEEE 754-2008, which I think GPUs usually don't (e.g. in how they handle subnormal numbers). Fused multiply-add (FMA) is another potential problem: it rounds once instead of twice, so it can give a more precise, and therefore different, result than doing the multiplication and addition separately (and I think some platforms don't have FMA in hardware).
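To make the FMA point concrete, here is a minimal sketch (mine, not from TSAC or Rapier) that emulates a single-rounding FMA with exact rational arithmetic and compares it with a separate multiply and add. The helper names `fma_emulated` and `mul_then_add` are hypothetical; Python 3.13+ also ships `math.fma` if you want the real thing.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c with a single rounding, as a hardware FMA would.

    Fraction(float) is exact, so the product and the sum are computed
    exactly and only the final conversion back to float rounds.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c with two roundings: after the multiply and after the add."""
    product = a * b  # rounded to the nearest double here
    return product + c  # and rounded again here


a, b, c = 0.1, 10.0, -1.0
print(mul_then_add(a, b, c))  # 0.0
print(fma_emulated(a, b, c))  # 5.551115123125783e-17, i.e. 2**-54
```

The stored value of `0.1` is slightly above 0.1, so the exact product is 1 + 2**-54. Rounding the product first snaps it to 1.0 and the error vanishes; the FMA keeps it. Two machines that disagree on whether to fuse this operation will therefore disagree on the result.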

For example, it seems that TSAC currently runs on CPUs and NVIDIA GPUs. Could porting to AMD GPUs affect determinism?

[0] https://rapier.rs/docs/user_guides/rust/determinism/


It's possible, but you have to make sure that floating-point operations always happen in the same order (for example, you could operate on blocks concurrently and then merge the results serially). You also have to be careful with optimizations like FMA, because they produce a different result than a separate multiply followed by an add.
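A minimal sketch of the "operate on blocks concurrently, then merge serially" idea, applied to a sum. It assumes a fixed, hypothetical `BLOCK_SIZE` so that block boundaries never depend on the thread count; it is an illustration, not how TSAC actually does it. (Python threads won't speed up this loop; the point is only the ordering.)

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical; fixed so block boundaries ignore the worker count


def sum_block(block: list[float]) -> float:
    # Explicit left-to-right loop, so the rounding order inside a block is fixed.
    total = 0.0
    for x in block:
        total += x
    return total


def deterministic_sum(values: list[float], workers: int) -> float:
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    # Blocks may be summed concurrently, in any order, on any thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_block, blocks))  # results come back in input order
    # Merge serially in block order, never in completion order.
    return sum_block(partials)


values = [1e16, 1.0, -1e16, 0.1] * 25
print(len({deterministic_sum(values, w) for w in (1, 2, 4, 8)}))  # 1
print((1e16 + 1.0) - 1e16, (1e16 - 1e16) + 1.0)  # 0.0 1.0
```

The last line shows why the order matters: the same three numbers added in a different order give different answers, so merging partial sums in whichever order threads happen to finish would make the result depend on scheduling.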
Are you sure this cross-platform determinism works for GPUs? I can't find any reference to that.