by scarmig
1009 days ago

But why are there discrepancies in floating-point arithmetic at all? Floats have errors when approximating the reals, but floating-point operations are all well-defined: even if 0.1 + 0.2 != 0.3, it's still always true that 0.1 + 0.2 == 0.1 + 0.2. I figure the issue must be related to concurrency across a fleet of GPUs during inference, but even then it's not clear to me where the nondeterminism would creep in. Maybe different experts work on an inference simultaneously and the first to respond wins? Or the service switches to models with different quantization depending on load?
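Both claims in the question can be checked directly. This is a minimal Python sketch: a single IEEE 754 double-precision addition is rounded the same way every time, so repeating the identical operation on identical inputs always gives the identical result, even though that result is not exactly 0.3.

```python
# One rounded addition: not equal to 0.3, but perfectly reproducible.
total = 0.1 + 0.2
print(total)                   # 0.30000000000000004
print(total == 0.3)            # False
print(0.1 + 0.2 == 0.1 + 0.2)  # True: same inputs, same operation, same rounding
```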

Floating-point addition isn't associative: (a + b) + c can round differently from a + (b + c). So accumulating the same sum in different orders can give different results, and accumulating in different orders is common in parallel math operations.
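A minimal Python sketch of the effect. `chunked_sum` is a hypothetical stand-in for a parallel reduction, not how any particular GPU library works: each "worker" sums a contiguous chunk and the partial sums are then combined. Changing the number of workers changes the grouping, and with it the rounding. `math.fsum` is the standard-library function that returns the correctly rounded exact sum, shown here as a reference.

```python
import math

def sequential_sum(xs):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for x in xs:
        total += x
    return total

def chunked_sum(xs, n_workers):
    """Hypothetical parallel-style reduction: sum each contiguous chunk,
    then combine the partial sums. The grouping depends on n_workers."""
    if not xs:
        return 0.0
    size = math.ceil(len(xs) / n_workers)
    partials = [sequential_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return sequential_sum(partials)

# Non-associativity with small numbers:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same four numbers, three different answers depending on grouping.
# Near 1e16 the gap between adjacent doubles is 2, so 1e16 + 1.0 rounds back to 1e16.
xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))   # 1.0
print(chunked_sum(xs, 2))   # 0.0
print(math.fsum(xs))        # 2.0 (correctly rounded exact sum)
```

Each individual addition here is still deterministic; only the order in which the additions are grouped differs. If the grouping varies between runs (for example, with scheduling or with how work is split across devices), the final result can vary too.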