by touisteur
1356 days ago
Tell me about the non-guaranteed order of operations in GPU reductions, and floating-point results changing slightly between two runs. Yes, it's useful and you get the goddamn FP32 TFLOPS, but damn, it makes testing, validating, and qualifying systems harder. And yes, I know one shouldn't rely on or test for exact equality, but not knowing the actual order of FP operations makes numerical analysis of the actual error harder (you just have to take the worst case of every reduction, ugh). EDIT: and don't get me started on tensor cores and the clever tricks to get 'FP32-like' accuracy out of them. Yes, wonderful magic, but how do you reason about these new objects without a whole new slew of tools?
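To make the complaint concrete, here is a minimal CPU-side sketch (plain Python doubles, not actual GPU code) of why reduction order matters: the same four values summed left-to-right and summed as a pairwise tree give different answers, and neither matches the exactly rounded sum. The function names and the input values are my own, picked for illustration. The "worst case" helper uses the standard first-order bound that holds for any summation order, gamma_(n-1) * sum(|x_i|) with gamma_k = k*u / (1 - k*u), assuming binary64 round-to-nearest (u = 2^-53).

```python
import math


def sequential_sum(xs):
    """Left-to-right accumulation, like a single thread walking the array."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    """Tree-shaped reduction, roughly the shape a parallel reduction takes."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def worst_case_bound(xs):
    """Order-independent error bound: gamma_(n-1) * sum(|x_i|).

    Assumes binary64 with round-to-nearest, so u = 2**-53.
    """
    u = 2.0 ** -53
    k = len(xs) - 1
    gamma = k * u / (1.0 - k * u)
    return gamma * math.fsum(abs(x) for x in xs)


# Illustrative input: large cancelling terms plus small ones.
values = [1e16, 1.0, -1e16, 1.0]

exact = math.fsum(values)        # correctly rounded sum: 2.0
seq = sequential_sum(values)     # 1.0
tree = pairwise_sum(values)      # 0.0
bound = worst_case_bound(values)

print(exact, seq, tree, bound)
```

Both orderings stay inside the bound, which is exactly the annoying part: when the order isn't known, the bound is all you can state, and it is far looser than what any single fixed order would give you.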