by craigacp
387 days ago
"The same operations in the same order" is a tough constraint in an environment where core counts are increasing and clock speeds/IPC are not. It's hard to rewrite some of these algorithms to use a parallel decomposition that's the same as the serial one. I've done a lot of work on reproducibility in machine learning systems, and it's really, really hard. Even the JVM got me by changing some functions in `java.lang.Math` between versions and platforms (while keeping to their documented 2 ulp error bounds).
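To make the ordering problem concrete, here is a minimal sketch (not taken from any real ML system) of why a parallel decomposition can disagree with the serial loop: floating-point addition is not associative, so summing per-chunk partials can round differently from summing left to right. The class name `SumOrder`, the method names, and the input values are all hypothetical, chosen only to make the rounding visible, and the "parallel" version is simulated sequentially so the result is deterministic.

```java
public class SumOrder {
    // Serial reduction: add every element left to right.
    static double serialSum(double[] xs) {
        double total = 0.0;
        for (double x : xs) {
            total += x;
        }
        return total;
    }

    // Simulated parallel reduction: sum each contiguous chunk on its own,
    // then combine the partial sums. A real implementation would hand each
    // chunk to a different thread; the arithmetic would be the same.
    static double chunkedSum(double[] xs, int chunks) {
        int chunkSize = (xs.length + chunks - 1) / chunks; // ceiling division
        double total = 0.0;
        for (int start = 0; start < xs.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, xs.length);
            double partial = 0.0;
            for (int i = start; i < end; i++) {
                partial += xs[i];
            }
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input: at magnitude 1e16 adjacent doubles are 2 apart,
        // so adding 1.0 to 1e16 (or to -1e16) is rounded away.
        double[] xs = {1e16, 1.0, -1e16, 1.0};
        System.out.println("serial:  " + serialSum(xs));
        System.out.println("chunked: " + chunkedSum(xs, 2));
    }
}
```

With this input the serial loop returns 1.0 and the two-chunk version returns 0.0: same operands, same operation, different grouping. Real data rarely differs this dramatically, but small discrepancies of the same kind are enough to break bit-for-bit reproducibility.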
"parallel decomposition that's the same as the serial one" would be difficult in many ways, but only needed when you can't afford a one-time change.