by rygorous 845 days ago
(I'm the person who did most of the investigation.)

A relatively major realization during the investigation was that a different mystery bug that also seemed to be affecting many Unreal Engine games, namely a spurious "out of video memory" error reported by the graphics driver, seemed to be occurring not just on similar hardware, but in fact on the exact same machines.

For a public example, if you google for "gamerevolution the finals crash on launch" and "gamerevolution the finals out of video memory", you'll find a pair of articles describing different errors, one resulting from an Oodle decompression error, and one from the graphics driver spuriously reporting out-of-memory errors, both posted on the same day with the same suggested fix (lower P-core max clock multiplier).

That's the problem right there in a nutshell. It's not just Oodle detecting spurious errors during its validation. Other code on the same machine is glitching too. And "just try repeating" is not a great fix because we can't trust the "should we repeat?" check any more on that machine than we can trust any of the other consistency checks that we already know are spuriously failing at a high rate.
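
To make the retry argument concrete, here is a toy probability model. It is not taken from Oodle or from the investigation: the function name `retry_outcomes`, its parameters, and the assumption that every attempt and every check fails independently with a fixed probability are all hypothetical simplifications (real faults on a marginal CPU are likely correlated). It only sketches why a retry loop is as trustworthy as the check that drives it, and no more.

```python
def retry_outcomes(p_corrupt, p_check_wrong, max_attempts):
    """Toy model of "retry until the validation check passes".

    Hypothetical assumptions: each attempt produces corrupted output with
    probability p_corrupt, and the check independently reports the wrong
    verdict (false pass or false fail) with probability p_check_wrong.

    Returns (P(accept good data), P(accept corrupted data), P(give up)).
    """
    accept_good = (1 - p_corrupt) * (1 - p_check_wrong)  # good data, check says OK
    accept_bad = p_corrupt * p_check_wrong               # bad data, check says OK anyway
    reject = 1 - accept_good - accept_bad                # check says "retry"

    p_good = 0.0
    p_bad = 0.0
    p_reach = 1.0  # probability that we get as far as this attempt
    for _ in range(max_attempts):
        p_good += p_reach * accept_good
        p_bad += p_reach * accept_bad
        p_reach *= reject
    return p_good, p_bad, p_reach


# Compare a check that can be trusted with one that cannot.
trusted = retry_outcomes(p_corrupt=0.01, p_check_wrong=0.0, max_attempts=3)
untrusted = retry_outcomes(p_corrupt=0.01, p_check_wrong=0.01, max_attempts=3)
print("trusted check:  ", trusted)
print("untrusted check:", untrusted)
```

In this model, a check that is never wrong gives exactly zero probability of accepting corrupted data, so retrying is safe. As soon as the check itself can be wrong, that probability is nonzero, and adding more retries does not bring it back to zero.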

Many known HW issues you can work around in software just fine, but frequent spurious CPU errors don't fall into that category.