by david-gpu
824 days ago
> Why isn't she prioritizing firmware that doesn't crash?

I used to work in the GPU industry and this sort of view is both pervasive and misguided. GPUs are immensely complex machines. It is really hard to get them to work, let alone work with high performance. Because of this, and in spite of the amount of time and resources spent on validation and verification, the hardware often contains flaws. It is the responsibility of the drivers to work around these flaws in various ways. When a flaw hasn't been discovered and worked around yet, you perceive it as the GPU being unstable or crashing.

There is no fast simple solution to this. You need a finely tuned corporate machine from beginning to end. Better hiring processes, better management, better design processes, better verification processes, better software development practices, better marketing and sales, better customer relations. Everything.
This is like saying "combustion engines are immensely complex machines" when your car suddenly loses power on the highway for no apparent reason, then runs for another five minutes after you restart the engine. On normal roads it works flawlessly. It must be the engine, right? After all, it is the most complicated part!
Except in reality it is far more likely for it to be a problem in the electronics driving the fuel pump or spark plug.
AMD most likely has some sort of buffer overflow or deadlock in their GPU drivers that is causing difficult-to-diagnose problems. It is very unlikely that the silicon itself is broken when it works fine for playing video games, and also works fine when your GPU is one of the few officially supported by ROCm.