Hacker News
by bob1029 300 days ago
Could be the power supply and load profile?

I've heard some really wild noises coming out of my Zen 4 machine when I've had all cores loaded up with what is best described as "choppy" workloads: we repeatedly run something like a Parallel.ForEach, followed by a single-threaded hot path of equal or shorter duration, as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a CPU yet, though.


> I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD.

Then you shouldn't trust the results of your work either, as that's indicative of a CPU that's producing incorrect results. I suggest lowering the frequency or even undervolting if necessary until you get a stable system.

...and yes, wildly fluctuating power consumption is even more challenging than steady-state high power, since the VRMs have to react precisely and not overshoot or undershoot, or even worse, hit a resonance point. LINPACK, one of the most demanding stress tests and benchmarks, is known for causing crashes on unstable systems not when it starts each round, but when it stops.

The results might be invalid for one generation, but the model is resilient to these kinds of events overall. Far more resilient than my operating system is.

In evolutionary algorithms, randomly flipped genome bits could even be beneficial, helping to escape local minima or to make up for a broken RNG. One bad evaluation won't throw the whole thing off. It's gotta be bad constantly.

I experienced that with a GPU years ago. A workload I wrote caused a pronounced high-frequency noise from the card, unlike anything I've heard before or since. I'd describe it as a very high-frequency chirping. I refactored the program rather than wait to see what would come of it.
Is that, like, an intentional stress test for the hardware that you've come up with?
No. It is just how the algorithms play out:

1. Evaluate population of candidates in parallel

2. Perform ranking, mutation, crossover, and objective selection in serial

3. Go to 1.
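For readers who want to see the shape of that loop, here is a minimal Python sketch. It assumes a toy 32-bit genome and a OneMax objective (count the set bits); the names `evolve`, `fitness`, `mutate`, `crossover`, and `GENOME_BITS` are hypothetical illustrations, not the poster's code. It uses `ThreadPoolExecutor` for brevity. Under CPython's GIL a CPU-bound fitness function won't actually saturate every core this way, so reproducing the bursty all-cores load would need `ProcessPoolExecutor` or something like .NET's Parallel.ForEach.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GENOME_BITS = 32  # hypothetical genome length


def fitness(genome: int) -> int:
    # Hypothetical objective (OneMax): number of set bits.
    return bin(genome).count("1")


def mutate(genome: int, rng: random.Random, rate: float = 1 / GENOME_BITS) -> int:
    # Bit-flip mutation: flip each bit independently with probability `rate`.
    for bit in range(GENOME_BITS):
        if rng.random() < rate:
            genome ^= 1 << bit
    return genome


def crossover(a: int, b: int, rng: random.Random) -> int:
    # Single-point crossover: low bits from `a`, high bits from `b`.
    point = rng.randrange(1, GENOME_BITS)
    mask = (1 << point) - 1
    return (a & mask) | (b & ~mask)


def evolve(population_size: int = 64, generations: int = 50,
           seed: int = 0, workers: int = 8) -> tuple[int, int]:
    rng = random.Random(seed)
    population = [rng.getrandbits(GENOME_BITS) for _ in range(population_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Step 1: evaluate all candidates in parallel (the bursty phase).
            scores = list(pool.map(fitness, population))
            # Step 2: rank, select, cross over and mutate on a single thread.
            ranked = [g for _, g in sorted(zip(scores, population), reverse=True)]
            parents = ranked[: population_size // 2]  # elitist selection
            children = []
            while len(children) < population_size - len(parents):
                a, b = rng.sample(parents, 2)
                children.append(mutate(crossover(a, b, rng), rng))
            population = parents + children
            # Step 3: loop back to step 1.
    best = max(population, key=fitness)
    return best, fitness(best)
```

Calling `evolve()` returns the best genome and its score. Because the top half of each generation survives unchanged, a single corrupted evaluation or mutation can only cost one generation's progress, which is the kind of resilience described above.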

I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
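A back-of-envelope model of why population size would set the pitch: the load toggles between "all cores busy" and "one core busy" once per generation, so if that toggling is what the VRMs are singing along to (an assumption, not something measured here), the noise should sit near the generation rate. The function name and the timing parameters below are hypothetical, and real timings also depend on scheduling, caches, and boost behavior.

```python
import math


def generation_rate_hz(population_size: int, cores: int,
                       eval_seconds: float,
                       serial_seconds_per_candidate: float) -> float:
    """Estimate evaluate/serial cycles per second under a simple model.

    Hypothetical assumptions: every candidate takes `eval_seconds` to
    evaluate, candidates are spread evenly across `cores`, and the serial
    phase costs `serial_seconds_per_candidate` per candidate.
    """
    parallel_phase = math.ceil(population_size / cores) * eval_seconds
    serial_phase = population_size * serial_seconds_per_candidate
    return 1.0 / (parallel_phase + serial_phase)
```

With illustrative numbers of 32 candidates on 16 cores, 100 µs per evaluation, and 5 µs of serial work per candidate, one cycle takes 360 µs, or roughly 2.8 kHz, well within hearing range. Doubling the population to 64 stretches the cycle to 720 µs, about 1.4 kHz.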