|
|
|
|
|
by jeffbee
485 days ago
|
|
I regularly find broken thermal solutions in azure and when I worked at Google it was also a low-level but constant irritant. When I joined Dropbox I said to my team on my first day that I could find a machine in their fleet running at 400MHz, and I was right: a bogus redundant PSU controller was asserting PROCHOT. These things happen whenever you have a lot of machines. |
|
It was very non-obvious to debug since pretty much most emitted metrics, apart from mysterious errors/timeouts to our service, looked reasonable. Even the cpu usage and cpu temperature graphs looked normal since it was a bogus prochot and not actually a real thermal throttling