by dragontamer 2979 days ago
Brendan Gregg diagnoses a performance problem on Netflix servers. He starts with CPU utilization, but the CPU utilization metrics from top or ps don't give enough detail. He checks out more detailed performance views (including flame graphs and some custom tools he wrote himself) and eventually narrows the problem down to the TLB (translation lookaside buffer), the cache of virtual-to-physical address translations that the CPU relies on to implement virtual memory.

Finally, he realizes that the TLB misses were caused by the Meltdown / Spectre patches, which turned out to be a difference between the two servers he was comparing.