Hacker News new | ask | show | jobs
by hobs 1036 days ago
That last part is important. I have worked with many engineers who I would even classify as hard working, but spent little to no time understanding the hardware they were running on and the possibilities that it provided them.

I have heard "that's slow" or "that's good" too many times in performance talks that have completely ignored the underlying machine and what was possible.

1 comments

Learning about how the CPU cache works is probably the most useful thing you can do if you write anything that's not I/O limited. There are definitely a ton of experienced programmers who don't quite understand how often the CPU is just waiting around for data from RAM.
It is a shame that there are not better monitoring tools that surface this. When I use Activity Monitor on macOS, it would be useful to see how much of “% CPU” is just waiting on memory. I know I can drill down with various profilers, but having it more accessible is way overdue.
Instruments?
Digging around in Instruments is the opposite of accessible.

Every OS always had easy ways to tell if a process is waiting on disk or network (e.g., top, Activity Monitor). The mechanisms for measuring how often a process is waiting on memory exist, but you have to use profilers to use them. We are overdue to have them more accessible. Think of a column after “% CPU” that shows percentage of time blocked on memory.

What would you do with that information? You'd need a profiler (and either a copy of your code, or a disassembler) to make it actionable…
I would do the same thing with the information I get from top and Activity Monitor: use that to guide me to what needs investigating.

I am often developing small one-off programs to process data. I then keep some of these running in various workflows for years. Currently, I might notice a process taking an enormous amount of CPU according to top, but it might really be just waiting on memory. Surfacing that would tell me where to spend my time with a profiler.

Instruments is not nearly good enough for any serious performance work. Instruments only tells me what percent of time is spent in which part of the code. This is fine for a first pass, but it doesn’t tell me _why_ something is slow. I really need a V-Tune-like profiler on macOS.
I’ve used it professionally and generally been happy with it. What are you missing from it?
I’ve tried to use it professionally, but always end up switching to my x86 desktop to profile my code, just so I can use V-Tune.

It’s missing any kind of deeper statistics such as memory bandwidth, cache misses, branch mispredictions, etc. I think fundamentally Apple is geared towards application development, whereas I’m working on more HPC-like things.

It’s only useful once you understand how algorithmic complexity works, and how to profile your code, and how you language runtime does things. Before that your CPU cache is largely opaque and trying to peer into it is probably counterproductive.
Okay, you've made me want to learn about it. Where do I start? What concepts do I need to understand? Any reading recommendations?
Haven't read through it, but I suspect this would be a good place to start: https://cpu.land/

HN Discussion: https://news.ycombinator.com/item?id=36823605

Drepper's "What every programmer should know about memory", though you mightn't find it all interesting. https://gwern.net/doc/cs/hardware/2007-drepper.pdf