Your #1 tool is simply a stopwatch. Write programs, time them, and then learn. See why some programs are faster or slower than others.

Your #2 tool is hardware performance counters. It's much easier to understand the cache and branch prediction when you use the hardware to count how many times the cache is hit and how many times branches are taken. Linux "perf" accesses these hardware performance counters for you; see Section 5 of http://www.brendangregg.com/perf.html#Events

Write a program and see how many cache misses it has. Write a slightly different program, and the cache misses will be different. How did that change your #1 measurement (time to complete the program)? Now do that a thousand times, with a thousand different programs trying to solve the same problem. Bam, now you're an expert.

It's not really hard. It's just a matter of experience. You'll get there if you work at it and use the right tools to "see" what is going on.

------

Why do people talk about cache hits and branch prediction? Because in THEIR experience, counting cache hits and looking at branch predictions results in performance gains. Their experience won't necessarily match yours, but you can still learn from them in general.
"perf list" lists 1,300 things, and those counters seem to come and go between architectures.
Intel's tools are free now and include a very nice graphical analysis tool. You don't need to buy or use the Intel compiler; you just need 1-2 GB of free disk space.
Here's a page describing deeper analysis, if you want to go beyond knowing that some section of your code uses 40% of the CPU: https://software.intel.com/en-us/vtune-amplifier-cookbook-to...