You're conflating data misses with instruction misses. Besides, the article is about hundred-megabytes-binaries (forgot to strip debug data?), which the Linux kernel isn't.
The quote above talks exclusively of instruction cache misses. In case you are really interested, the two kinds are related as L2 and L3 caches are shared my instructions and data.
In terms of the execution profile, the kernel is very close to a typical WSC application. I.e. very flat without hotspots. The size of L1 I$ is 32KB, hence your application doesn't have to have 100s of megabytes of code to benefit from layout optimizations.
In terms of the execution profile, the kernel is very close to a typical WSC application. I.e. very flat without hotspots. The size of L1 I$ is 32KB, hence your application doesn't have to have 100s of megabytes of code to benefit from layout optimizations.