by luckydude 5345 days ago
Most of what they want is exactly why I wrote lmbench. It measures the latency and bandwidth of just about everything you should care about.

One of our guys plotted the memory latency benchmark here:

http://www.bitmover.com/mem_lat.jpg

He shows you that and says "tell me everything you can tell me from this graph". It's usually a two hour conversation.
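
For anyone who hasn't seen how a curve like that gets produced, here is a minimal sketch of the general pointer-chasing idea: build a chain where each slot names the next slot to visit, then time a long run of dependent loads while varying the array size and the stride. This is not lmbench's source, and the names (`build_chain`, `chase`, `ns_per_load`) are made up for illustration. In Python the interpreter overhead swamps any cache effect, so treat the numbers it prints as meaningless; only the access pattern is the point, and a real measurement would do the same walk in C over an array of pointers.

```python
import time

def build_chain(n_slots, stride):
    """Return a list where each slot holds the index of the next slot to visit."""
    return [(i + stride) % n_slots for i in range(n_slots)]

def chase(chain, steps):
    """Follow the chain for `steps` dependent loads and return the final index."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]  # each load depends on the result of the previous one
    return idx

def ns_per_load(n_slots, stride, steps=200_000):
    """Average wall-clock nanoseconds per step (dominated by interpreter overhead)."""
    chain = build_chain(n_slots, stride)
    start = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - start) * 1e9 / steps

if __name__ == "__main__":
    # Sweep array size and stride, the two axes a latency plot like this varies.
    for n_slots in (1 << 10, 1 << 14, 1 << 18):
        for stride in (1, 16, 256):
            print(n_slots, stride, round(ns_per_load(n_slots, stride), 1))
```

The dependent load is what matters: because each index comes from the previous read, the hardware cannot overlap the accesses, so the time per step approximates latency rather than bandwidth.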


Wow. So, less than 300k or so and you stay in L1, which is crazy fast. Contiguous reads must have some trick for streaming into L1 in anticipation of the request. The only explanation I have for the large stride/large read speedup is maybe you're laying out data in separate memory modules so you get some parallel reads. I guess that curve from 8B to 4KB comes from increasing collisions? Is this even vaguely right?

That's a cool graph.

I think you might stare at it some more. You can puzzle out L1 size, L1 associativity, L1, L2, main memory latency, the cost of a TLB miss and probably a bunch of other stuff I've forgotten.

> Wow. So, less than 300k or so and you stay in L1, which is crazy fast.

Look at the colors again. L1 is ~12KB or less. 300kbish is probably L2.

I tried hard to recognize the CPU from the graph, and failed. Any help?

What is sad is that if you were to plot, say hard drive access times, on the same chart and just adjust the axis you'd realize that memory latency is orders of magnitude faster than any hard drive access.

All those access times are in nanoseconds. Traditional hard drives are usually in the 4-8 millisecond range. Even SSDs clock in at around 0.2 milliseconds, which is 200,000 nanoseconds.
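
To put those on one scale, here is a small sketch of the unit conversion. The drive figures are the ones quoted above; `ASSUMED_MEMORY_NS` is a placeholder I picked for illustration, not a value read off the graph, so swap in whatever your own machine measures.

```python
NS_PER_MS = 1_000_000

def ms_to_ns(ms):
    """Convert milliseconds to whole nanoseconds."""
    return round(ms * NS_PER_MS)

# Figures quoted in the comment above.
hdd_ns = (ms_to_ns(4), ms_to_ns(8))  # traditional hard drive, low and high end
ssd_ns = ms_to_ns(0.2)               # SSD

# Assumption for illustration only: a main-memory access of about 100 ns.
ASSUMED_MEMORY_NS = 100

def times_slower(device_ns, memory_ns=ASSUMED_MEMORY_NS):
    """How many assumed memory accesses fit in one device access."""
    return device_ns // memory_ns

if __name__ == "__main__":
    print("HDD ns:", hdd_ns, "SSD ns:", ssd_ns)
    print("SSD vs memory:", times_slower(ssd_ns))
    print("HDD vs memory:", times_slower(hdd_ns[0]), "to", times_slower(hdd_ns[1]))
```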

Take a trip through the lmbench results; it's worthwhile. You'll get bandwidth and latency of disks, file systems, network, memory, as well as context switch costs.

It's a tiny set of tests, and if you memorize the results you can sit in any design session, decompose a problem into its basic events, and prove in short order that the design can or cannot work.
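
A rough sketch of that kind of back-of-the-envelope check, assuming you count how many times each basic event happens and multiply by its cost. The function names, the event names, and every number in `COST_NS` are hypothetical placeholders (the SSD and disk figures reuse the ones mentioned in this thread); real values should come from measuring your own hardware.

```python
def estimate_ns(events, cost_ns):
    """Sum count * per-event cost over every basic event in the design."""
    return sum(count * cost_ns[name] for name, count in events.items())

def fits_budget(events, cost_ns, budget_ns):
    """True if the estimated total stays within the latency budget."""
    return estimate_ns(events, cost_ns) <= budget_ns

# Placeholder costs in nanoseconds; assumptions, not measured results.
COST_NS = {
    "memory_read": 100,
    "context_switch": 5_000,
    "ssd_read": 200_000,     # 0.2 ms, as quoted in the thread
    "disk_seek": 8_000_000,  # 8 ms, upper end of the range quoted in the thread
}

# A hypothetical request decomposed into basic events.
with_disk = {"memory_read": 50, "context_switch": 2, "disk_seek": 1}
with_ssd = {"memory_read": 50, "context_switch": 2, "ssd_read": 1}

BUDGET_NS = 1_000_000  # hypothetical 1 ms budget per request

if __name__ == "__main__":
    for label, events in (("disk", with_disk), ("ssd", with_ssd)):
        print(label, estimate_ns(events, COST_NS), fits_budget(events, COST_NS, BUDGET_NS))
```

With these made-up numbers the single disk seek alone blows the budget, which is the kind of answer you want before anyone writes code.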