Hacker News new | ask | show | jobs
by martincmartin 3359 days ago
The x-axis is the number of iterations, i.e. the number of lookups. The collection is always size 1,000,000. Also, each of those lookups is for the exact same entry, so after the first time, it'll be all cache hits. Well, except for the std::vector, which is what you're talking about... :)
1 comments

Strings are length 10 to 100. A cache line is 64 bytes. A cache miss is < 1000 cycles < 10^3ns. That makes < 1s for 10^6 strings.

Furthermore adjacent strings are likely to be adjacent in memory, from which I would naively expect cache prefetching to succeed.

Additionally, the curve is quadratic (if at all), not linear. I bet the author looked up all n strings from the vector of n strings.

I think the curves are linear because the x-axis is log scale but the y-axis is not.
Gotcha. It's probably still quadratic, just judging from the values. You can't need 32 minutes for 10^6 100-byte comparisons (which typically abort after the first character compares unequal. Not that that matters).
It's definitely a strange set of graphs:

Mixed (log/linear) scales on the axes

Only two columns of points on each graph matter (the last two) because the left hand side is all zeros (an artifact of the y axis not being log scale)

The y-axis is sometimes in thousands-of-mega-nanoseconds (with two significant digits) or sometimes in giga-nanoseconds (with one).

Giga-nanoseconds are seconds!

The colors for each line sometimes changes between graphs.