| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by flgr 3276 days ago

Sounds like the author did — http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...

Notably Vector index access takes 4,260,000 ns for a vector with 1,048,576 elements instead of measured 0 ns for native arrays. is about +4 ns extra per element and hints at the whole data set still fitting into L2 cache.

If the process ends up accessing more working memory than fits into L2 (32 or 64 MB or so) and lookups aren't nicely bundled together, the overhead will approach about +80 ns per element access. Or +0.08 seconds per million element accesses.

It does seem unlikely that every element access would end up causing a cache miss since I'd expect lookups to happen close together, but this can be significant for intense workloads such as OLAP.

1 comments

andrewla 3276 days ago

Based on the numbers for lookup, it looks worse than that -- it seems to be linear (r2 = .9988).

    df <- data.frame(
      s=c(0,1,4,16,64,256,1024,4096, 16192, 65536,
          262144, 1048576),
      t=c(0, 1, 5, 17, 104, 440, 1780, 8940, 38000,
          198000, 930000, 4260000))
    > summary(lm(t ~ s + 0, data=df))
    ...
    Coefficients:
      Estimate Std. Error t value Pr(>|t|)
    s  4.02825    0.04161   96.82   <2e-16 ***

    Residual standard error: 45060 on 11 degrees of freedom
    Multiple R-squared:  0.9988,    Adjusted R-squared:  0.9987
    F-statistic:  9374 on 1 and 11 DF,  p-value: < 2.2e-16

link

cscheid 3276 days ago

That's because the program is looking up every item. It's _right there in the text_:

> Note that this time is measuring the time taken to look up every element in the collection, rather than just looking up a single element.

http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...

link

andrewla 3276 days ago

Shame on me for not looking more closely!

So when you look at the per-operation cost (t/s in my model) then what emerges is, indeed, logarithmic (r2 ~ .96 for lm(t/s ~ s + log(s)). Bounded, for sure, but if you walk in with a vector-indexing-bound but array-size-independent algorithm expecting that the performance for 1e3 elements will be representative of the behavior for 1e6, you're in for a surprise.

link