Hacker News new | ask | show | jobs
by andrewla 3238 days ago
Based on the numbers for lookup, it looks worse than that -- it seems to be linear (r2 = .9988).

    df <- data.frame(
      s=c(0,1,4,16,64,256,1024,4096, 16192, 65536,
          262144, 1048576),
      t=c(0, 1, 5, 17, 104, 440, 1780, 8940, 38000,
          198000, 930000, 4260000))
    > summary(lm(t ~ s + 0, data=df))
    ...
    Coefficients:
      Estimate Std. Error t value Pr(>|t|)
    s  4.02825    0.04161   96.82   <2e-16 ***

    Residual standard error: 45060 on 11 degrees of freedom
    Multiple R-squared:  0.9988,    Adjusted R-squared:  0.9987
    F-statistic:  9374 on 1 and 11 DF,  p-value: < 2.2e-16
1 comments

That's because the program is looking up every item. It's _right there in the text_:

> Note that this time is measuring the time taken to look up every element in the collection, rather than just looking up a single element.

http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...

Shame on me for not looking more closely!

So when you look at the per-operation cost (t/s in my model) then what emerges is, indeed, logarithmic (r2 ~ .96 for lm(t/s ~ s + log(s)). Bounded, for sure, but if you walk in with a vector-indexing-bound but array-size-independent algorithm expecting that the performance for 1e3 elements will be representative of the behavior for 1e6, you're in for a surprise.