Hacker News new | ask | show | jobs
by aphyr 927 days ago
Necessary depends on your use case! I spend a lot of time waiting on analyses, and the more operations I can test, the more bugs I can find. I probably invest more time in performance optimization than most people.

Regarding your specific example, uh, I don't know how you measured, but that feels... off. Here:

``` package scratch;

public class Sum { public static long sum(long[] longs) { long sum = 0; for (int i = 0; i < longs.length; i++) { sum += longs[i]; } return sum; } } ```

``` (require '[criterium.core :refer [quick-bench bench]]) (def longs (long-array (range 100000000))) (defn sum-clojure [size] (reduce + 0 (range 0 size))) (import 'scratch.Sum) ```

Summing an array of 10^8 longs like you suggested takes ~120 ms on my machine.

``` user=> (quick-bench (Sum/sum longs)) Evaluation count : 6 in 6 samples of 1 calls. Execution time mean : 119.804166 ms Execution time std-deviation : 9.124709 ms Execution time lower quantile : 115.154905 ms ( 2.5%) Execution time upper quantile : 135.597497 ms (97.5%) Overhead used : 20.384654 ns

Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 15.4197 % Variance is moderately inflated by outliers ```

Your Clojure function, which sums a lazy range, takes ~4.7 seconds--about 40x slower.

``` user=> (quick-bench (sum-clojure 100000000)) Evaluation count : 6 in 6 samples of 1 calls. Execution time mean : 4.739575 sec Execution time std-deviation : 346.895187 ms Execution time lower quantile : 4.557732 sec ( 2.5%) Execution time upper quantile : 5.324739 sec (97.5%) Overhead used : 20.384654 ns

Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 15.3163 % Variance is moderately inflated by outliers ```

An idiomatic Clojure reduction over the same array of longs, just so we're measuring apples to apples, takes about 12 seconds--about 100x slower.

``` user=> (reduce + longs) 4999999950000000 user=> (quick-bench (reduce + longs)) Evaluation count : 6 in 6 samples of 1 calls. Execution time mean : 12.037821 sec Execution time std-deviation : 433.661044 ms Execution time lower quantile : 11.790637 sec ( 2.5%) Execution time upper quantile : 12.779262 sec (97.5%) Overhead used : 20.384654 ns

Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 13.8889 % Variance is moderately inflated by outliers ```

Incidentally, this is one of the reasons I wrote `loopr` (https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for...). One of the iteration tactics it can compile to is iteration over arrays. Nowhere near Java--I think it's probably still boxing a fair bit, and I'd need to disassemble it to see--but it's still 10x faster than the standard reduce here. ~10x slower than Java.

``` user=> (quick-bench (loopr [sum 0] [x ^"[J" longs :via :array] (recur (+ sum x)))) Evaluation count : 6 in 6 samples of 1 calls. Execution time mean : 1.259095 sec Execution time std-deviation : 3.876650 ms Execution time lower quantile : 1.256692 sec ( 2.5%) Execution time upper quantile : 1.265747 sec (97.5%) Overhead used : 20.321072 ns

Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 13.8889 % Variance is moderately inflated by outliers ```

1 comments

Edit: I wonder if there's a caching thing going on. I am having wildly different performance depending on what number I put in. Does criterium affect this by isolating something?

I read a little more about this before you replied, but it seems like we're doing something different or there's a JVM difference that is bigger than expected.

- The clojure function takes me less than a second.

- (reduce + (long-array 100000000)) also takes less than a second.

- From what I understand, the compiler may have a special rule around using reduce and + that uses unboxed numbers by default

- Isn't it a big "if" that you might have a big array of primitive longs in Clojure? I can see how if you've already gone through a whole range of numbers and unboxed them then the java function would be fast. But it seems like it'd be easier to use + (again, this is running a lot faster on my machine for whatever reason).

I'm new to all of this so just being academic and trying to figure out what's going on