Hacker News new | ask | show | jobs
by miguelraz 761 days ago
Dom! Fellow Julian here!

I loved this post <3 The hamming distance is one of my favorite demos of the conciseness of Julia:

hamming_distance(s1, s2) = mapreduce(!=, +, s1, s2)

I'm a bit swamped at the moment but I'll a response article later - they're still some juicy perf on the table here.

Thanks for the post, such a good showcase.

1 comments

please make it even faster!
And this right here is why the Julia crowd is awesome...

A great article inspires everyone to participate in the fun.

Cheers =3

I wanted to get too fancy and I tried * LoopVectorization.jl - @turbo choked on the loop * a direct llvmcall to use AVX512 pop count - I malformed the types for the instruction * Defining the `db` as

db = [rand(Int8) for _ in 1:64, j in 1:(10^6)];

to avoid the vec of vecs structure,

and then

function my_cluster!(db, query, k) db .= query .⊻ db popcounts = mapreduce(count_ones, +, db, dims = 1) results = reshape(popcounts, last(size(db))) partialsortperm!(results, results, k) @views results[begin:k] end

...which I couldn't get to be faster than your version. If you use the `partialsortperm!` and reuse the same cache array, I suspect you'll get good speedups, as you won't be sorting the array every time. This is a classic `nth_element` algorithm.

The above is not the most amazing code, but I suspect the lack of indexing will make it ridiculously friendly for a GPU (Edit: Nope, it chokes on `partialsortperm!`).

I'm guessing the manual loopy approach should be just as good but I battled hard to get it somewhat competitive here in 6 lines of code

#@be my_cluster!(X2, q1, 5) Benchmark: 3 samples with 1 evaluation 42.883 ms (17 allocs: 15.259 MiB) 45.711 ms (17 allocs: 15.259 MiB) 46.670 ms (17 allocs: 15.259 MiB)

#@be k_closest(X1, q1, 5) Benchmark: 4 samples with 1 evaluation 27.994 ms (2 allocs: 176 bytes) 28.733 ms (2 allocs: 176 bytes) 29.000 ms (2 allocs: 176 bytes) 30.709 ms (2 allocs: 176 bytes)

I also didn't try using `FixedSizedArrays.jl` as Mose Giordano recommended in my livestream chat.