| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mik1998 802 days ago

    for i in 0:7
        c += (r >> i) & 1
    end

This is just popcnt, surely Julia has a built in for that.

3 comments

borodi 802 days ago

There is, it's called count_ones. Though I wouldn't be surprised if LLVM could maybe optimize some of these loops into a popcnt, but I'm sure it would be brittle

link

lunaticd 802 days ago

author here. I thought there might be a machine instruction for this but wasn't sure, I also didn't know Julia had a count_ones that counted the 1s.

Thanks! With this the timings are even faster. I'll update the post.

link

lunaticd 802 days ago

julia> @code_typed hamming_distance(Int8(33), Int8(125)) CodeInfo( 1 ─ %1 = Base.xor_int(x1, x2)::Int8 │ %2 = Base.ctpop_int(%1)::Int8 │ %3 = Base.sext_int(Int64, %2)::Int64 │ nothing::Nothing └── return %3 ) => Int64

julia> @code_llvm hamming_distance(Int8(33), Int8(125)) ; Function Signature: hamming_distance(Int8, Int8) ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:13 within `hamming_distance` define i64 @julia_hamming_distance_16366(i8 signext %"x1::Int8", i8 signext %"x2::Int8") #0 { top: ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance` ; ┌ @ int.jl:373 within `xor` %0 = xor i8 %"x2::Int8", %"x1::Int8" ; └ ; ┌ @ int.jl:415 within `count_ones` %1 = call i8 @llvm.ctpop.i8(i8 %0) ; │┌ @ int.jl:549 within `rem` %2 = zext i8 %1 to i64 ; └└ ret i64 %2 }

it lowers to the machine instruction now.

I also tried 8 Int64s vs 64 Int8s and it doesn't seem to make a difference when doing the search.

EDIT: apologize for the formatting

link

shiandow 802 days ago

I think you may need to update the figures in the rest of the article. At some point you mention it should take around 128ns but with the new benchmark that's probably closer to 64*1.25=80ns.

link

sitkack 802 days ago

I had Opus translate your code to Rust

    fn hamming_distance_u8(x1: u8, x2: u8) -> usize {
        (x1 ^ x2).count_ones() as usize
    }

link

ummonk 802 days ago

From what I've heard it's actually faster to create a 256 byte lookup table than to use popcnt.

link

mik1998 802 days ago

It used to be pretty bad on old intel processors but nowadays it should be faster than an L1 fetch.

link