Hacker News new | ask | show | jobs
by kwertzzz 1345 days ago
I would genuinely interested to see how much bytes a HashMap<Integer, Integer> would have in Java (or other languages) for a million entries. If I am not mistaken in julia, this would be 151 MB in julia or 88% more than just storing the keys and values.

     julia> k = Int32.(1:1000_0000);  v = rand(Int32,1000_0000); 
     julia> dict = Dict(zip(k,v));
     julia> Base.summarysize(dict) / (2 * 4 * 1000_0000)
     1.8874391
     julia> Base.summarysize(dict)
     150995128
1 comments

1000_0000 is 10 million, you want 1000_000. And k is better initialized as `Int32(1):Int32(1000_000)`, which is significantly faster (though it doesn't matter for the measurement here).

Checking with a million values, on my Julia v1.8.0, the dict takes up 18 MB, or about 130% more than the plain values (or an array of Pairs) would.

Thank you for correcting this! (I get the same numbers as you for a million elements).
note that the overhead will vary a ton based on the number of elements you use. I believe Julia's dicts currently resize to be 4x current size (to avoid hash collisions), so you should see anywhere from 30% extra to 300% extra depending on how many elements you have. There has been some effort recently to move to a more SwissDict-like approach which should reduce the memory overhead.