Hacker News new | ask | show | jobs
by marginalia_nu 1030 days ago
Depends a lot on what you're doing too. I do a fair bit of heavy data processing work with my search engine (tokenizing something like a billion documents into arrays of words etc), and allocator contention has a pretty huge performance impact for that type of work.

My intuition is that the best thing is to aim for the expected median size, rather than the maximum as one might assume would be the most performant. The maximum strategy minimizes re-allocations, but at the expense of always making costlier allocations.

1 comments

I think it depends a lot on the other details, especially how expensive the extra GC will be vs the wasted space. Hard to give a rule that will work in all contexts.

In our case, it wasn't a single hard-coded number - the input data gave us the upper bound, and the difference between the upper bound and the median case was so small that going with the upper bound worked out best.