by michaelt 3946 days ago

  Why should I care
Let's say you're processing map data for OpenStreetMap, which has about 3,720,000,000 nodes [1], and for each node you want to store a latitude and a longitude.

If you use a pair of primitive ints for the latitude and longitude, you'll need 8 bytes of memory per node, or about 30 gigabytes in total.
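A minimal sketch of what the primitive layout could look like. This is not how any particular OSM tool does it. It assumes node IDs are dense and start at 0 so they can be used as array indices, and that coordinates are stored as fixed-point ints (degrees × 10^7). Both are assumptions for illustration, and `PrimitiveNodeStore` is a hypothetical name. A single Java array is capped at about 2^31 elements, so the ints are split into fixed-size chunks:

```java
/**
 * Hypothetical store keeping each node's latitude and longitude as two
 * primitive ints (8 bytes per node) with no per-node objects.
 * Assumptions: node IDs are dense from 0, and coordinates are
 * fixed-point ints equal to degrees * 10^7.
 */
final class PrimitiveNodeStore {
    private static final int CHUNK_BITS = 24;              // 2^24 (~16.7M) nodes per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;
    private static final double SCALE = 1e7;               // assumed fixed-point scale

    private final int[][] lat;
    private final int[][] lon;

    PrimitiveNodeStore(long nodeCount) {
        // Round up so a partial final chunk still gets allocated.
        int chunks = (int) ((nodeCount + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        lat = new int[chunks][];
        lon = new int[chunks][];
        for (int c = 0; c < chunks; c++) {
            // The last chunk is only as large as it needs to be.
            int size = (int) Math.min(CHUNK_SIZE, nodeCount - ((long) c << CHUNK_BITS));
            lat[c] = new int[size];
            lon[c] = new int[size];
        }
    }

    void set(long id, double latDeg, double lonDeg) {
        int c = (int) (id >>> CHUNK_BITS);   // which chunk
        int i = (int) (id & CHUNK_MASK);     // offset within the chunk
        lat[c][i] = (int) Math.round(latDeg * SCALE);
        lon[c][i] = (int) Math.round(lonDeg * SCALE);
    }

    double lat(long id) {
        return lat[(int) (id >>> CHUNK_BITS)][(int) (id & CHUNK_MASK)] / SCALE;
    }

    double lon(long id) {
        return lon[(int) (id >>> CHUNK_BITS)][(int) (id & CHUNK_MASK)] / SCALE;
    }
}
```

For example, `new PrimitiveNodeStore(3_720_000_000L)` would allocate roughly 30 GB of int arrays (given a large enough heap) and no per-node objects at all; ±180 × 10^7 still fits comfortably in an int.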

(Ignoring the Integer cache and assuming a 64-bit JVM with UseCompressedOops enabled) each boxed int takes a 12-byte object header (an 8-byte mark word plus a 4-byte compressed class pointer) and 4 bytes for the int value itself, 16 bytes in all, which already satisfies the JVM's 8-byte object alignment. On top of that you need a reference to the object, which is another 4 bytes.

So replacing those two primitive ints with boxed ints takes 2 × (16 + 4) = 40 bytes of memory per node, or about 150 gigabytes in total. And now you've got a 150 gigabyte heap to garbage collect :)
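The arithmetic can be double-checked with a back-of-the-envelope estimator. This is a sketch under stated assumptions, not a measurement: it assumes a 64-bit HotSpot JVM with compressed oops (12-byte object header, 4-byte references, 8-byte object alignment) and ignores the Integer cache and the overhead of whatever containers hold the references. `HeapEstimate` and its methods are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical back-of-the-envelope heap estimator (not a measurement). */
final class HeapEstimate {
    // Assumed 64-bit HotSpot layout with UseCompressedOops enabled:
    static final int HEADER_BYTES = 12;    // 8-byte mark word + 4-byte compressed class pointer
    static final int REFERENCE_BYTES = 4;  // compressed reference
    static final int ALIGNMENT = 8;        // default object alignment
    static final int INT_BYTES = Integer.BYTES;

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Bytes per node when each int is a primitive field or array slot. */
    static long primitiveBytesPerNode(int intsPerNode) {
        return (long) intsPerNode * INT_BYTES;
    }

    /** Bytes per node when each int is its own Integer object plus a reference to it. */
    static long boxedBytesPerNode(int intsPerNode) {
        long integerObject = align(HEADER_BYTES + INT_BYTES); // 12 + 4 = 16, already aligned
        return (long) intsPerNode * (integerObject + REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        long nodes = 3_720_000_000L;
        int intsPerNode = 2; // latitude and longitude
        long primitive = nodes * primitiveBytesPerNode(intsPerNode);
        long boxed = nodes * boxedBytesPerNode(intsPerNode);
        System.out.printf(Locale.ROOT, "primitive: %d bytes/node, %.2f GB total%n",
                primitiveBytesPerNode(intsPerNode), primitive / 1e9);
        System.out.printf(Locale.ROOT, "boxed:     %d bytes/node, %.2f GB total%n",
                boxedBytesPerNode(intsPerNode), boxed / 1e9);
    }
}
```

With `intsPerNode` set to 2 this prints 8 bytes/node (29.76 GB) for primitives and 40 bytes/node (148.80 GB) for boxed ints. Raising `intsPerNode` shows the gap growing linearly as each node carries more fields.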

That memory increase is the difference between an EC2 instance that costs $1,600/year and one that costs $13,000/year, and primitives will probably perform better as well.

And if you want to store more than two ints for each node, it could be the difference between an off-the-shelf EC2 server and needing to get hold of special high-memory servers.

[1] https://www.openstreetmap.org/node/3720000000