by fnl 3572 days ago
Even a hundred million integers still add up to only 300 MB extra for 64- vs. 16-bit. On any reasonable modern server, laptop, or desktop, that kind of memory usage will probably be the least of your worries, all the more so if you really have an application that needs to hold hundreds of millions of ints in memory at the same time.

And if you are programming for a very specific embedded or otherwise constrained system, then you want full control over the exact sizes of your types anyway, as discussed elsewhere here.

Is this "wasting resources", as you say? Probably yes. Is it worth the extra development effort to fine-tune that on modern machines? Probably not - and it might even be premature optimization. (Yes - I agree there are corner cases where it will indeed make sense, but those are the exception, not the norm.)

1 comments

I'm not sure where you are getting the 300MB figure from. ((64-16)/8)x10^8/2^20 gives me 572.2MB of wasted space. What's even more interesting is looking at the percentage of wasted space. (((64-16)/8)x10^8/2^20)/(((64)/8)x10^8/2^20) means a whopping 75% of the memory use of our program is completely useless wasted space.
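For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names `wasted_bytes` and `wasted_fraction` are made up for illustration. Note that dividing by 2^20 gives mebibytes; in decimal megabytes the same waste is 600 MB.

```python
# Hypothetical helpers, only meant to reproduce the arithmetic above.

def wasted_bytes(count, wide_bits=64, narrow_bits=16):
    """Bytes spent on unused width when `count` values that would fit in
    `narrow_bits` are stored in `wide_bits` instead."""
    return count * (wide_bits - narrow_bits) // 8


def wasted_fraction(wide_bits=64, narrow_bits=16):
    """Share of the wide representation that carries no information."""
    return (wide_bits - narrow_bits) / wide_bits


if __name__ == "__main__":
    n = 10**8
    print(wasted_bytes(n))                      # 600000000 bytes
    print(round(wasted_bytes(n) / 2**20, 1))    # 572.2 (MiB)
    print(wasted_fraction())                    # 0.75
```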

Using more space than you need to will also impact performance. First, there are the cache issues. A 64KB L1 cache can fit 32768 16-bit integers, but only 8192 64-bit integers. Other cache layers will also fit fewer 64-bit than 16-bit integers, causing up to 4x more trips to the slow RAM backing store. Hitting RAM is very slow compared with CPU operations, so this can make your program a lot slower.
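The cache capacity numbers follow directly from the element size. A rough sketch, assuming a 64 KiB cache and ignoring everything else that competes for it (the helper name `ints_per_cache` is hypothetical):

```python
# Hypothetical helper: how many integers of a given width fit in a cache
# of a given size, assuming the cache holds nothing else.

def ints_per_cache(cache_bytes, int_bits):
    return cache_bytes // (int_bits // 8)


if __name__ == "__main__":
    l1 = 64 * 1024  # assumed 64 KiB L1 data cache
    print(ints_per_cache(l1, 16))  # 32768
    print(ints_per_cache(l1, 64))  # 8192
    # Ratio of the two capacities: 4
    print(ints_per_cache(l1, 16) // ints_per_cache(l1, 64))
```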

There are also the computational speed issues. Let's say your problem can be implemented using the AVX/AVX2 instructions. These instructions can compute multiple results at once, in parallel. The AVX registers are 256 bits wide, which means they can operate on 16 16-bit integers at once. In comparison, they can only work on 4 64-bit integers at once. So there's another potential for a 4x improvement, although the cache problems are probably going to be a bigger issue in practice.
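The lane counts are just register width divided by element width. A small sketch of that calculation (the name `simd_lanes` is made up, and this only counts lanes; it says nothing about which operations a given CPU actually supports on them):

```python
# Hypothetical helper: number of elements processed per SIMD register.

def simd_lanes(register_bits, element_bits):
    return register_bits // element_bits


if __name__ == "__main__":
    avx_bits = 256  # width of an AVX register
    print(simd_lanes(avx_bits, 16))  # 16
    print(simd_lanes(avx_bits, 64))  # 4
```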

Sorry for the 400-100=300MB mix-up, you are completely right, it should have been 800-200=600MB.

If you need to crunch all those numbers at once, then yes, your cache will become your bottleneck. But if you were, say, keeping counts of 100 million things, most of the time you will not be worried about that or your RAM usage, as counts typically tend to follow a power-law distribution. Therefore, contention to make sure you are tracking every count will probably become a far worse problem for your app than your ability to shuffle data to/from the CPUs. Only in the corner case where you crunch a matrix or vectors of numbers of that size at once will you start to get worried. But as I said, I think the tensor-crunching use case is the exception, not the norm.