| HN Mirror


by tbirdz 3570 days ago

I'm not sure where you are getting the 300MB figure from. ((64-16)/8) x 10^8 / 2^20 gives me 572.2MB of wasted space. What's even more interesting is the percentage of wasted space: (((64-16)/8) x 10^8 / 2^20) / ((64/8) x 10^8 / 2^20) = 0.75, so a whopping 75% of our program's memory use is completely wasted space.
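As a rough check of that arithmetic, here is a minimal Python sketch. The names `N`, `MIB` and `size_mib` are mine, chosen for illustration; the only inputs taken from the comment are the 10^8 element count and the 64-bit vs 16-bit widths.

```python
# Sanity check of the wasted-space arithmetic for 10^8 integers.
N = 10**8          # number of integers stored
WIDE_BITS = 64     # width actually used
NARROW_BITS = 16   # width that would have been enough
MIB = 2**20        # bytes per binary megabyte (the 2^20 divisor)


def size_mib(bits_per_item, count=N):
    """Size of `count` items of `bits_per_item` bits, in units of 2^20 bytes."""
    return bits_per_item / 8 * count / MIB


wasted = size_mib(WIDE_BITS - NARROW_BITS)
total = size_mib(WIDE_BITS)

print(f"wasted: {wasted:.1f} MB (2^20-byte units)")
print(f"wasted: {(WIDE_BITS - NARROW_BITS) // 8 * N // 10**6} MB (10^6-byte units)")
print(f"share of total: {wasted / total:.0%}")
```

The share comes out the same whichever megabyte definition you pick, since the unit cancels in the ratio.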

Using more space than you need will also hurt performance. First, there are the cache issues. A 64KB L1 cache can hold 32768 16-bit integers, but only 8192 64-bit integers. Other cache levels will likewise hold fewer 64-bit than 16-bit integers, so the same data can cause up to 4x more cache misses that go out to the slow RAM backing store. Hitting RAM is very slow compared with CPU operations, so this will make your program a lot slower.
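The capacity numbers are easy to reproduce. A small sketch, assuming the 64KB L1 size from the comment; the 64-byte cache line is my own assumption (typical on current x86 parts, not stated above), and `ints_that_fit` is just an illustrative helper name.

```python
# How many integers of a given width fit in a cache of a given size.
L1_BYTES = 64 * 1024      # 64KB L1 cache, as in the comment
CACHE_LINE_BYTES = 64     # ASSUMPTION: common x86 line size, not from the comment


def ints_that_fit(capacity_bytes, int_bits):
    """Number of int_bits-wide integers that fit in capacity_bytes."""
    return capacity_bytes // (int_bits // 8)


for bits in (16, 64):
    per_cache = ints_that_fit(L1_BYTES, bits)
    per_line = ints_that_fit(CACHE_LINE_BYTES, bits)
    print(f"{bits}-bit: {per_cache} per L1 cache, {per_line} per cache line")

ratio = ints_that_fit(L1_BYTES, 16) // ints_that_fit(L1_BYTES, 64)
print(f"16-bit packs {ratio}x more values into the same cache")
```

The same 4x ratio applies per cache line, which is why a sequential scan over the narrower type touches a quarter as many lines.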

There are also computational speed issues. Let's say your problem can be implemented using the AVX/AVX2 instructions. These instructions compute multiple results at once, in parallel. The AVX registers are 256 bits wide, which means they can operate on 16 16-bit integers at once. In comparison, they can only work on 4 64-bit integers at once. So there's another potential 4x improvement, although the cache problems are probably going to be a bigger issue in practice.
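The lane counts follow directly from the register width. A back-of-the-envelope sketch, assuming one vector instruction per full register and ignoring everything else (loads, latency, throughput); `lanes` and `vector_ops` are hypothetical helper names, not part of any SIMD API.

```python
# Lanes per 256-bit register and vector operations needed for 10^8 elements.
REGISTER_BITS = 256   # AVX register width, as in the comment
N = 10**8             # number of integers to process


def lanes(int_bits, register_bits=REGISTER_BITS):
    """How many int_bits-wide integers one register holds."""
    return register_bits // int_bits


def vector_ops(n, int_bits):
    """Vector operations needed to touch n elements once (ceiling division)."""
    return -(-n // lanes(int_bits))


for bits in (16, 64):
    print(f"{bits}-bit: {lanes(bits)} lanes, {vector_ops(N, bits)} ops for {N} elements")
```

This only counts instructions; as the comment says, memory traffic will usually dominate before the 4x in lane count does.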

1 comment

fnl 3568 days ago

Sorry for the 400-100=300MB mix-up. You are completely right: it should have been 800-200=600MB.

If you need to crunch all those numbers at once, then yes, your cache will become your bottleneck. But if you were, say, keeping counts for 100 million things, most of the time you would not be worried about that or about your RAM usage, since counts typically follow a power-law distribution. Contention while making sure you track every count will therefore probably be a far worse problem for your app than your ability to shuffle data to and from the CPUs. Only in the corner case where you crunch a matrix or vectors of numbers of that size at once will you start to worry. But as I said, I think the tensor-crunching use case is the exception, not the norm.