Hacker News
by rc4algorithm 3801 days ago
All of that useless zeroing is very painful from a performance perspective. It just isn't a reasonable option in performance-critical code such as kernels and core infrastructure. The Moore's Law argument doesn't apply, because memory bus latency and cache pressure are lasting problems.

In high-performance code for an embedded PPC system I used to work on, we made all our control blocks a multiple of the L1 cache line size. Our allocation routines all used inline assembler to run the dcbz (data cache block zero) instruction on every cache block of a control block as it was allocated. This meant the control block was always zeroed, and the memory bus wasn't touched in order to do so. Yes, other data was evicted from the cache, but since we were about to start writing into the control block anyway, skipping the fetch was a net gain.
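A minimal C sketch of that allocation pattern, under stated assumptions: the original code isn't shown, so `alloc_control_block`, `zero_cache_block`, `round_to_line` and the 32-byte `CACHE_LINE` value are all hypothetical. dcbz zeroes the entire cache block containing the address, so the block size constant must match the core's actual dcbz block size and allocations must be block-aligned, or neighbouring memory gets wiped. dcbz should also only be used on cacheable memory. The inline-assembler path compiles only with GCC/Clang targeting PowerPC; on other targets the sketch falls back to `memset`, which (unlike dcbz) does fetch lines over the bus.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical L1 cache block size. Many embedded PowerPC cores use 32 bytes,
   but this must equal the core's real dcbz block size. */
#define CACHE_LINE 32u

/* Round a size up to the next multiple of CACHE_LINE (a power of two). */
static size_t round_to_line(size_t n) {
    return (n + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}

/* Zero one cache block. On PowerPC, dcbz establishes the block in the data
   cache as all zeros without reading it from memory. Elsewhere, fall back
   to memset so the sketch still builds and runs. */
static void zero_cache_block(void *p) {
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    memset(p, 0, CACHE_LINE);
#endif
}

/* Hypothetical allocator: returns a cache-aligned, zeroed control block whose
   size is rounded up to a multiple of CACHE_LINE. Release it with free(). */
void *alloc_control_block(size_t size) {
    size_t n = round_to_line(size);
    if (n == 0) n = CACHE_LINE;            /* never request a zero-size block */
    void *p = aligned_alloc(CACHE_LINE, n); /* C11: n is a multiple of the alignment */
    if (p == NULL) return NULL;
    /* dcbz zeroes the whole block containing the address, so alignment matters. */
    assert(((uintptr_t)p & (CACHE_LINE - 1)) == 0);
    for (size_t off = 0; off < n; off += CACHE_LINE)
        zero_cache_block((char *)p + off);
    return p;
}
```

For example, `alloc_control_block(50)` rounds the request up to 64 bytes and zeroes two 32-byte blocks. Rounding every control block to whole cache blocks is what makes it safe to zero with dcbz: no block is shared with another object, so clearing it cannot corrupt unrelated data.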