by bwoj 2739 days ago
I've seen this tradeoff firsthand with DSP video algorithms. The naive code just implements the algorithm directly. The performant version has to ensure that the inner loop's working set fits in cache while it runs. It also uses tricks like prefetching data into cache so the code doesn't stall waiting on a load. These sorts of tricks really hurt the readability of the code.
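To make the contrast concrete, here is a rough sketch. It is not the commenter's code; it uses a hypothetical 8-bit frame transpose, a common video operation. The names `transpose_naive`, `transpose_tiled` and the `TILE` size are made up for illustration. The tile size of 64 is an assumption meant to keep a source tile plus a destination tile well inside a typical L1 data cache, and it would need tuning per target. `__builtin_prefetch` is a GCC/Clang builtin, not standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive version: transpose a w x h 8-bit frame straight away.
 * Reads walk along rows (cache friendly), but writes stride by h bytes,
 * so on large frames nearly every write lands on a different cache line. */
void transpose_naive(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[x * h + y] = src[y * w + x];
}

/* Hypothetical tile edge: a 64 x 64 source tile plus a 64 x 64 destination
 * tile is 2 * 64 * 64 = 8 KiB, assumed to fit in L1. Tune per target. */
#define TILE 64

/* Cache-aware version: work in TILE x TILE blocks so the inner loops' working
 * set stays in cache, and hint the next source row of the tile into cache
 * before it is needed. __builtin_prefetch is GCC/Clang specific. */
void transpose_tiled(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    for (size_t ty = 0; ty < h; ty += TILE) {
        size_t y_end = (ty + TILE < h) ? ty + TILE : h;   /* clamp edge tiles */
        for (size_t tx = 0; tx < w; tx += TILE) {
            size_t x_end = (tx + TILE < w) ? tx + TILE : w;
            for (size_t y = ty; y < y_end; y++) {
                /* Prefetch the next row of this tile: 0 = read, 3 = high locality. */
                if (y + 1 < y_end)
                    __builtin_prefetch(&src[(y + 1) * w + tx], 0, 3);
                for (size_t x = tx; x < x_end; x++)
                    dst[x * h + y] = src[y * w + x];
            }
        }
    }
}
```

Both functions produce identical output. The tiled one adds edge clamping, an extra loop nest and a compiler-specific hint, all to control memory traffic rather than to express the algorithm. That is the readability cost described above.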