Hacker News new | ask | show | jobs
by bayindirh 67 days ago
You can always go through cachegrind or perf and see what happens with your code.

I managed to reach practical IPC limits of the hardware I was running on, and while I could theoretically make prefetcher happier with some matrix reordering, looking back, I'm not sure how much performance it provided since the FPU was already saturated at that point.