Hacker News new | ask | show | jobs
by jiggawatts 1391 days ago
I would have expected 2048 items to be the optimal cutover point, because with 16-bit entries that would result in 4KB arrays. Those fit exactly into a typical CPU memory page, whereas 8KB requires two. In the worst case that might double the page table overheads, e.g.: for random point reads.

I wonder if it would be worth the trouble to code-gen a bunch of variants with things like 8-bit entries, and benchmark to death to determine the optimal cutover points...

1 comments

I think if you go down this road, you'll start to see differences depending on the brand and model of CPU you use. Essentially, you're going into the territory of FFTW's "planner" scheme.