Hacker News new | ask | show | jobs
by hayfield 4532 days ago
Some work I was doing over the summer indicates that the optimal datatype depends on the algorithm you're using.

qsort() can get 15% better performance with uint64_ts than uint32_ts (sorting identical arrays of 8 bit numbers represented differently). On the other hand, a naive bubble sort implementation was managing 5% better performance with 16 bit datatypes over any of the others.

If you start measuring energy usage as well, it becomes even stranger - using a different datatype can make it run 5% faster, while using 10%+ less energy (or in the qsort() case, take less time, but use more energy).

2 comments

qsort strikes me as a bit of a crappy benchmark because of the type erasure. Your compiler likely isn't inlining or applying vectorisation. Vectorisation would likely benefit 32bit floats on a 64bit platform more than doubles.
Indeed - it probably isn't ideal.

I'd tested a number of sort algorithms (bubble, insertion, quick, merge, counting), so also testing the sort function from stdlib seemed like a logical continuation. It was done rather quickly, so there wasn't time to properly investigate, merely look at the numbers and go "that's strange".

What about 32 bit ints stored 64 bits away from each other?
I'm not sure. This was a fairly quick experiment on an Ivy Bridge processor where the actual results were of secondary concern at the time (it was proof-of-concepting a separate tool).

If you look on arXiv.org, there are publications looking more at alignment on Cortex and XCore architectures. There's probably work relating to x86, though the longer instruction pipeline makes it more complicated to reason about than the simpler architectures.