|
|
|
|
|
by ted_dunning
55 days ago
|
|
Seems like you can use the core intuition here of SIMD comparison of multiple elements on more than just the terminal scale. The outline would be: a) use a gather to grab multiple elements from 16 evenly spaced locations b) compare these in parallel using a SIMD instruction c) focus in on the correct block d) if the block is small revert to linear search, else repeat the gather/compare cycle Even though the gather instruction is reading from non-contiguous memory and reading more than you normally would need to with a binary sort, enabling a multiway compare and collapsing 4 levels of binary search should be a win on large tables. You also may not be able to do a full 16-way comparison for all data types. Searching for float64 will limit you to 8 way comparisons (with AVX-512) but int32, float32 will allow 16 way comparisons. |
|
I bet you a binary search is in fact faster than any gather based methodology.