|
|
|
|
|
by bazizbaziz
3155 days ago
|
|
FWIW it should be possible to vectorize the search across the row store. With 24 byte tuples (assuming no inter-row padding) you can fit 2.6 rows into a 64 byte cache line (a 512 bit simd register). Then it's just a matter of proper bit masking and comparison. Easier said than done, I figure because that remainder is going to be a pain. Another approach is to use gather instructions to load the first column of each row and effectively pack the first column into a register as if it were loaded from a row store and then do a vectorized compare as in the column-store case. All of that to underscore it's not that one format vectorizes and the other doesn't. The key takeaway here is that with the column store, the compiler can automatically vectorize. This is especially a bonus for JVM based languages because afaik there is no decent way to hand-roll SIMD code without crossing a JNI boundary. |
|
The downside of this is that you are still reading 6 times as much data. A straightforward implementation of this should not be CPU bound IMO. If a Skylake Server can't keep up with memory doing 32-bit compares I'll eat my hat.
Gather is not a good idea for this purpose. Gather is very expensive. It's really mainly good for eliminating pointer chasing and keeping a SIMD algorithm in the SIMD domain.