by dragontamer
1110 days ago
Popcnt is probably the most broadly applicable BMI instruction, with a myriad of applications in parallel programming. I'm not surprised to hear that it's useful for string parsing, and I can believe it even without seeing an example, though I admit I can't see how right now. TZCNT being used for string parsing, though... Sorry, I can't see how that could be possible at all lol. Could you give an example? (31 - LZCNT(x)) is the integer base-2 logarithm, floor(log2(x)), of a nonzero 32-bit x and likely has a number of mathematical applications.
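To make the (31 - LZCNT(x)) remark concrete, here is a minimal sketch in C. It assumes GCC or Clang, since it uses the `__builtin_clz` builtin rather than the LZCNT instruction directly, and the helper name `floor_log2_u32` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for a nonzero 32-bit x, computed as 31 minus the number of
   leading zero bits. __builtin_clz is a GCC/Clang builtin and is undefined
   for x == 0 (the LZCNT instruction itself returns 32 for a zero input),
   so zero is rejected up front. */
static inline uint32_t floor_log2_u32(uint32_t x) {
    assert(x != 0);
    return 31u - (uint32_t)__builtin_clz(x);
}
```

The result is the index of the highest set bit, so every x from 1024 through 2047 maps to 10.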
|
If we want to keep the list of positions, there are other approaches for converting the vector mask to the list of positions. We could AND the vector mask with {0,1,2,3,4...}, widen by 4x to get int32 positions, add our current int32 position, then use or emulate vcompress and unconditionally write everything to the list, relying on advancing the write position by popcnt(mask) to keep the array dense. So we'd still need popcnt, although some implementations of this sort of thing end up computing the count some other way[1] and don't literally use the popcnt instruction. This approach might be more reasonable if we only widen by 2x, produce a list of int16 positions per 64k input chunk, and then go back and widen that whole list to int32 later.
[0]: https://github.com/simdjson/simdjson/blob/master/src/generic...
[1]: https://github.com/lemire/despacer/blob/master/src/despacer....
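For anyone who wants to see the "write unconditionally, advance by popcnt(mask)" idea without the SIMD, here is a scalar sketch in C. It is not the code from either link above: `append_positions` is a hypothetical helper, the mask is assumed to carry one bit per input byte for a 64-byte chunk, and the popcount check relies on the GCC/Clang builtin `__builtin_popcountll`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append the positions of the set bits of `mask` to
   `out`, starting at index `count`. Bit i of `mask` stands for input byte
   `base + i`. Every candidate position is stored unconditionally; the write
   index only advances when the bit is set, so unwanted stores are simply
   overwritten and the list stays dense.
   The caller must leave at least 64 free slots after out[count]. */
static size_t append_positions(uint32_t *out, size_t count,
                               uint32_t base, uint64_t mask) {
    uint32_t *dst = out + count;
    size_t n = 0;
    for (uint32_t i = 0; i < 64; i++) {
        dst[n] = base + i;                 /* unconditional store */
        n += (size_t)((mask >> i) & 1u);   /* keep it only if bit i is set */
    }
    /* The number of positions kept is exactly popcnt(mask). */
    assert(n == (size_t)__builtin_popcountll(mask));
    return count + n;
}
```

A real vcompress-based version would do the same thing a vector at a time, which is why the output buffer needs slack past the current end of the list.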