by gpderetta
1072 days ago
Even simpler: just sum all elements of the array. Then at the end subtract 'p'*len from the sum and divide by ('s'-'p') to get the 's' count. The 'p' count is len minus the 's' count. The initial sum is easily vectorized as well. If I've not made any mistakes it should work. The only issue is possible overflow of the running sum. Can't be bothered to benchmark it though :). edit: missed the decrement when you see 's'. So the final result is p_count - s_count.
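A minimal sketch of the sum trick in C, assuming every byte of the input is either 's' or 'p' (the arithmetic breaks if anything else is present). The function names `count_s_by_sum` and `p_minus_s` are made up for illustration, and the 64-bit accumulator is one possible way to sidestep the overflow concern, not something benchmarked here:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 's' bytes in buf.
   Assumption: every byte in buf is either 's' or 'p'. */
size_t count_s_by_sum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;   /* 64-bit accumulator so the running sum cannot realistically overflow */

    for (size_t i = 0; i < len; i++)
        sum += buf[i];

    /* sum = s_count * 's' + p_count * 'p'
           = s_count * ('s' - 'p') + len * 'p'   (since p_count = len - s_count) */
    return (size_t)((sum - (uint64_t)'p' * len) / (uint64_t)('s' - 'p'));
}

/* Hypothetical helper: the final result as stated in the edit,
   p_count - s_count, where p_count = len - s_count. */
long long p_minus_s(const unsigned char *buf, size_t len)
{
    long long s_count = (long long)count_s_by_sum(buf, len);
    return (long long)len - 2 * s_count;
}
```

For the input "sspsp" the sum is 3*115 + 2*112 = 569; subtracting 112*5 = 560 leaves 9, and 9 / 3 gives an 's' count of 3.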
|
With vectorization, I think the way to go is to have two nested loops: an outer one that advances by 32 * 255 elements at a time, and an inner one that loads 32 bytes, compares each character to 's', and accumulates on 8-bit lanes.
Then in the outer loop you do a horizontal sum of the 8-bit accumulators.
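The two-loop structure can be sketched in portable C, with a plain array of 32 `uint8_t` counters standing in for the lanes of a 256-bit register. This is a scalar illustration of the blocking scheme, not a hand-written SIMD kernel; the name `count_s_blocked` and the constants are made up for the example. The limit of 255 inner iterations is there because an 8-bit lane can absorb at most 255 increments before it wraps:

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 32, MAX_INNER = 255 };

/* Hypothetical helper: counts 's' bytes using 8-bit lane accumulators
   that are flushed into a wide total before they can overflow. */
size_t count_s_blocked(const unsigned char *buf, size_t len)
{
    size_t total = 0;
    size_t i = 0;

    /* Outer loop: consumes up to LANES * MAX_INNER bytes per pass. */
    while (len - i >= LANES) {
        uint8_t acc[LANES] = {0};
        size_t chunks = (len - i) / LANES;
        if (chunks > MAX_INNER)
            chunks = MAX_INNER;

        /* Inner loop: one 32-byte "load" per iteration; each lane adds
           1 when its byte equals 's'. At most 255 adds per lane. */
        for (size_t c = 0; c < chunks; c++, i += LANES)
            for (size_t l = 0; l < LANES; l++)
                acc[l] += (uint8_t)(buf[i + l] == 's');

        /* Horizontal sum of the 8-bit accumulators. */
        for (size_t l = 0; l < LANES; l++)
            total += acc[l];
    }

    /* Scalar tail for the last len % 32 bytes. */
    for (; i < len; i++)
        total += (buf[i] == 's');

    return total;
}
```

Whether a compiler turns the inner lane loop into actual vector instructions depends on the target and flags, so an intrinsics version would be needed to guarantee it.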