I believe it processes 128 bits per instruction, but I'm still struggling with asm in Rust.
Along the way, however, I found this repo https://github.com/WojciechMula/sse-popcount/ which has tons of competing SIMD implementations for both Intel and ARM.
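For comparison while the asm version is still a work in progress, here is a minimal scalar baseline in plain Rust, no asm needed. It is only a sketch and is not taken from the repo above; the function name `popcount_bytes` is made up for illustration. It processes 64 bits per step rather than 128, relying on `u64::count_ones`, which the compiler can typically lower to a hardware popcount instruction when the target supports it.

```rust
/// Count the set bits in a byte slice, 8 bytes (64 bits) at a time.
/// `popcount_bytes` is a hypothetical name used for this sketch.
pub fn popcount_bytes(data: &[u8]) -> u64 {
    let mut chunks = data.chunks_exact(8);
    let mut total: u64 = 0;

    for chunk in &mut chunks {
        // chunks_exact yields exactly 8 bytes, so copy_from_slice cannot panic.
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let word = u64::from_le_bytes(buf);
        total += u64::from(word.count_ones());
    }

    // Handle the trailing bytes that did not fill a whole 64-bit word.
    for &byte in chunks.remainder() {
        total += u64::from(byte.count_ones());
    }

    total
}
```

Something like this is handy as a correctness reference: whatever the SIMD or asm version returns for a buffer should match `popcount_bytes` on the same buffer.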