
Well, UUID generation isn’t going to be quite as SIMDable as counting, so the analogy partially breaks down there. And += 1 isn’t a very SIMDable operation on its own, is it? Unless, I guess, you create a vector of offsets (+0, +1, +2, +3) and add that to your base number to generate several values at once. With 64-bit integers that gives you 4 lanes per register on AVX2 and 8 on AVX-512; only 128-bit SSE is limited to 2.
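To make the offset-vector idea concrete, here's a rough sketch in portable Rust. It doesn't use intrinsics; the fixed-size array just stands in for one 256-bit register, and whether the compiler actually vectorizes it depends on target and flags. The names `LANES` and `fill_sequential` are made up for illustration.

```rust
/// Number of 64-bit lanes in one 256-bit (AVX2-sized) register.
/// Assumption for illustration; use 8 to model AVX-512.
const LANES: usize = 4;

/// Fill `out` with start, start+1, start+2, ... by advancing a whole
/// vector of LANES counters per step instead of one scalar counter.
fn fill_sequential(start: u64, out: &mut [u64]) {
    // Lane i begins at start + i: this is the base number plus the
    // (+0, +1, +2, +3) offset vector.
    let mut cur = [0u64; LANES];
    for (i, lane) in cur.iter_mut().enumerate() {
        *lane = start.wrapping_add(i as u64);
    }

    let mut chunks = out.chunks_exact_mut(LANES);
    for chunk in &mut chunks {
        // Store all lanes, then bump every lane by LANES in one step.
        chunk.copy_from_slice(&cur);
        for lane in cur.iter_mut() {
            *lane = lane.wrapping_add(LANES as u64);
        }
    }

    // Scalar tail for lengths that are not a multiple of LANES.
    // cur[0] already holds the next value in the sequence.
    let rem = chunks.into_remainder();
    for (i, slot) in rem.iter_mut().enumerate() {
        *slot = cur[0].wrapping_add(i as u64);
    }
}
```

The point is that the "increment" becomes an add of a constant vector, so each step produces LANES results rather than one.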

Then your 32 HT threads aren’t each going to get full SIMD throughput: the vector execution units are per core and shared between the two hyperthreads on it, which is where I assume you figured the 2x difference might show up?

And to do += 1 multithreaded you have to partition the range or you won’t get any speedup. If you don’t amortize the cost of atomic synchronization over many increments per thread, you’re going to be slower than a single-threaded, non-SIMD increment.
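A minimal sketch of what I mean by partitioning, again in Rust with hypothetical function names. The first version gives each thread its own slice of the work and touches the shared atomic once per thread; the second hits the atomic on every increment, which is the contended case. Both return the same count; only the amount of synchronization differs. I haven't attached timings since those depend heavily on the machine.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Each worker counts its own share locally, then publishes it with a
/// single atomic add, so synchronization cost is paid once per thread.
fn count_partitioned(total: u64, threads: u64) -> u64 {
    assert!(threads > 0);
    let shared = AtomicU64::new(0);
    let base = total / threads;
    let extra = total % threads;
    thread::scope(|s| {
        for t in 0..threads {
            let shared = &shared;
            // The first `extra` workers take one additional increment.
            let share = base + u64::from(t < extra);
            s.spawn(move || {
                let mut local = 0u64;
                for _ in 0..share {
                    // black_box keeps the optimizer from folding the loop.
                    local = black_box(local + 1);
                }
                shared.fetch_add(local, Ordering::Relaxed);
            });
        }
    });
    shared.load(Ordering::Relaxed)
}

/// Contended variant: every increment is an atomic read-modify-write
/// on the same cache line, so the threads serialize on it.
fn count_contended(total: u64, threads: u64) -> u64 {
    assert!(threads > 0);
    let shared = AtomicU64::new(0);
    let base = total / threads;
    let extra = total % threads;
    thread::scope(|s| {
        for t in 0..threads {
            let shared = &shared;
            let share = base + u64::from(t < extra);
            s.spawn(move || {
                for _ in 0..share {
                    shared.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    shared.load(Ordering::Relaxed)
}
```

Timing something like `count_partitioned(1 << 28, 8)` against `count_contended(1 << 28, 8)` and a plain single-threaded loop is an easy way to check the claim on your own hardware.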