|
|
|
|
|
by Retr0id
304 days ago
|
|
That's assuming 1 IPC, and no parallelism. A desktop-class zen5 CPU has 32 threads, with AVX512. Pipelining gets you up to 2048 bits of SIMD throughput per core per clock cycle: https://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teard... So assuming you use 64-bit counters, you can divide those 12 years by 1024 to get 4 days. And that's not even considering what you could do on a GPU. Edit: I might be off by a factor of 2, not sure if the SIMD throughput is per-core or per-thread. Also thermal throttling. Same ballpark though! |
|
Then your 32 HT threads aren’t really going to give you full access to the underlying SIMD registers which are going to be per core which is where I assume you realized the 2x difference might show up?
And to do += 1 multithreaded you have to partition the range or you won’t get any speed up - if you don’t amortize the cost of atomic synchronization across threads you’re going to be going slower than a non-SIMD increment.