AVX512 is garbage. It incurs a massive performance hit from both frequency and mode-switching penalties. Apart from a few niche HPC and ML applications which you'll never encounter, AVX512's most compelling use cases are to drive Intel's shady market fragmentation, and to create more bullshit FP benchmarks that Intel can claim to win.
Now that we've covered both ends of hyperbole, it's maybe worth noting that a lot of CPU parallel tasks can be well accelerated by ISPC, which can make reasonably effective use of AVX512 (with the aforementioned clock speed caveat): https://www.mail-archive.com/ispc-users@googlegroups.com/msg...
Also, AVX512 is a much nicer (orthogonal) ISA than SSE or AVX2.
While there are some niche applications that need more memory than GPUs can offer, it's worth noting that this speedup comes from making the CPU act more like a GPU, and CPUs acting like GPUs aren't as fast as GPUs acting like GPUs (which are essentially many 32-wide vector units, versus AVX512's 16 lanes of 32-bit floats).
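To put numbers on the width comparison, here is a minimal C sketch of the lane arithmetic. It assumes 32-bit floats, and the helper name `lanes_per_register` is made up for illustration. The "32-wide" GPU figure corresponds to an NVIDIA warp; other GPU architectures use different widths.

```c
#include <assert.h>

/* Hypothetical helper: how many elements of `element_bits` bits fit in
   one vector register of `register_bits` bits. */
static int lanes_per_register(int register_bits, int element_bits) {
    assert(element_bits > 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

With this, `lanes_per_register(512, 32)` gives the 16 lanes mentioned above, `lanes_per_register(256, 32)` gives AVX2's 8, and doubles halve both figures.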
Which applications, and why? Some computational ones will go significantly slower on the same number of cores if they could otherwise have kept AVX512 fed (perhaps with small data in cache), but most don't spend all their time in something like GEMM. The new UK "tier 1" HPC system is all EPYC.
Automatic vectorization for AVX512 only works on the simplest of cases, and using intrinsics or writing inline assembly is beyond the scope of 99% of software projects.
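As a rough illustration of "the simplest of cases", compare the two plain C loops below. This is a sketch, not a benchmark, and the function names are made up. The first has independent iterations and `restrict`-qualified pointers, which is the shape GCC and Clang can usually vectorize on their own when built with something like `-O3 -mavx512f`. The second carries a dependency from one iteration to the next, so the compiler can't simply process 16 elements at once. Whether either loop actually gets 512-bit code depends on the compiler version and flags, so check the vectorization report (for example GCC's `-fopt-info-vec`) rather than assuming.

```c
#include <stddef.h>

/* Simple case: every iteration is independent and restrict promises
   that x and y do not overlap, so the loop is a candidate for
   auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Harder case: each iteration reads the result of the previous one
   (a loop-carried dependency), so naive lane-by-lane vectorization
   would compute the wrong answer. */
void prefix_sum(size_t n, float *data) {
    for (size_t i = 1; i < n; i++) {
        data[i] += data[i - 1];
    }
}
```

Getting the second loop onto wide vector units generally means restructuring it by hand (a blocked or tree-style scan), which is where the intrinsics or assembly effort comes in.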
There's a very small set of applications where the precision of AVX512 matters, but more cores also speed them up.
A much more important set of applications is those sped up with GPUs or other ML accelerators. Notably the high performance of these CPUs is useful with those too, because they are great at data pipeline crunching prior to the GPU part.
https://www.extremetech.com/computing/312673-linus-torvalds-...