	by tyldum 2120 days ago
	AMD lacks the AVX512 instruction set, which is a show stopper for many applications.

5 comments

nate_meurer 2120 days ago

AVX512 is garbage. It incurs a massive performance hit from both frequency and mode-switching penalties. Apart from a few niche HPC and ML applications which you'll never encounter, AVX512's most compelling use cases are to drive Intel's shady market fragmentation, and to create more bullshit FP benchmarks that Intel can claim to win.

https://www.extremetech.com/computing/312673-linus-torvalds-...

pixelpoet 2120 days ago

Now that we've covered both ends of hyperbole, it's maybe worth noting that a lot of CPU parallel tasks can be well accelerated by ISPC, which can make reasonably effective use of AVX512 (with the aforementioned clock speed caveat): https://www.mail-archive.com/ispc-users@googlegroups.com/msg...

Also, AVX512 is a much nicer (orthogonal) ISA than SSE or AVX2.

While there are some niche applications that need larger amounts of memory than GPUs can offer, this speedup comes from making the CPU act more like a GPU, and CPUs doing that aren't as fast as GPUs acting like GPUs (which are essentially many 32-wide vector units, rather than AVX512's 16-wide).

floatboth 2120 days ago

Nah, to be fair, the most compelling application is: https://gamozolabs.github.io/fuzzing/2018/10/14/vectorized_e...

shaklee3 2119 days ago

This is not true anymore: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-fre...

mcdevilkiller 2119 days ago

That's an i5. When Cloudflare talked about the issue, they were using high performance Xeons.

shaklee3 2119 days ago

I'm aware, but the Ice Lake Xeons are starting to ship.

gnufx 2120 days ago

Which applications, and why? Some computational ones will go significantly slower on the same number of cores if they could have kept avx512 fed (perhaps small data in cache), but most don't spend all their time in something like GEMM. The new UK "tier 1" HPC system is all EPYC.

gameswithgo 2120 days ago

1. Not many.
2. AVX2 with twice the cores is about as good as AVX512 a lot of the time.
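
A back-of-the-envelope sketch of that second point, in C. The function names and core counts are hypothetical, and the model is deliberately idealized: one vector operation per core per cycle, identical clocks, no memory bottleneck. Real chips differ on all three.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements ("lanes") that fit in one vector register. */
size_t lanes_per_vector(size_t vector_bits, size_t element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Idealized elements processed per cycle across the whole chip.
   Assumes one vector op per core per cycle and equal clock speeds,
   which is an illustration-only simplification. */
size_t peak_lanes(size_t cores, size_t vector_bits, size_t element_bits)
{
    return cores * lanes_per_vector(vector_bits, element_bits);
}
```

With 32-bit floats, `lanes_per_vector(256, 32)` is 8 for AVX2 and `lanes_per_vector(512, 32)` is 16 for AVX512, so under these assumptions `peak_lanes(32, 256, 32)` and `peak_lanes(16, 512, 32)` come out equal.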

smarx007 2120 days ago

Automatic vectorization for AVX512 only works in the simplest of cases, and using intrinsics or writing inline assembly is beyond the scope of 99% of software projects.
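
A minimal sketch of what "simplest of cases" tends to mean, in plain C with no intrinsics. Both functions are hypothetical examples, not taken from any project. The first loop has independent iterations and `restrict`-qualified pointers, a shape compilers commonly vectorize at high optimization levels. The second has a loop-carried dependency, which a straightforward auto-vectorizer generally leaves scalar. Whether a given compiler actually emits 512-bit instructions for the first one depends on its version, flags, and target.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations and no aliasing (promised via restrict):
   a typical candidate for auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* In-place running sum: each iteration reads the value the previous
   iteration just wrote, so the iterations cannot simply be run side
   by side in vector lanes. */
void prefix_sum(size_t n, float *v)
{
    assert(n == 0 || v != NULL);
    for (size_t i = 1; i < n; ++i)
        v[i] += v[i - 1];
}
```

With GCC, something like `-O3 -march=skylake-avx512 -fopt-info-vec` reports which loops were vectorized; Clang has `-Rpass=loop-vectorize` for the same purpose.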

nl 2120 days ago

> a show stopper for many applications

It's really not.

There's a very small set of applications where the precision of AVX512 matters, but more cores also speed them up.

A much more important set of applications is those sped up with GPUs or other ML accelerators. Notably the high performance of these CPUs is useful with those too, because they are great at data pipeline crunching prior to the GPU part.
