I haven't seen any numbers on that, but there's literally zero reason to run a Xeon Phi without using AVX-512, so I'd assume no design effort went into optimizing the clock frequency for a non-AVX-512 use case.
The Phi was an interesting computer. AVX-512 on 60 cores back in 2015 was pretty nuts. CUDA wasn't quite as good as it is today (there have been HUGE advancements in CUDA recently).
These days, we have a full-fat EPYC or Threadripper to use, and even then it's only 256-bit vector units. CUDA is also way better and NVIDIA has advanced dramatically, proving that CUDA is easier to code for than people once thought. (Back in 2015, it was still "common knowledge" that CUDA was too hard for normal programmers.)
Intel's Xeon Phi was a normal CPU. It could run normal Linux, and scale just like a GPU (each PCIe x16 slot added another 60 Xeon Phi cores to your box).
It was a commercial failure, but I wouldn't say it was worthless. NVIDIA just ended up making a superior product by making CUDA easier and easier to use.
I was using CUDA heavily in 2015, and I also looked at the first/second gen of the Xeon Phi at the time. I thought it was much harder to program for than CUDA was at the time (and that gap has certainly widened). I recall things like a weird ring topology between cores that you may have had to pay attention to, and the memory hierarchy (you kind of deal with this in CUDA too, but I remember it being NUMA-like). Transfers to and from the host CPU were also harder and more synchronous than in CUDA.
It was definitely a really cool hardware architecture, but the software ecosystem just wasn't there.
Xeon Phi was supposed to be easy to program for, because it ran Linux (albeit an embedded version, but it was straight up Linux).
Turns out, performance-critical code is hard to write, whether or not you have Linux. And I'm not entirely sure how Linux made things easier at all. I guess it's cool that you could run GDB, had filesystems, and all that stuff, but was that really needed?
---------
CUDA shows that you can just run bare-metal code and have the host manage a huge number of the issues (even cudaMalloc is globally synchronized and "dumb" as a doornail: probably host managed, if I were to guess).
That's right -- I always wished they made a Phi with PCIe connections out to other peripherals. Imagine a Phi host that could connect to a GPU to offload things it was better at.
That looks like 28 cores, and I think Phi went to 72 cores (or 288 threads with 4-way HT). Of course, the Phi was clocked much lower. The AMD is definitely more comparable.
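To put rough numbers on the cores-versus-clock tradeoff, here is a back-of-the-envelope sketch in Python. The formula (cores x clock x SIMD lanes x 2 for fused multiply-add x vector units per core) is the usual theoretical-peak estimate. The inputs are illustrative assumptions on my part, not figures from this thread: 72 cores at 1.5 GHz for a Phi-like part, 28 cores at 2.5 GHz for a server-CPU-like part, both with two 512-bit FMA units per core and 8 double-precision lanes per unit. The helper name peak_gflops is made up for this example.

```python
def peak_gflops(cores: int, clock_ghz: float, simd_lanes: int, fma_units: int) -> float:
    """Theoretical peak GFLOPS: every FMA lane retires 2 flops per cycle."""
    flops_per_cycle_per_core = simd_lanes * 2 * fma_units
    return cores * clock_ghz * flops_per_cycle_per_core


if __name__ == "__main__":
    # Assumed, illustrative configurations (not taken from the thread).
    many_slow_cores = peak_gflops(cores=72, clock_ghz=1.5, simd_lanes=8, fma_units=2)
    few_fast_cores = peak_gflops(cores=28, clock_ghz=2.5, simd_lanes=8, fma_units=2)
    print(f"72 cores @ 1.5 GHz: {many_slow_cores:.0f} GFLOPS peak")
    print(f"28 cores @ 2.5 GHz: {few_fast_cores:.0f} GFLOPS peak")
```

This is peak arithmetic only. Sustained clocks under AVX-512 load, memory bandwidth, and how well the code actually vectorizes all pull real numbers well below it.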
I always suggested making software run well on a Phi would be valid research for making it run well on future Xeon Scalable and Core i9.
If you can make your code so parallelizable it runs well on a Phi, it'll run extremely well on future CPUs because clocks won't get much higher, but core counts will.
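The reasoning here is basically Amdahl's law: with clocks flat, the only lever left is core count, and how much that helps depends on the fraction of the work that runs in parallel. A minimal Python sketch, assuming the textbook form speedup = 1 / ((1 - p) + p / n); the function name and the sample fractions and core counts are just illustrative choices:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions and core counts, not measurements.
    for p in (0.90, 0.99, 0.999):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (8, 28, 72))
        print(f"parallel fraction {p}: {row}")
```

Code that is 90% parallel stops gaining much past a handful of cores, while code that is 99.9% parallel keeps scaling into Phi-like core counts, which is why tuning for a Phi would plausibly carry over to later many-core CPUs.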
Good point, but from what I recall the Phi had two different memory types, and you had to specify which one you were targeting. That didn't necessarily translate to the server CPUs.