by shaklee3 2135 days ago
I was using CUDA heavily in 2015, and I also looked at the first and second generations of the Xeon Phi at the time. I thought it was much harder to program for than CUDA was back then (and that gap has certainly widened since). I recall things like a weird ring topology between cores that you may have had to pay attention to, the memory hierarchy (you kind of deal with this in CUDA too, but I remember it being NUMA-like), and transfers to and from the host CPU that were harder and synchronous compared to CUDA.

It was definitely a really cool hardware architecture, but the software ecosystem just wasn't there.


Xeon Phi was supposed to be easy to program for, because it ran Linux (albeit an embedded version, but it was straight up Linux).

Turns out, performance-critical code is hard to write, whether or not you have Linux. And I'm not entirely sure how Linux made things easier at all. I guess it's cool that you could run GDB, had filesystems, and all that stuff, but was that really needed?

---------

CUDA shows that you can just run bare-metal code and have the host manage a huge number of the issues (even cudaMalloc is globally synchronized and "dumb" as a doornail: probably host-managed, if I had to guess).

That's right -- I always wished they made a Phi with PCIe connections out to other peripherals. Imagine a Phi host that could connect to a GPU to offload things it was better at.
That looks like 28 cores, and I think Phi went to 72 cores (or 288 threads with 4-way HT). Of course, the Phi was clocked much lower. The AMD is definitely more comparable.
Well... they did. That's basically called a Xeon 8180. :-)

Or alternatively, an AMD EPYC (64 cores / 128 PCIe lanes).

Now I'm remembering... They had the Phi as a coprocessor in a PCI slot, effectively making it just as issue as a GPU. The second gen (Knights Landing) made the Phi the host processor, but removed almost all ability to attach external devices. It had potential, I think, but it was a weird transition from v1 to v2.
I actually found the Coprocessor more interesting.

Yeah, NVidia CUDA makes a better coprocessor for deep learning and matrix multiplication. But a CPU-based coprocessor for adding extra cores to a system seems like it'd be better for some class of problems.

SIMD compute is great and all, but I kind of prefer to see different solutions in the computer world. I guess that the classic 8-way socket with Xeon 8180 is more straightforward (though expensive).

--------

A Xeon Phi on its own motherboard is just competing with regular ol' Xeons. Granted, at a cheaper price... but it's too similar to normal CPUs.

Xeon Phi was probably trying to do too many unique things. It used HMC-derived memory (MCDRAM) instead of GDDR5X or HBM (or DDR4). It was a CPU in a GPU form factor. It was a GPU (ish) running its own OS. It was just really weird. I keep looking at the thing in theory, and wondering what problem it'd be best at solving. All sorts of weird decisions; nothing else was ever really built like it.

Agreed! That's why I was bummed when the second gen was a host system. It didn't fit my use case well.