by iamsalman 4220 days ago
The whole premise behind introducing Phi to compete with discrete GPUs from NVIDIA/AMD was a plug-in accelerator that supports x86, which meant no code porting would be needed, so companies with millions of man-hours invested in their code could simply benefit from the accelerator. However, this is not the case: the price/performance ratio for code that is not optimized for massively parallel processors would be mediocre at best.

Besides, Xeon Phis are a reincarnation of project Larrabee, which never took off.

If you have to end up optimizing your code for accelerators in any case, x86 or not -- you are better off optimizing it for GPUs instead.
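
To put a rough number on the price/performance point, here is a minimal sketch using Amdahl's law. That model is an assumption brought in for illustration, not something measured on a Phi; the function name `amdahl_speedup` and the parallel fractions are hypothetical, and 57 is the core count quoted elsewhere in this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the runtime scales with `cores`.

    Amdahl's law: the serial part (1 - p) is unaffected, the parallel
    part p is divided evenly across the cores.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical fractions: unoptimized code tends to sit near the low end.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 57):.1f}x on 57 cores")
```

Under this idealized model, code that is only half parallel gets less than 2x out of 57 cores, which is the sense in which unported code would give a mediocre return. Real results also depend on memory bandwidth, vectorization, and transfer costs, none of which this sketch models.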

1 comments

You are right, I was interested precisely in the possibility that even code that isn't particularly customized for the coprocessor would benefit from using all the cores. I always believed that GPU code doesn't do well with a lot of branches, and I hoped that Phi would run such code better than GPUs. Now searching, here's Nvidia's take:

http://www.nvidia.com/object/justthefacts.html

From what I read a long time ago Phi has the same branch limitation - it can do branches by running the same code twice, just like CUDA.
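
For anyone unsure what "running the same code twice" means, here is a toy model of lockstep (masked) execution. It is a sketch of the general idea only, not of how either CUDA hardware or the Phi actually schedules work; `simd_branch` and its parameters are hypothetical names made up for this example.

```python
def simd_branch(values, cond, then_fn, else_fn):
    """Simulate an if/else over lanes that must execute in lockstep.

    Every lane steps through each side of the branch that at least one
    lane needs; a per-lane mask decides which result is kept.
    Returns (results, passes), where passes counts the sides executed.
    """
    mask = [cond(v) for v in values]
    result = list(values)
    passes = 0

    if any(mask):  # at least one lane takes the "then" side
        passes += 1
        then_out = [then_fn(v) for v in values]  # all lanes do the work
        result = [t if m else r for t, m, r in zip(then_out, mask, result)]

    if not all(mask):  # at least one lane takes the "else" side
        passes += 1
        else_out = [else_fn(v) for v in values]  # all lanes do the work again
        result = [r if m else e for e, m, r in zip(else_out, mask, result)]

    return result, passes


if __name__ == "__main__":
    is_even = lambda v: v % 2 == 0
    print(simd_branch([1, 2, 3, 4], is_even, lambda v: v * 10, lambda v: -v))
    print(simd_branch([2, 4, 6, 8], is_even, lambda v: v * 10, lambda v: -v))
```

When the lanes disagree, both sides run and the cost is roughly the sum of the two; when they all agree, only one side runs. That is why heavily branching code is a poor fit for this execution style.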

Not to mention, "x86 compatibility as a bonus" is a tired old BS line Intel uses on clueless decision makers at the golf club. It never amounts to anything; it's not like you are going to just drop your binaries in there.

Not true. The cores run independent threads and processes on an embedded Linux system that's running on the card, meaning they're much easier to program, and they allow porting of existing software without completely going back to the drawing board.
> From what I read a long time ago Phi has the same branch limitation - it can do branches by running the same code twice, just like CUDA.

I'm fairly sure that's not the case now. Certainly the capability is there for it to do independent branches--just look at the GA144, which while limited in other ways, can have its 144 computers branching all over the place simultaneously. No, I'm pretty sure that's the whole point of this type of architecture: to allow more branching.

If it didn't, I'd be a little bit screwed, because I was counting on it for a compute-bound algorithm that really needs that branching.

It's not the case. These are 57 independent cores, much like you'd see in a quad-core CPU, except that they're Pentium-vintage feature-wise (with the addition of some modern vector instructions and SMT).

As far as I can tell, they're not binary compatible with existing software, and software needs recompilation using Intel's compilers.