by scottlegrand 3596 days ago
Not even wrong. I have two PCs with 4 Titan X (Maxwell) GPUs and a third PC with 4 Titan X (Pascal) GPUs. Both configurations are available today (I built them myself, total BOM about $7K), and both will destroy 4 Xeon Phi servers at Deep Learning.

The benchmark Intel presented here is as disingenuous as their infamous white paper from 2010: http://pcl.intel-research.net/publications/isca319-lee.pdf

In comparison, a single Knights Landing Xeon Phi will be ~$7K. I know where I put my money. Caveat Emptor.

But Xeon Phi and I go way back here. They've been trying to beat my AMBER GPU code since 2013 or so. Many man-years later, I believe that a Knights Corner is now ~35% faster than 2 Xeon CPUs with 1M atoms or more (source: http://adsabs.harvard.edu/abs/2016CoPhC.201...95N)

Meanwhile, the CUDA code has continued to scale with the GPU roadmap and a Titan XP is arguably 9-10x faster than 2 Xeon CPUs. No data is supplied at the low-end for Xeon Phi and I think we can safely assume it's because performance there sucks. (source: http://ambermd.org/gpus/benchmarks.htm)

Xeon Phi? IMO avoid, avoid, avoid until they start winning head-to-head third-party benchmarking fights like Soumith Chintala's fantastic convnet benchmark data: https://github.com/soumith/convnet-benchmarks


Yes to all of this. I'm really surprised they didn't compare costs in this blog post. Ignore the DGX-1 row of their table; the really damning comparison is between the 2nd and 4th rows of the table.

With a single 4x GPU server costing around $7k in total (row 4), you get nearly double the performance you get from spending $28k on four Xeon Phi servers (row 2).

And that's assuming you've spent the time and disk replicating your data on all four of those Xeon Phi servers, or gone to what is likely a substantial engineering effort to ensure that network IO doesn't bottleneck training.
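To put a rough number on the cost comparison above, here is a back-of-the-envelope sketch. The dollar figures are the ones quoted in this thread; treating "nearly double" as exactly 2.0x and normalizing the four Xeon Phi servers to 1.0 are my assumptions, and the function name is purely illustrative.

```python
def cost_efficiency_ratio(perf_a, cost_a, perf_b, cost_b):
    """Return how many times more performance per dollar system A delivers than system B."""
    return (perf_a / perf_b) * (cost_b / cost_a)

# Rough costs quoted in this thread (USD).
GPU_SERVER_COST = 7_000         # one 4x GPU server (row 4 of the table)
XEON_PHI_COST = 4 * 7_000       # four Xeon Phi servers (row 2), about $28k

# Assumption: "nearly double the performance" is rounded to 2.0x,
# with the four Xeon Phi servers normalized to 1.0.
GPU_RELATIVE_PERF = 2.0
XEON_PHI_RELATIVE_PERF = 1.0

advantage = cost_efficiency_ratio(
    GPU_RELATIVE_PERF, GPU_SERVER_COST,
    XEON_PHI_RELATIVE_PERF, XEON_PHI_COST,
)
print(f"GPU server: ~{advantage:.0f}x the performance per dollar")
```

Under those assumptions the GPU box comes out at roughly 8x the performance per dollar, before counting the data replication or networking effort on the Xeon Phi side.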

So, a specific MD code may or may not work well with KNL -- we don't have data. KNL looks quite attractive for other chemistry, given all the vector units, large amount of fast memory, and ability to run realistically-sized examples without the network, or potentially the network-on-chip. We'll see how it pans out.

I prefer to look at it the other way: why don't you point out an existing and important chemistry application where Xeon Phi bested its contemporary GPUs, say the best of Knights Corner versus the best of Kepler (K40 or K80)? I'm also open to Knights Landing versus GP100 (vaporware versus no longer vaporware but hard to get).

I'm genuinely interested here because I can't find this anywhere. Personally, I don't think it exists.

KNL is not Knights Corner, and I have limited information on either. I'm interested in data and, more to the point, insight -- not just single benchmark numbers or specific programs, especially if they've had a lot of GPU effort and no tuning for KNL. I don't expect KNL to be particularly good for applications that aren't highly vectorizable, though the memory system may help.

If I manage to access the KNL here, I'll probably run cp2k and gromacs, though single node performance is of limited interest, and ELPA doesn't currently have AVX512-specific support.

Here's your answer for GROMACS (and it sucks)...

http://www.prace-ri.eu/IMG/pdf/wp120.pdf

Even so, right now, little would please me more technologically than a competitive Xeon Phi offering, but while KNL is better than KNC, my inside info says it sucks too (it would have been a lot more interesting, just like Altera's Stratix 10, if it had shipped before GP100 and GP102).

Right now, I have more confidence in AMD GPUs than I have in Xeon Phi. This 3rd party benchmark is particularly interesting (and it doesn't look like anyone at NVIDIA is paying any attention to it):

https://techaltar.com/amd-rx-480-gpu-review/2/

Sure, NVIDIA is still in the lead, but not with the ~10x margins they used to have over AMD.

Finally, I figuratively feel like punching the next person who makes the BS scaling argument over raw performance. GPUs scale too if they're coded correctly. And cloud datacenters are the worst place for that given their craptastic ~10 Gb/s interconnect subject to arbitrary network weather effects.

Or, butchering Seymour Cray: if your life depended on winning a race, would you bet it on one 1,350 HP Venom GT or on twenty 179 HP Scion FR-Ss? I mean, collectively that's almost 3,600 HP, right? Except it's even worse because for GPUs vs CPUs, it's like they priced the Scion FR-S like a Venom GT and vice versa.

I wish you luck finding Xeon Phi winning anything but synthetic tests against yesterday's news:

https://www.xcelerit.com/computing-benchmarks/libor/intel-xe...

Omni-Path also sucks compared to InfiniBand. How are they making so many inroads into HPC with these offerings? I mean, aside from their dominant-for-good-reason CPUs.

Please share the data, particularly for the built-in interfaces on ~70 cores which are supposed to be available this year (given the blanket statement). Omni-Path seems to be worrying Mellanox, judging by a recent visit.