| HN Mirror


by tetrazine 2812 days ago

One important niche where Intel is ripe to lose its lead is deep learning workstations. The issue at hand: Intel offers enough PCIe lanes only on its highest-end desktop processors. It's unclear if these processors will address that (I can't find lane counts on a quick Google search).

Let's say you want to run 4 GPUs off your CPU for a powerful workstation or small server (e.g. for a research group). There's debate over whether you need to run cards in x8 or x16 mode for most deep learning applications, so let's say x8 as a conservative choice. That means you need 32 lanes (or 16 for just two cards). But your drive and other peripherals might take some lanes too, so 40 looks like a safer number. Easy enough to find that on a motherboard...

Most of the mid-to-high-range Intel desktop CPUs only have 16 lanes total. The base model Threadripper (1900X), which is price-competitive with those, has 64. You can go to Xeon, but that (AFAIK) can be problematic in several ways for midrange or mixed-use workstations (no integrated graphics, less mobo selection for the needed features).
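The lane arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 8-lane allowance for drives and peripherals is just the gap between the 32 and 40 figures mentioned above, not a measured value.

```python
# Rough PCIe lane budget check. Names and defaults are illustrative assumptions.

def lanes_needed(num_gpus, lanes_per_gpu=8, peripheral_lanes=8):
    """Total CPU lanes wanted: GPUs plus an allowance for drives and peripherals."""
    return num_gpus * lanes_per_gpu + peripheral_lanes


def fits(cpu_lanes, num_gpus, lanes_per_gpu=8, peripheral_lanes=8):
    """True if a CPU with cpu_lanes can feed the requested configuration."""
    return lanes_needed(num_gpus, lanes_per_gpu, peripheral_lanes) <= cpu_lanes


if __name__ == "__main__":
    # Lane counts quoted in this post: 16 (mid-range Intel desktop), 64 (Threadripper 1900X).
    for name, lanes in [("Intel desktop", 16), ("Threadripper 1900X", 64)]:
        verdict = "fits" if fits(lanes, num_gpus=4) else "does not fit"
        print(f"{name}: {lanes} lanes, 4 GPUs at x8 {verdict}")
```

Changing `lanes_per_gpu` to 16 shows how quickly the budget grows if you decide x16 matters for your workload.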

I think this is pretty important. If enough researchers go to Ryzen then math libraries will get written there and the lead that libs like MKL provide could be nullified[0]. This will filter into the server market, where CPU is used even in deep learning production deployments (e.g. for inference servers). And having lower end processors is important because there are lots of independent researchers doing important work in the field who don't have huge budgets, as well as academics who can't get budget and might be spending out of pocket on their workstation.

[0] I'll admit I don't know enough about the math hardware in Ryzen vs Intel to say if this is possible.

3 comments

brigade 2812 days ago

Threadripper CPUs might be price competitive with LGA1151 CPUs, but X399 motherboards start at more than the lower-end CPUs; combined you really should be comparing to Intel's HEDT X299 platform, which starts at about the same overall price for motherboard+CPU. And Intel's HEDT chips all have 44 lanes this year thanks to Threadripper.

More importantly for deep learning (this price difference is lost in the cost of GPUs), AFAIK all X399 motherboards for Threadripper are set up as 16x/8x/16x/8x, whereas there are X299 motherboards with PCIe switches that give you 16x/16x/16x/16x with neighboring GPUs sharing bandwidth. For deep learning, each individual GPU having 16 lanes of bandwidth is more important than having to share with another GPU.


tetrazine 2812 days ago

I think you're right about this for many people but not all.

Intel stepping up the lane counts in HEDT is definitely a good reactive move, but it won't affect the budget-constrained scenarios I discussed until a few years from now, because until then you can't just buy a couple of generations back and still get those lanes. I might be wrong but I believe many X299 chips before gen 9 had 16-24 lanes?

I also don't think your point about the price difference being insignificant is universally applicable, because buying extra GPUs over time is quite common and having an extensible box with 1-2 GPUs at the start is potentially a good move. As for the PCIe switches on the X399 chipset, IMO this depends on the assertion that 16x is X% better than 8x. That depends on the use case, and analyses of the penalty you get at 8x vary, but many people would take a 10-15% penalty down the road @ 4 cards (really 5-7.5% because it only affects 2/4 cards) to save money now.
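The 5-7.5% figure is just the per-card penalty averaged over all four cards. A minimal sketch, assuming overall throughput is simply the mean of per-card throughput (which may not hold for every training setup) and using a hypothetical function name:

```python
# Average slowdown across all cards when only some of them run at x8.
# Assumes aggregate throughput is the mean of per-card throughput.

def average_penalty(per_card_penalty, affected_cards, total_cards):
    """Fractional slowdown averaged over every card in the box."""
    return per_card_penalty * affected_cards / total_cards


if __name__ == "__main__":
    for penalty in (0.10, 0.15):
        avg = average_penalty(penalty, affected_cards=2, total_cards=4)
        print(f"{penalty:.0%} on 2 of 4 cards -> {avg:.1%} overall")
```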

I built a box with Intel because I had the capital, and most people at an industry job probably should, but if you read forum and mailing list threads many people are faced with this economic decision and are going TR - I see quite a few academics doing this. Math libraries are a good thesis topic :)


YetAnotherNick 2812 days ago

What percentage of Intel users care about running a deep learning system? And of those users, how many care so little that they wouldn't get a GPU for it? Neither Intel nor AMD processors can come close to GPUs for deep learning operations like matrix-vector multiplication.


timc3 2812 days ago

Well, the article does say 24 PCIe lanes. It's a pity that it's not more (I would also like to see how much memory these chipsets/CPUs can support).
