by tetrazine
2812 days ago
One important niche where Intel is ripe to lose its lead is deep learning workstations. The issue at hand: all but the highest-end desktop processors have too few PCIe lanes. It's unclear whether these processors will address that (I can't find lane counts on a quick Google search).

Let's say you want to run 4 GPUs off your CPU for a powerful workstation or small server (e.g. for a research group). There's debate over whether you need to run cards in x8 or x16 mode for most deep learning applications, so let's say x8 as a conservative choice. That means you need 32 lanes (or 16 for just two cards). But your drive and other peripherals might take some lanes too, so 40 looks like a safer number. Easy enough to find that on a motherboard... Most of the mid-to-high-range Intel desktop CPUs only have 16 lanes total. The base model Threadripper (1900X), which is price-competitive with those, has 64. You can go to Xeon, but that (AFAIK) can be problematic in several ways for midrange or mixed-use workstations (no integrated graphics, less mobo selection for the needed features).

I think this is pretty important. If enough researchers go to Ryzen, then math libraries will get written for it, and the lead that libs like MKL provide could be nullified[0]. This will filter into the server market, where CPUs are used even in deep learning production deployments (e.g. for inference servers). And having lower-end processors is important because there are lots of independent researchers doing important work in the field who don't have huge budgets, as well as academics who can't get budget and might be spending out of pocket on their workstation.

[0] I'll admit I don't know enough about the math hardware in Ryzen vs Intel to say if this is possible.
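To make the lane arithmetic above concrete, here is a minimal sketch of the budget. The function names and the 8-lane peripheral allowance are my own assumptions for illustration (8 is just the gap between the 32 lanes for GPUs and the "safer" 40); the CPU lane counts are the ones quoted in the comment.

```python
def lanes_needed(num_gpus, lanes_per_gpu=8, peripheral_lanes=8):
    """Total PCIe lanes for the GPUs plus an allowance for drives etc.

    peripheral_lanes=8 is an assumed allowance, not a measured figure.
    """
    return num_gpus * lanes_per_gpu + peripheral_lanes


def fits(cpu_lanes, num_gpus, lanes_per_gpu=8, peripheral_lanes=8):
    """True if the CPU exposes enough lanes for the whole build."""
    return lanes_needed(num_gpus, lanes_per_gpu, peripheral_lanes) <= cpu_lanes


# Lane counts as quoted in the comment above.
CPU_LANES = {
    "mid-range Intel desktop": 16,
    "Threadripper 1900X": 64,
}

if __name__ == "__main__":
    for name, lanes in CPU_LANES.items():
        verdict = "fits" if fits(lanes, num_gpus=4) else "does not fit"
        print(f"{name}: {lanes} lanes, 4 GPUs at x8 {verdict}")
```

With these assumptions, a 4-GPU build at x8 needs 40 lanes, which rules out a 16-lane part and leaves headroom on a 64-lane one.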
|
More importantly for deep learning (this price difference is lost in the cost of GPUs), AFAIK all X399 motherboards for Threadripper are set up as 16x/8x/16x/8x, whereas there are X299 motherboards with PCIe switches that give you 16x/16x/16x/16x, with neighboring GPUs sharing bandwidth. For deep learning, giving each individual GPU a full 16-lane link matters more than the cost of having to share that bandwidth with a neighboring GPU.
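A rough way to see the trade-off, as a sketch under stated assumptions: PCIe 3.0 at roughly 0.985 GB/s per lane per direction, each pair of switched GPUs sharing one x16 upstream link, and the switch splitting that link evenly when both GPUs are busy. The function name and the even-split model are mine, not anything from a board's documentation.

```python
# Approximate usable PCIe 3.0 throughput per lane, per direction
# (8 GT/s with 128b/130b encoding).
PCIE3_GBPS_PER_LANE = 0.985


def per_gpu_bandwidth(link_lanes, upstream_lanes, active_gpus_on_upstream):
    """Approximate GB/s available to one GPU.

    A GPU is limited by its own slot width and by its share of the
    upstream link, assuming an even split among busy GPUs.
    """
    share = upstream_lanes / active_gpus_on_upstream
    return min(link_lanes, share) * PCIE3_GBPS_PER_LANE


if __name__ == "__main__":
    # Fixed x8 slot with dedicated lanes (the 8x slots in a 16x/8x/16x/8x layout).
    print("fixed x8:", per_gpu_bandwidth(8, 8, 1))
    # Switched x16 slot while the neighboring GPU is idle.
    print("switched x16, neighbor idle:", per_gpu_bandwidth(16, 16, 1))
    # Switched x16 slot while both GPUs on the switch are transferring.
    print("switched x16, neighbor busy:", per_gpu_bandwidth(16, 16, 2))
```

Under this model the switched slot is no worse than a fixed x8 slot when both neighbors are busy and gets the full x16 when the neighbor is idle, which is one way to read the point above. Real switches add some latency and do not necessarily split bandwidth evenly.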