by ac29 917 days ago
While the first >1000 core x86 processor is probably a little ways out, Intel is releasing a 288-core x86 processor in the first half of 2024 (Sierra Forest). I assume AMD will have something with a similarly high core count in 2024-25 as well.

To be clear, you can probably make a 1000 core x86 machine, and those 1000 cores can probably even be pretty powerful. I don't doubt that. I think Azure even has crazy 8-socket multi-sled systems doing hundreds of cores, today. But this thread is about CUDA. Sierra Forest will get absolutely obliterated by a single A100 in basically any workload where you could reasonably choose between the two as options. I'm not saying they can't exist. Just that they will be (very) bad in this specific competition. I made an edit to my comment to reflect that.

But what you mention is important, and also a reason for the ultimate demise of e.g. Xeon Phi. Intel surely realized they could just scale their existing Xeon designs up-and-out further than expected. Like from a product/SKU standpoint, what is the point of having a 300 core Phi where every core is slow as shit, when you have a 100 core 4-socket Xeon design on the horizon, using an existing battle-tested design that you ship billions of dollars worth of every year? Especially when the 300 core Phi fails completely against the competition. By the time Phi died, they were already doing 100-cores-per-socket systems. They essentially realized any market they could have had would be served better by the existing Xeon line and by playing to their existing strengths.

> Intel is releasing a 288-core x86

This made me wonder a couple of things:

What kind of workloads and problems is that best suited for? It’s a lot of cores for a CPU, but for pure math/compute, like with AI training and inference and with graphics, 288 cores is like ~1.5% of the number of threads of a modern GPU, right? Doesn’t it take particular kinds of problems to make a 288 core CPU attractive?

I also wondered whether the ratio of the highest core count CPU to GPU has been relatively flat for a while. Which way is it trending: are CPUs or GPUs gaining cores faster?
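For a rough sanity check of the ~1.5% figure, here is a minimal sketch of the arithmetic. The GPU number is an assumption, not something from the thread: it uses the FP32 lane ("CUDA core") count NVIDIA publishes for the H100 SXM, and counting GPU lanes against CPU cores is a loose comparison at best.

```python
# Back-of-the-envelope check of the "288 cores is ~1.5% of a GPU" estimate.
CPU_CORES = 288        # Sierra Forest core count quoted in the thread
CLAIMED_RATIO = 0.015  # the ~1.5% guess from the comment

# Assumption: H100 SXM has 132 SMs x 128 FP32 lanes ("CUDA cores").
H100_SXM_FP32_LANES = 132 * 128

# GPU thread count that the 1.5% guess implies.
implied_gpu_threads = CPU_CORES / CLAIMED_RATIO

# Ratio against the assumed H100 lane count.
ratio_vs_h100 = CPU_CORES / H100_SXM_FP32_LANES

print(f"implied GPU threads: {implied_gpu_threads:.0f}")
print(f"288 cores vs H100 FP32 lanes: {ratio_vs_h100:.2%}")
```

Under those assumptions the guess lands in the right ballpark, though it says nothing about how much work each core or lane does per clock.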

You could do sparse deep learning with much, much larger models with these CPUs. As paradoxical as it might sound, sparse deep learning gets more compute bound as you add more cores.

I'd be curious to learn more about how it's compute bound and what specifically is compute bound. On modern H100s you need ~600 fp8 operations per byte loaded from memory in order to be compute bound, and that's with full 128-byte loads each time. Even integer/fp32 vector operations need quite a few operations to be compute bound (~20 for vector fp32).

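The ~600 and ~20 figures are roofline "ridge points": peak operations per second divided by memory bandwidth. A minimal sketch of that arithmetic follows. The peak numbers are assumptions taken from NVIDIA's published H100 SXM figures (dense fp8 tensor throughput, fp32 vector throughput, HBM3 bandwidth), not measurements, and the function names are made up for illustration.

```python
# Roofline ridge point: operations per byte of memory traffic needed before
# compute, rather than memory bandwidth, becomes the limiting factor.

# Assumed H100 SXM peak figures (vendor datasheet values, not measured).
HBM_BANDWIDTH_BYTES_PER_S = 3.35e12  # ~3.35 TB/s HBM3
PEAK_OPS_PER_S = {
    "fp8 tensor core (dense)": 1979e12,  # ~1,979 TFLOPS
    "fp32 vector": 67e12,                # ~67 TFLOPS
}


def ridge_point(peak_ops_per_s: float, bandwidth_bytes_per_s: float) -> float:
    """Operations per byte at which the kernel stops being bandwidth bound."""
    return peak_ops_per_s / bandwidth_bytes_per_s


def is_compute_bound(ops_per_byte: float, peak_ops_per_s: float,
                     bandwidth_bytes_per_s: float) -> bool:
    """True if a kernel with this arithmetic intensity is limited by compute."""
    return ops_per_byte >= ridge_point(peak_ops_per_s, bandwidth_bytes_per_s)


RIDGE = {
    name: ridge_point(peak, HBM_BANDWIDTH_BYTES_PER_S)
    for name, peak in PEAK_OPS_PER_S.items()
}

for name, value in RIDGE.items():
    print(f"{name}: ~{value:.0f} ops/byte to be compute bound")
```

With these assumed peaks, a kernel doing only a handful of operations per byte loaded, as a hash-map-heavy sparse workload would, sits far below either ridge point.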

I think you misunderstood what I meant. Sparse ML is inherently memory-latency bound, since you have a completely unpredictable access pattern prone to cache misses. The amount of compute you perform is a tiny blip compared to the hash map operations you perform. What I mean is that as you add more cores, there are sharing effects because multiple cores are accessing the same memory location at the same time. The compute bound sections of your code become a much greater percentage of the overall runtime as you add cores, which is surprising, since adding more compute is the easy part. Pay attention to my words "_more_ compute bound".

Here is a relevant article: https://www.kdnuggets.com/2020/03/deep-learning-breakthrough...

288 cores or threads? Cuz to my knowledge AMD already has a 128-core, 256-thread processor in the Epyc 9754.