
by dahart 920 days ago

> Intel is releasing a 288-core x86

This made me wonder a couple of things:

What kind of workloads and problems is that best suited for? It’s a lot of cores for a CPU, but for pure math/compute, like with AI training and inference and with graphics, 288 cores is like ~1.5% of the number of threads of a modern GPU, right? Doesn’t it take particular kinds of problems to make a 288 core CPU attractive?

I also wondered whether the ratio of the highest core count CPU to the highest core count GPU has been relatively flat for a while. Which way is it trending: are CPUs or GPUs gaining cores faster?


imtringued 920 days ago

You could do sparse deep learning with much, much larger models with these CPUs. As paradoxical as it might sound, sparse deep learning gets more compute bound as you add more cores.


why_only_15 920 days ago

I'd be curious to learn more about how it's compute bound and what specifically is compute bound. On modern H100s you need ~600 fp8 operations per byte loaded from memory in order to be compute bound, and that's with full 128-byte loads each time. Even integer/fp32 vector operations need quite a few operations to be compute bound (~20 for vector fp32).
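For reference, those thresholds are just peak throughput divided by memory bandwidth (the roofline "ridge point"). A rough sketch of the arithmetic follows; the peak figures are my assumption, taken from NVIDIA's published H100 SXM numbers (dense fp8 tensor throughput, fp32 vector throughput, HBM3 bandwidth), so treat the result as approximate:

```python
def ridge_point(peak_ops_per_s, bandwidth_bytes_per_s):
    """Operations per byte needed before a kernel stops being memory bound."""
    return peak_ops_per_s / bandwidth_bytes_per_s

# Assumed peak figures for an H100 SXM (not measured, not from this thread).
H100_BANDWIDTH = 3.35e12      # bytes/s, HBM3
H100_FP8_DENSE = 1979e12      # ops/s, fp8 tensor cores without sparsity
H100_FP32_VECTOR = 67e12      # ops/s, fp32 vector units

fp8_ridge = ridge_point(H100_FP8_DENSE, H100_BANDWIDTH)
fp32_ridge = ridge_point(H100_FP32_VECTOR, H100_BANDWIDTH)

print(f"fp8:  ~{fp8_ridge:.0f} ops/byte")
print(f"fp32: ~{fp32_ridge:.0f} ops/byte")
```

Anything with lower arithmetic intensity than these ratios is limited by memory, not by the ALUs.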


imtringued 920 days ago

I think you misunderstood what I meant. Sparse ML is inherently memory latency bound, since the access pattern is completely unpredictable and prone to cache misses. The amount of compute you perform is a tiny blip compared to the hash map operations you perform. What I mean is that as you add more cores, there are sharing effects, because multiple cores access the same memory locations at the same time. The compute bound sections of your code become a much greater percentage of the overall runtime as you add cores, which is surprising, since adding more compute is the easy part. Pay attention to my words "_more_ compute bound".

Here is a relevant article: https://www.kdnuggets.com/2020/03/deep-learning-breakthrough...
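To make the access pattern concrete, here is a toy sketch of a hash-selected sparse layer. It is not the implementation from the article: `build_buckets` and `sparse_forward` are hypothetical names, and the random bucket assignment is only a stand-in for a real locality-sensitive hash. The point is that each step does one hash map lookup, then reads a scattered handful of weight rows, and the arithmetic per row is tiny.

```python
import random

def build_buckets(num_neurons, num_buckets, seed=0):
    """Assign each neuron id to a random bucket (stand-in for a real LSH table)."""
    rng = random.Random(seed)
    buckets = {}
    for neuron in range(num_neurons):
        buckets.setdefault(rng.randrange(num_buckets), []).append(neuron)
    return buckets

def sparse_forward(x, weights, buckets, bucket_id):
    """Compute outputs only for the neurons stored in one bucket."""
    active = buckets.get(bucket_id, [])   # hash map lookup picks the active set
    out = {}
    for neuron in active:
        row = weights[neuron]             # scattered row read, likely a cache miss
        out[neuron] = sum(w * xi for w, xi in zip(row, x))  # small dot product
    return out

# Example: 1000 neurons, 50 buckets, so roughly 2% of rows are touched per input.
buckets = build_buckets(1000, 50)
weights = [[1.0] * 4 for _ in range(1000)]
some_bucket = next(iter(buckets))
result = sparse_forward([1.0, 2.0, 3.0, 4.0], weights, buckets, some_bucket)
print(len(result), "of 1000 neurons evaluated")
```

Which rows get touched depends entirely on the input's hash, so the hardware prefetcher has nothing to work with.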
