| HN Mirror


by aurareturn 294 days ago

  Their prompt processing speeds are absolutely abysmal

They are not. This is Blackwell with Tensor cores. Bandwidth is the problem here.


BoorishBears 294 days ago

They're abysmal compared to anything dedicated at any reasonable batch size, because of both bandwidth and compute. Not sure why you're wording this as if it disagrees with what I said.

I've run inference workloads on a GH200, which is an entire H100 attached to an ARM processor, and the moment offloading is involved, speeds tank to Mac Mini levels. The Mac Mini is similarly mostly a toy when it comes to AI.


aurareturn 294 days ago

Again, prompt processing isn't the major problem here. It's bandwidth. 256 GB/s of bandwidth (maybe ~210 GB/s in the real world) limits tokens per second well before prompt processing does.
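
A rough back-of-envelope sketch of that bound, assuming single-stream decoding where every generated token has to stream all of the model weights from memory once. The 70B-parameter, 4-bit model is a hypothetical example and not something named in this thread; the function name is also just illustrative.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when each token reads all weights once.

    Ignores KV-cache traffic, compute time, and batching, so real numbers
    will be lower than this estimate.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical model (assumption): 70e9 parameters at 0.5 bytes each = 35 GB.
model_gb = 70e9 * 0.5 / 1e9

# The two bandwidth figures quoted in the comment above.
for label, bandwidth in (("spec", 256.0), ("real world", 210.0)):
    rate = decode_tokens_per_second(bandwidth, model_gb)
    print(f"{label}: ~{rate:.1f} tokens/s")
# spec: ~7.3 tokens/s
# real world: ~6.0 tokens/s
```

Under these assumptions the ceiling is single-digit tokens per second regardless of how fast the Tensor cores can chew through the prompt.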

Not entirely sure how your ARM statement matters here. This is unified memory.
