Hacker News new | ask | show | jobs
by aurareturn 294 days ago

  Their prompt processing speeds are absolutely abysmal
They are not. This is Blackwell with Tensor cores. Bandwidth is the problem here.
1 comments

They're abysmal compared to anything dedicated at any reasonable batch size because of both bandwidth and compute, not sure why you're wording this like it disagrees with what I said.

I've run inference workloads on a GH200 which is an entire H100 attached to an ARM processor and the moment offloading is involved speeds tank to Mac Mini-like speeds, which is similarly mostly a toy when it comes to AI.

Again, prompt processing isn't the major problem here. It's bandwidth. 256GB/s bandwidth (maybe ~210 in real world) limits the tokens per second well before prompt processing.

Not entirely sure how your ARM statement matters here. This is unified memory.