by BoorishBears
294 days ago
They're abysmal compared to anything dedicated at any reasonable batch size, because of both bandwidth and compute. Not sure why you're wording this like it disagrees with what I said. I've run inference workloads on a GH200, which is an entire H100 attached to an ARM processor, and the moment offloading is involved, speeds tank to those of a Mac Mini, which is similarly mostly a toy when it comes to AI.
|
Not entirely sure how your ARM statement matters here. This is unified memory.