The integrated GPU. Not enough compute onboard to handle prefill for 100gb+ models, and the decode is constrained by memory bandwidth that's lower than most dGPUs that price.
Apple would be in a much stronger spot right now if they didn't pretend like eGPUs were inconceivable black magic that Macs are incompatible with.
I'm not sure I follow - 614 GB/sec is pretty squarely in dGPU territory (~5070 level). External GPUs can definitely exceed that on the very high end, but it seems pretty competitive, no?
Competitive for 16-24GB dGPUs, but for 100gb+ inference workloads it's going to be a decode bottleneck. For smaller models it'd be fine, but the same goes for the smaller GPUs.
In particular though, the fatal bottleneck is the weakness of the iGPU. Filling a KV cache on a 100gb+ model could take a few minutes, or even hours if you're trying to restore a 256k-to-1m token session.
Apple would be in a much stronger spot right now if they didn't pretend like eGPUs were inconceivable black magic that Macs are incompatible with.