Hacker News
by ttul 848 days ago
MPS is promising and the memory bandwidth is definitely there, but Stable Diffusion performance on Apple Silicon remains terribly poor compared with consumer Nvidia cards (in my humble opinion). Perhaps this is partly because so many bits of the SD ecosystem are tied to Nvidia primitives.
2 comments

Image diffusion models tend to have relatively low memory requirements compared to LLMs (and don’t benefit from batching), so having access to 128 GB of unified memory is kinda pointless.
They do benefit from batching; up to a 50% performance improvement, in my experience.

That might seem small compared to LLMs, but it isn't small in absolute terms.

I got a 2x jump on my 4090 from batching SDXL.
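To put the figures reported in this thread (up to 50%, and 2x) in absolute terms, here is a rough arithmetic sketch. It assumes "a 50% improvement" means 1.5x throughput, and the 10-second per-image time and batch size of 4 are made-up numbers for illustration, not measurements from anyone here. `batched_time` is a hypothetical helper, not part of any library.

```python
def batched_time(per_image_seconds: float, batch_size: int, speedup: float) -> float:
    """Wall-clock seconds for one batch, given a throughput speedup
    relative to generating the same images one at a time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return per_image_seconds * batch_size / speedup


# Assumed, illustrative numbers: 10 s per image unbatched, batch of 4.
per_image = 10.0
batch = 4

sequential = per_image * batch                      # one image at a time
with_50_percent = batched_time(per_image, batch, 1.5)  # "up to 50%" read as 1.5x
with_2x = batched_time(per_image, batch, 2.0)          # the 2x case

print(f"sequential:  {sequential:.1f} s")
print(f"1.5x batch:  {with_50_percent:.1f} s")
print(f"2x batch:    {with_2x:.1f} s")
```

Under those assumptions, a batch that would take 40 seconds one image at a time drops to roughly 27 seconds at 1.5x and 20 seconds at 2x, which adds up quickly over a long generation session.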
Stable Diffusion will run fine on a 3090, or on a 4070 Ti Super and higher.