by mrtksn 1298 days ago

With the full 50 iterations it appears to be about 30s on M1.

They have some benchmarks on the GitHub repo: https://github.com/apple/ml-stable-diffusion

For reference, previously I was getting just under 3 minutes for 50 iterations on my MacBook Air M1. I haven't yet tried Apple's implementation but it looks like a huge improvement. It might take it from "possible" to "usable".

4 comments

liuliu 1298 days ago

Yeah, it's just that the PyTorch MPS backend is not fully baked and has some slowness. You should be able to get close to that number with maple-diffusion (probably 10% slower) or my app: https://drawthings.ai/ (probably around 20% slower, but it supports samplers that take fewer steps (50 -> 30)).

washadjeffmad 1298 days ago

For comparison, it's also taking ~3min @ 50 iterations on my 12c Threadripper using OpenVINO. It sounds like the improvements bring the M1 performance roughly in line with a GTX 1080.

joakleaf 1298 days ago

The Apple Neural Engine in the M1 is supposed to be able to perform 11 TOPS. The GTX 1080 does about 9-11 TFLOPS.

So it sounds plausible that the M1 can reach the same level in some use cases with the right optimizations.

mrtksn 1298 days ago

I have a MacBook Air M1, which is passively cooled. When it is cooled properly (a thermal pad mod combined with a fan under the laptop), I'm getting closer to 2min, something like 2.8s per iteration. I guess it would be something like 140s for 50 iterations on a MacBook Pro or Mac mini with M1.
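The estimate above is just steps times seconds per step. A minimal sketch of that arithmetic in Python, where the function names are made up for illustration and the only inputs are the figures quoted in this comment:

```python
def total_seconds(steps: int, seconds_per_iter: float) -> float:
    """Wall-clock time for `steps` iterations at a fixed per-iteration cost."""
    return steps * seconds_per_iter


def iters_per_second(seconds_per_iter: float) -> float:
    """Convert s/it (what slow runs report) to it/s (what fast runs report)."""
    return 1.0 / seconds_per_iter


if __name__ == "__main__":
    # 50 iterations at 2.8 s each, as quoted above.
    t = total_seconds(50, 2.8)
    print(f"{t:.0f}s total, i.e. {t / 60:.1f} min")
    print(f"{iters_per_second(2.8):.2f} it/s")
```

This assumes every iteration costs the same, which ignores model load time and any thermal throttling on a passively cooled machine.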

desro 1298 days ago

This is accurate re: M1 Mac Mini times IME

fswd 1298 days ago

Not SD 2.0 but SD 1.5: I am getting 30 iterations in 10 seconds on a 1080 Ti, and 50 iterations in 18 seconds.

100%|| 30/30 [00:10<00:00, 2.84it/s]
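The progress line above is tqdm-style output, and the 50-iteration figure follows from its rate. A rough sketch of pulling the rate out of such a line and extrapolating, assuming the elapsed field is in MM:SS form (longer runs add an hours field, which this does not handle); `parse_progress` and `estimate_seconds` are hypothetical helper names:

```python
import re

# Matches "done/total [MM:SS<remaining, RATEit/s]" or the "s/it" variant.
PROGRESS = re.compile(
    r"(\d+)/(\d+) \[(\d+):(\d+)<[\d:?]+,\s*([\d.]+)(it/s|s/it)\]"
)


def parse_progress(line: str) -> dict:
    """Extract step counts, elapsed seconds, and rate (as it/s) from one line."""
    m = PROGRESS.search(line)
    if m is None:
        raise ValueError("not a recognised progress line")
    done, total, minutes, seconds, rate, unit = m.groups()
    rate = float(rate)
    return {
        "done": int(done),
        "total": int(total),
        "elapsed_s": int(minutes) * 60 + int(seconds),
        # Normalise to iterations per second regardless of the unit shown.
        "it_per_s": rate if unit == "it/s" else 1.0 / rate,
    }


def estimate_seconds(steps: int, it_per_s: float) -> float:
    """Extrapolate run time for a different step count at the same rate."""
    return steps / it_per_s


if __name__ == "__main__":
    info = parse_progress("100%|| 30/30 [00:10<00:00, 2.84it/s]")
    print(info)
    print(f"50 steps: about {estimate_seconds(50, info['it_per_s']):.1f}s")
```

At 2.84 it/s, 50 steps works out to roughly 17.6 seconds, which matches the 18 seconds reported.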

jerpint 1298 days ago

How do DreamStudio/Craiyon/Hugging Face manage to run seemingly quicker on their interfaces? Are they hosting these models on super beefy and costly GPUs for free?

modeless 1298 days ago

M1's single-threaded CPU performance and power efficiency are exceptional; however M1's GPU performance is nothing special compared to normal discrete GPUs. You don't need something super beefy to beat M1 on the GPU side.

But also yes, it's gotta be expensive to host these models and I'm not sure where all these subsidies are coming from. I expect that we'll eventually see these things transition to more paid services.

microtonal 1298 days ago

For a low-power SoC, the GPU performance is actually pretty impressive. We recently did some transformer benchmarks and the inference performance of the M1 Max is almost half that of an RTX 3090:

https://explosion.ai/blog/metal-performance-shaders

However the SoC only uses 31W when posting that performance.

Terretta 1298 days ago

Haven't tried this yet, but it sounds slower than SD itself if you use one of the alt builds that support MPS in place of CUDA.

Mac Studio with M1 Ultra gets 3.3 iters/sec for me.

MacBook Pro M1 Max gets 2.8 iters/sec for me.

dagmx 1298 days ago

You’re talking about the higher-end SKUs though, with many more GPU cores and significantly more RAM (I think the lowest you can get is 32GB vs the 8 on their chip).
