	by haolu7 1361 days ago
	AITemplate-PyTorch Stable Diffusion is the fastest Stable Diffusion inference solution, pushing image generation below one second on an A100 for the first time (batch 1: 0.7s / 25 steps, 1.3s / 50 steps; batch 3: 1.6s / 25 steps, 0.55s per image; batch 16: 7.9s / 25 steps, 0.49s per image). That is 2.57X faster than Keras' XLA-based GPU compilation solution. More benchmark numbers and repro at: https://github.com/facebookincubator/AITemplate/tree/main/ex...

5 comments

Llamamoe 1361 days ago

Wow. Considering that the better samplers let you reduce steps to 10-15, this is getting close to instant results.

One or two more optimizations and we're gonna have live-update results.

tveita 1360 days ago

This lists "OOM" for PyTorch on an RTX 3080-10GB, but I believe people have optimized the PyTorch SD model to run on GPUs with as little as 6 GiB.

Would AITemplate be able to run with those constraints?

ipiszy 1360 days ago

RTX 3080-10GB should work. See https://github.com/facebookincubator/AITemplate/tree/main/ex... and https://www.reddit.com/r/StableDiffusion/comments/xv7m89/met...

PresentHarmony 1360 days ago

Or, counting another way: how many pictures can it generate in one second with these parameters? It could be 1.05, 1.1, 1.5, or even 2 pictures. Thank you very much for your post! I would be very grateful for an answer!

PresentHarmony 1360 days ago

Can you please elaborate: how many milliseconds does it take to generate 1 image with these wonderful improvements? I would be very grateful for your answer! Thank you very much!

PresentHarmony 1360 days ago

Do I understand correctly that it takes 0.55 or 0.49 seconds to generate an image, depending on the batch size?

Thank you so much for your post! I would be very grateful for the response!

ipiszy 1360 days ago

Yes, this is correct. "Batch 16: 7.9s / 25 steps, 0.49s per image" means it generates 16 images for one prompt within 7.9s, so it's 0.49s per image.
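
The per-image arithmetic above, and the "pictures per second" question asked earlier in the thread, can be sketched in a few lines of Python. This is only an illustration using the 25-step timings quoted in the post; the function names are made up for this example and are not part of AITemplate.

```python
# Total wall-clock seconds per batch at 25 steps, as quoted in the post (A100).
BATCH_SECONDS = {1: 0.7, 3: 1.6, 16: 7.9}


def per_image_seconds(batch_size: int, total_seconds: float) -> float:
    """Average time per image when the whole batch takes total_seconds."""
    return total_seconds / batch_size


def images_per_second(batch_size: int, total_seconds: float) -> float:
    """Throughput: how many images are produced per second of wall-clock time."""
    return batch_size / total_seconds


for batch, total in BATCH_SECONDS.items():
    print(
        f"batch {batch:>2}: "
        f"{per_image_seconds(batch, total):.2f} s/image, "
        f"{images_per_second(batch, total):.2f} images/s"
    )
```

Because the quoted totals are rounded, the result for batch 3 (1.6 / 3) comes out slightly below the 0.55s per image stated in the post; the batch 16 figure matches 0.49s.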

PresentHarmony 1360 days ago

One more question, if you don't mind. One image is generated in 0.7 seconds (25 steps), and the same single image with 50 steps is generated in 1.3 seconds. So it's much cheaper to generate more images for the same prompt. Am I right, or am I missing something? Thanks in advance for your answer.

P.S. Though it should be 1.4 seconds: 0.7*2=1.4, if you assume twice the steps means twice the time.
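
One possible reading of the 1.3s vs. 1.4s gap, which nobody in the thread confirms, is that each run pays a fixed cost (for example text encoding and image decoding) plus a constant cost per denoising step. Under that assumed linear model, the two batch-1 timings from the post are enough to solve for both terms. The function names below are hypothetical.

```python
def fit_step_cost(steps_a, secs_a, steps_b, secs_b):
    """Fit seconds = overhead + per_step * steps through two measurements.

    ASSUMPTION: run time is linear in the step count; the thread does not
    state this, it is only a model for illustration.
    """
    per_step = (secs_b - secs_a) / (steps_b - steps_a)
    overhead = secs_a - per_step * steps_a
    return overhead, per_step


def predict_seconds(steps, overhead, per_step):
    """Estimated run time for a given step count under the linear model."""
    return overhead + per_step * steps


# Batch-1 timings quoted in the post: 0.7s at 25 steps, 1.3s at 50 steps.
overhead, per_step = fit_step_cost(25, 0.7, 50, 1.3)
print(f"fixed overhead ~ {overhead:.3f} s, per-step cost ~ {per_step:.3f} s")

for steps in (10, 15, 25, 50):
    estimate = predict_seconds(steps, overhead, per_step)
    print(f"{steps:>2} steps: ~{estimate:.2f} s")
```

With only two rounded data points the fit is rough, so the estimates for 10 to 15 steps should be read as a ballpark rather than a benchmark.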

PresentHarmony 1360 days ago

Thank you indeed, my friend!
