How we got Stable Diffusion XL inference to under 2 seconds (baseten.co)
51 points by varunshenoy 1022 days ago
4 comments

Playing around with the cfg cutoff technique, I'm finding that turning off guidance at the 40% mark causes requested fine details to not appear in the final image. This sorta implies that switching cfg midway and/or switching prompt vectors might be interesting from a prompting standpoint, but it kinda kills it as a performance optimization.
https://imgur.com/a/47D6MEl demonstrates this. The prompt is “Praying Mantis, looking out a living room window, (hyperrealism:1.2), (8K UHD:1.2), (photorealistic:1.2), shot with Canon EOS 5D Mark IV, detailed bug, macro” with 50 steps, comparing cutting off cfg at 20 steps against using cfg all the way through in the normal fashion.
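For anyone who wants the cutoff spelled out, here's a rough sketch in plain Python of what I understand the trick to be. It is not the article's code. `model` is a hypothetical stand-in for one UNet forward pass, and every function name here is made up for illustration. The saving comes from skipping the unconditional pass once guidance is off.

```python
def cfg_combine(uncond, cond, scale):
    """Usual classifier-free guidance mix: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def predict_noise(step, num_steps, cutoff_fraction, scale, model):
    """Noise prediction for one step, with guidance dropped after the cutoff.

    `model(step, conditioned)` is a hypothetical callable standing in for a
    single UNet forward pass; it is not a real library API.
    """
    cond = model(step, True)
    if step < int(num_steps * cutoff_fraction):
        # Guidance still on: pay for a second (unconditional) pass.
        uncond = model(step, False)
        return cfg_combine(uncond, cond, scale)
    # Guidance off: conditional prediction only, one pass.
    return cond


def count_model_calls(num_steps, cutoff_fraction):
    """Total forward passes: two per guided step, one per unguided step."""
    cutoff = int(num_steps * cutoff_fraction)
    return 2 * cutoff + (num_steps - cutoff)


if __name__ == "__main__":
    # 50 steps with guidance cut at 40% (step 20), as in the comparison above.
    print(count_model_calls(50, 0.4), "passes vs", count_model_calls(50, 1.0))
```

By this count, 50 steps with the cutoff at step 20 is 70 forward passes instead of 100, which is where the speedup would come from, at the cost of the detail loss described above.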
It's a bit weird to talk about steps but not about the sampler (20 steps with Euler vs 20 steps with DPM++ 2M Karras are pretty different beasts both in terms of speed and quality).

I also see compiling but no AITemplate, which seems to be among the hottest ways to speed up SD recently.

This could save a lot of money on Replicate.ai

Especially if you are charging your users the same 1,000% markup while your own costs have been cut to a third and you deliver results faster

I don’t know man, out of the box on SD-Next it’s about 3-4 secs for a picture at 1024 with UniPC and 20 steps on a 4090