
Latent consistency models (LCMs) are a pretty radical game changer that came out recently. There are LoRAs [0] that you can use alongside any SD or SDXL checkpoint to cut the number of inference steps you need to 2-8, rather than the usual ~25+. It's as close to magic as one could expect: on ComfyUI my modest RX 5700XT spits out 512x512 images in probably around a second each, or a couple of seconds for a batch of 4. A beefier GPU could certainly enable high-res, very low-latency interactive use.
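For anyone not on ComfyUI, here is a rough sketch of the same idea using the Hugging Face diffusers library, which is a swap-in on my part and not the setup described above. The base checkpoint ID, the LoRA repo ID, and the step and guidance values are all assumptions chosen for illustration; substitute whichever checkpoint and matching LCM LoRA you actually use.

```python
# Minimal sketch, assuming diffusers (with LoRA support) and torch are installed.
# Both repo IDs below are assumptions, not taken from the comment above.
BASE_MODEL = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # assumed SD 1.5 checkpoint
LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"             # assumed LCM LoRA for SD 1.5
LCM_STEPS = 4        # anywhere in the 2-8 range mentioned above
LCM_GUIDANCE = 1.0   # in diffusers, 1.0 turns classifier-free guidance off


def generate(prompt, device="cuda", steps=LCM_STEPS):
    """Load the base model, attach the LCM LoRA, and sample in a few steps."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
    # The LCM LoRA is meant to be paired with the LCM scheduler.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA)
    pipe.to(device)  # ROCm builds of PyTorch also expose the GPU as "cuda"

    result = pipe(prompt, num_inference_steps=steps, guidance_scale=LCM_GUIDANCE)
    return result.images[0]  # a PIL image
```

Calling `generate("a lighthouse at dusk").save("out.png")` would then write a single image; the one-time model load dominates the first call, so keep the pipeline around if you want interactive latency.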

For even better perceived latency, you could hook into the generation steps and have TAESD [1] decode the intermediate latents into previews.
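A hedged sketch of what that hook could look like in diffusers (again my assumption, not ComfyUI): the pipeline's `callback_on_step_end` argument hands you the current latents after every step, and `AutoencoderTiny` is the diffusers wrapper for TAESD. The `madebyollin/taesd` weights ID, the helper names, and the `on_preview` callable are hypothetical choices for illustration.

```python
# Minimal sketch, assuming an already-loaded Stable Diffusion pipeline `pipe`.
# `make_preview_callback`, `generate_with_previews`, and `on_preview` are
# hypothetical names introduced here for illustration.

def make_preview_callback(taesd, on_preview):
    """Build a per-step callback that decodes latents with TAESD."""
    def callback(pipe, step, timestep, callback_kwargs):
        latents = callback_kwargs["latents"]
        # The pipeline call already runs under torch.no_grad(), so no extra
        # context manager is needed here.
        decoded = taesd.decode(latents.to(taesd.dtype)).sample
        # AutoencoderTiny returns images in [-1, 1]; map them to [0, 1].
        preview = (decoded / 2 + 0.5).clamp(0, 1)
        on_preview(step, preview.cpu())
        return callback_kwargs  # the pipeline expects the dict back
    return callback


def generate_with_previews(pipe, prompt, on_preview, steps=4):
    """Run the pipeline and emit a cheap TAESD preview after every step."""
    from diffusers import AutoencoderTiny

    # Assumed weights ID for the SD 1.x variant of TAESD.
    taesd = AutoencoderTiny.from_pretrained(
        "madebyollin/taesd", torch_dtype=pipe.unet.dtype
    ).to(pipe.device)

    result = pipe(
        prompt,
        num_inference_steps=steps,
        guidance_scale=1.0,
        callback_on_step_end=make_preview_callback(taesd, on_preview),
        callback_on_step_end_tensor_inputs=["latents"],
    )
    return result.images[0]
```

Here `on_preview(step, tensor)` receives a batch of RGB tensors in [0, 1]; pushing those to a UI is left out. With only 2-8 steps there are few intermediate frames, so the preview mostly helps on slower cards or larger resolutions.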

[0] https://huggingface.co/collections/latent-consistency/latent...
[1] https://github.com/madebyollin/taesd