The fastest implementation on my 2060 laptop is AITemplate, which runs about 2x faster than optimized pure HF diffusers.