Depends on which fork you're running. Some seem to be using CPU-based generation, while others use the MPS device backend correctly, which is MUCH faster. I have another comment floating around about lstein's fork, but it takes some massaging to get it to run happily: https://github.com/lstein/stable-diffusion/
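
If you want to check whether your own setup can use MPS at all, a rough sketch like this should tell you. It assumes PyTorch 1.12 or later (where `torch.backends.mps` was added), and `pick_device` is just a name I made up for illustration, not something taken from any of the forks:

```python
def pick_device() -> str:
    """Return "mps" if PyTorch can use Apple's Metal backend, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch isn't installed, so there is nothing to accelerate.
        return "cpu"

    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Hypothetical usage: a fork would move its model and tensors to this device.
device = pick_device()
print(f"Generation would run on: {device}")
```

If this prints `cpu` on an Apple Silicon Mac, the slow speeds are probably down to the PyTorch install rather than the fork itself.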
EDIT: Speed improved to 2.3s/iter after a reboot.