Under hardware requirements they list: "a powerful graphics card with at least 6 GB VRAM is recommended. Otherwise generating images will take very long."
Does anyone have any idea what "very long" would mean on a card with 4GB of VRAM?
"Tested on a NVIDIA GeForce RTX 3050 under Ubuntu with 4GB VRAM. (...) lowered the canvas to 2Kx2K and it seems to just about be okay. My test prompt (...) produces a picture of rocks. (...) I get a nice scene (...) Both take about two minutes."
My very rough feeling from playing around with Stable Diffusion is that generation takes about 4x as long if it runs out of GPU memory and needs to shuttle data back and forth from system memory. There are a lot of variables, though: on my 3070 with 8GB of VRAM, I can get very impressive 512x512 images in about 10 seconds with somewhat low sample counts, or I can set a higher resolution and sample count with 2x upscaling and get a really sharp image in around 2 minutes.
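To put that rule of thumb into numbers, here is a back-of-the-envelope sketch in Python. The function name `estimate_seconds` and the `spill_factor` parameter are made up for illustration, and the 4x default is only my rough impression from above, not a benchmark.

```python
def estimate_seconds(base_seconds: float, fits_in_vram: bool,
                     spill_factor: float = 4.0) -> float:
    """Rough per-image time estimate.

    base_seconds: time observed when everything fits in VRAM.
    spill_factor: ASSUMED slowdown once data has to be shuttled to and
                  from system memory (a guess, not a measured value).
    """
    if base_seconds < 0 or spill_factor < 1:
        raise ValueError("base_seconds must be >= 0 and spill_factor >= 1")
    # No penalty if the workload fits; otherwise scale by the assumed factor.
    return float(base_seconds) if fits_in_vram else base_seconds * spill_factor


# The two cases from my 3070, re-estimated for a card that runs out of VRAM.
print(estimate_seconds(10, fits_in_vram=False))   # quick 512x512 render
print(estimate_seconds(120, fits_in_vram=False))  # high-res, upscaled render
```

By that estimate, my 10-second image would take around 40 seconds and my 2-minute image around 8 minutes on a card that runs out of VRAM, but treat this as an order-of-magnitude guess at best.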