HN Mirror


by Filligree 591 days ago

/r/stablediffusion has one user who posts a list of news approximately once per week, which is a good way to keep up to date on new developments, but it's a firehose.

To directly answer your question, though, these are the most useful models right now:

- Stable Diffusion 1.5: Still not great, but better than it used to be. The ecosystem is mature, there are ControlNets and other add-ons for almost any use case, and it runs on less horsepower than any other model here. Still, you won't get nearly the same quality as the newer models. Definitely use a fine-tune from civitai; the base model is terrible.

- Stable Diffusion 2: Mostly terrible. Avoid. SDXL is better in every regard.

- SDXL: Kinda the same as 1.5, except better in every regard other than system requirements. Still, it will run on almost any modern GPU, and makes megapixel-sized images natively. The base model still isn't great, but there are a lot of fine-tunes on civitai -- pick one based on desired aesthetic.

- Stable Diffusion 3: Terrible. Avoid.

- Stable Diffusion 3.5: Actually quite good! The system requirements are high, but lower than Flux, and unlike Flux this model isn't distilled. There are two variants, medium and large; medium is tuned for 2-megapixel images, large for 1-megapixel, but large is slightly better in terms of prompt adherence and quality. A common workflow is to use medium for upscaling images that were first created by large. This is also the first model on this list where the base model is perfectly usable, and SD 3.5 understands a lot more styles than anything else you could point at, which means you always need to specify the style you want.

- Flux 1.Dev: A distilled version of Flux 1.Pro; the latter isn't downloadable. Prompt adherence is better than anything else here, but it basically only understands 'Pixar', 'Photographic' and 'Anime'. If you want a very specific picture, Flux will do better than 3.5, assuming the picture falls in those categories. 3.5 generally makes _prettier_ pictures, though... or more interesting pictures, whichever.

- PonyXL: SDXL architecture, completely new training set. PonyXL is trained on Danbooru-tagged data, and its derivatives are usually the best option if you want anime, assuming that you can't run Flux or SD 3.5. They're also the best option if you want something NSFW; 3.5 and Flux are both safety-tuned, though in the case of 3.5 I suspect that will only last another month at most. Some PonyXL fine-tunes give you photorealistic outputs with anime-style tagging. Even though the architecture matches SDXL, you should treat this as a different base model.

Oh, and:

- Genmo Mochi: This is an open-weights video generation model, which you can run in ComfyUI on a 4090 or better. Mostly a novelty, but also quite good actually!
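
If it helps, the image-model recommendations above can be written down as a rough decision helper. This is only a sketch of my reading of the list: the hardware tiers (`minimal`, `modern`, `high`, `highest`), the function name `suggest_model`, and its parameters are all made up for illustration, and the list gives no hard VRAM numbers to map them to.

```python
# Hypothetical hardware tiers, lowest to highest. These names are
# assumptions for illustration; the list above gives no VRAM figures.
TIERS = ["minimal", "modern", "high", "highest"]

# The three styles the list says Flux 1.Dev really understands.
FLUX_STYLES = {"pixar", "photographic", "anime"}


def suggest_model(tier: str, style: str = "", nsfw: bool = False,
                  exact_prompt: bool = False) -> str:
    """Return a starting-point model family for the given constraints."""
    rank = TIERS.index(tier)  # raises ValueError on an unknown tier
    style = style.lower()

    # SD 1.5 runs on less horsepower than anything else listed.
    if rank == 0:
        return "SD 1.5 fine-tune"

    # SD 3.5 and Flux are safety-tuned, and PonyXL derivatives are the
    # anime pick when neither SD 3.5 nor Flux will run.
    if nsfw or (style == "anime" and rank < 2):
        return "PonyXL fine-tune"

    # SDXL runs on almost any modern GPU.
    if rank == 1:
        return "SDXL fine-tune"

    # Flux has the best prompt adherence, but only inside its styles,
    # and its requirements are higher than SD 3.5's.
    if rank == 3 and exact_prompt and style in FLUX_STYLES:
        return "Flux 1.Dev"

    # Otherwise SD 3.5: usable base model, widest style coverage.
    return "SD 3.5 large"


print(suggest_model("modern", style="anime"))
print(suggest_model("highest", style="Photographic", exact_prompt=True))
```

Treat the result as a starting point only; for the 1.5, SDXL and PonyXL answers you would still pick a specific fine-tune based on the aesthetic you want.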


loudmax 591 days ago

Thanks for this excellent overview of the currently available models.

I'd only note that even when you call a model `terrible`, all of this image generation technology is mind-blowing compared to what you could expect to do on consumer-grade hardware just a few years ago. We've come a very long way in a very short time.


Filligree 591 days ago

That's absolutely true. I got into this well before Stable Diffusion existed, so I have galleries full of pictures from VQGAN et al, and SD 1.5 was already amazing by comparison.
