

by Flux159 591 days ago

I like that they're adding 4MP images, but after so many models in the past 2 years (diffusion, LLM, etc.), I can't keep up with which models are best for which use cases.

I know Civitai has fine-tunes specifically for anime style, realistic, etc., but I don't know which one is "state of the art". /r/stablediffusion usually gets overly hyped about new models and isn't really searchable for what is SOTA today. This doesn't even get into models that are only accessible via API, like Flux Pro, or through an app (Midjourney).

LLMs have pretty much the same problem for locally runnable models and API-based ones (Llama vs Qwen 2.5 vs Sonnet 3.5 for coding vs other tasks).

Does anyone know of a GitHub repo or an app that is keeping these things up to date? Or is that something that other people would also want to collaborate on?

4 comments

doctorpangloss 591 days ago

There are a lot of good image models you can use.

Flux is the best open weights model.

Ideogram, Recraft, Midjourney, Leonardo are all very capable hosted image generators. DALL-E3 was way ahead of its time and is still very good.

RunwayML Gen3 Alpha, Lumina, Hailuo, Kling, Minimax and others do video well.

Sora is probably the best visual media generator but is not widely available to use. Only people at Meta have used Meta’s Chameleon, which is maybe the most capable visual media generator today.

None of them is notably better or worse than the others at any particular style.

All the content on CivitAI is reflective of the quality of the foundational models. Flux and SD3 community fine tunes are very capable. CivitAI isn’t representative of the best in the community, the state of the art, or even what people are using this stuff for.


Filligree 591 days ago

/r/stablediffusion has one user who posts a list of news approximately once per week, which is a good way to keep up to date on new developments, but it's a firehose.

To directly answer your question, though, these are the most useful models right now:

- Stable Diffusion 1.5: Still not great, but better than it used to be; the ecosystem is mature, there are ControlNets and other tools for any possible use case, and it runs on less horsepower than any other model. Still, you won't get nearly the same quality as with the newer models. Definitely use a fine-tune from Civitai; the base model is terrible.

- Stable Diffusion 2: Mostly terrible. Avoid. SDXL is better in every regard.

- SDXL: Kinda the same as 1.5, except better in every regard other than system requirements. Still, it will run on almost any modern GPU, and makes megapixel-sized images natively. The base model still isn't great, but there are a lot of fine-tunes on Civitai -- pick one based on desired aesthetic.

- Stable Diffusion 3: Terrible. Avoid.

- Stable Diffusion 3.5: Actually quite good! The system requirements are high, but lower than Flux, and unlike Flux this model isn't distilled. There are two variants, medium and large; medium is tuned for 2-megapixel images, large for 1-megapixel, but large is slightly better in terms of prompt adherence and quality. A common workflow is to use medium for upscaling images that were first created by large. This is also the first model on this list where the base model is perfectly usable, and SD 3.5 understands a lot more styles than anything else you could point at, which means you always need to specify the style you want.

- Flux 1.Dev: A distilled version of Flux 1.Pro; the latter isn't downloadable. Prompt adherence is better than anything else here, but it basically only understands 'Pixar', 'Photographic' and 'Anime'. If you want a very specific picture, Flux will do better than 3.5, assuming the picture falls in those categories. 3.5 generally makes _prettier_ pictures, though... or more interesting pictures, whichever.

- PonyXL: SDXL architecture, completely new training set. PonyXL is trained on Danbooru-tagged data, and its derivatives are usually the best option if you want anime, assuming that you can't run Flux or SD 3.5. Or if you want something NSFW; 3.5 and Flux are both safety-tuned, though in the case of 3.5 I suspect that will only last another month at most. Some PonyXL fine-tunes give you photorealistic outputs with anime-style tagging. It's the same architecture as SDXL, but you should treat this as a different base model.

Oh, and:

- Genmo Mochi: This is an open-weights video generation model, which you can run in ComfyUI on a 4090 or better. Mostly a novelty, but also quite good actually!
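
Since the question was about keeping this kind of list up to date, here is a rough sketch of how the image-model notes above could be kept as data rather than prose. Everything in it is an assumption for illustration: the `ModelNote` type, the `hardware_rank` field, and the helper functions are made-up names, and the ranks only encode the relative ordering described above (1.5 lightest, then SDXL, then SD 3.5, then Flux). PonyXL is given the same rank as SDXL only because it shares the architecture. Nothing here is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelNote:
    """One entry of the list above (hypothetical structure)."""
    name: str
    verdict: str                  # "use" or "avoid", as stated in the list
    hardware_rank: Optional[int]  # 1 = lightest; None = not stated in the list
    note: str


# Paraphrased from the list above; ranks are relative, not benchmarks.
MODEL_NOTES: List[ModelNote] = [
    ModelNote("Stable Diffusion 1.5", "use", 1, "mature ecosystem; use a fine-tune"),
    ModelNote("Stable Diffusion 2", "avoid", None, "SDXL is better in every regard"),
    ModelNote("SDXL", "use", 2, "megapixel images natively; pick a fine-tune"),
    ModelNote("Stable Diffusion 3", "avoid", None, ""),
    ModelNote("Stable Diffusion 3.5", "use", 3, "base model usable; specify the style"),
    ModelNote("Flux 1.Dev", "use", 4, "best prompt adherence; few styles"),
    # Assumption: same architecture as SDXL, so same rank.
    ModelNote("PonyXL", "use", 2, "anime; Danbooru-style tags"),
]


def recommended_up_to(max_rank: int) -> List[str]:
    """Names of models marked "use" whose relative rank is at most max_rank."""
    return [
        m.name
        for m in MODEL_NOTES
        if m.verdict == "use"
        and m.hardware_rank is not None
        and m.hardware_rank <= max_rank
    ]


def to_avoid() -> List[str]:
    """Names of models the list says to avoid."""
    return [m.name for m in MODEL_NOTES if m.verdict == "avoid"]


if __name__ == "__main__":
    print(recommended_up_to(2))
    print(to_avoid())
```

Calling `recommended_up_to(2)` returns the models that the list describes as running on modest hardware; a shared repo could keep a file like this and update it as opinions change.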


loudmax 591 days ago

Thanks for this excellent overview of the currently available models.

I'd only note that when you say a model is "terrible", all of this image generation technology is mind-blowing compared to what you could expect to do on consumer-grade hardware just a few years ago. We've come a very long way in a very short time.


Filligree 591 days ago

That's absolutely true. I got into this well before Stable Diffusion existed, so I have galleries full of pictures from VQGAN et al, and SD 1.5 was already amazing by comparison.


vanillax 591 days ago

I've recently been down this path and it's a messy place that's really the most user-unfriendly experience. But at a high level you have two staple models: Stable Diffusion 3 XL and Flux.

There are two main tools (think Ollama): Automatic1111 (Gradio-like UI), which only works with Stable Diffusion models, and ComfyUI (Node-RED-like UI), which supports all models but is harder to set up and learn.


GaggiX 591 days ago

I think that for open models, the Flux models are considered the best ones. The interface used for these models is usually ComfyUI.

For the best model in general, I think that Ideogram 2 and the Recraft model are the best ones to use; recraft.ai allows you to create styles based on the images you upload, which is very useful as the model is not open.

For anime, NovelAI V3 is still the best one after almost a year; among open models, it's Illustrious.
