Hacker News
by imranhou 967 days ago
The versatility of Stable Diffusion, especially when combined with tools like ControlNet, highlights the advantages of a more controlled image generation process. While DALL-E and others provide ease and speed, the depth of customization and local processing capabilities of SD models cater to those seeking deeper creative control and independence.

It is interesting, isn't it? We have "AI" generating the image, but we still seem to want to "paint", or at least keep control over the creative process.

Prompts seem to be a new type of camera, lens or paintbrush.

There are at least three "levels" you can consider with image generation: composition, facial likeness and style. Prompts are pretty weak at composition, which is the strongest point of ControlNets; they do a great deal to make up for that weakness. But there are some compositions SD can't find even when given detailed ControlNet inputs.

Style generality is frequently lost in fine-tuned models. The original DreamBooth tried to get around this by using the base model to generate lots of images of the subject's class (e.g. generic dogs when fine-tuning on one specific dog) and training on those alongside the subject images to retain generality. But it's time-intensive to generate all the extra images (and ideally do some QC on them) and then train on them too, so it's not often done.
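To make the DreamBooth trick concrete, here is a minimal sketch of the "prior preservation" objective: the usual fine-tuning loss on the subject images plus a weighted loss on the model's own generated class images, which pulls the model back toward what it produced before fine-tuning. This is an illustration under simplifying assumptions, not the actual training code: predictions and targets are plain lists of floats (in real training they would be denoiser output tensors), and the function names are hypothetical. The `prior_weight` parameter plays the role of the lambda weight on the class term described in the DreamBooth paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(
    instance_pred: Sequence[float],
    instance_target: Sequence[float],
    class_pred: Sequence[float],
    class_target: Sequence[float],
    prior_weight: float = 1.0,
) -> float:
    """Hypothetical DreamBooth-style combined objective (sketch only).

    instance_*: prediction/target for the subject images, prompted with
                the rare identifier (e.g. "a sks dog").
    class_*:    prediction/target for the base model's own generated class
                images, prompted with just the class (e.g. "a dog").
    prior_weight: weight on the class term (lambda); 0 means plain fine-tuning
                  with no attempt to preserve generality.
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_weight * prior_loss


if __name__ == "__main__":
    # Toy numbers: instance MSE is 0.25, class MSE is 0.5.
    print(prior_preservation_loss([0.5, 0.5], [0.0, 1.0],
                                  [1.0, 0.0], [1.0, 1.0], prior_weight=0.5))  # 0.5
    print(prior_preservation_loss([0.5, 0.5], [0.0, 1.0],
                                  [1.0, 0.0], [1.0, 1.0], prior_weight=0.0))  # 0.25
```

The cost mentioned above shows up directly here: every training step needs a batch of class images to evaluate the second term, and those images have to be generated (and ideally checked) before training starts.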