by orbital-decay 1037 days ago

> Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.

The differences between UIs don't matter much today; by now the generic workflow for complex scenes is more or less obvious to anyone who has spent time with SD.

- Draw basic composition guides. Use them with controlnets or any other generic guidance method to enforce the environment composition you want. Train your own controlnet if you need something specific. (lots of untapped potential here)

- Finetune the checkpoint on your reference pictures or use other style transfer methods to enforce a consistent style.

- Use manual brush masking, manually guided segmentation (e.g. SAM), or prompted segmentation (e.g. CLIPSeg) to select the parts to be replaced with other objects. The choice depends on your use case and on whether you need to do it procedurally.

- Photobash and add detail to the elements of your scene using any composition methods you have (noisy latent composition, inpainting, etc.) with the masks you created in the previous step. Use advanced guidance (controlnets, T2I-Adapters, etc.).

- Don't bother with any prompts beyond very basic descriptions, as "prompt engineering" is slow and unreliable. Don't overwhelm the model by trying to fit lots of detail in one pass; use separate passes for separate objects or regions.

- Alternative 3D version: build a primitive 3D scene from basic props (shapes, rigs). Render the backdrop and separate objects into separate layers as guides. Use them with controlnets & co to render the scene in a guided manner, combining the objects by latent composition, inpainting, or any other means. This can be used for procedural scenes and animation (although current models lack temporal stability).
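
A minimal sketch of the "separate passes for separate regions, merged through masks" idea, under loud assumptions: plain nested lists stand in for latents or images, and `render(prompt, current)` is a hypothetical placeholder for whatever model call you actually use (e.g. an inpainting pass with controlnet guidance). It only shows the mask-weighted blend `mask * layer + (1 - mask) * base` applied pass by pass; real noisy latent composition applies a blend like this to the noisy latents during sampling, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[float]]  # toy stand-in for a latent or image channel


@dataclass
class RegionPass:
    """One hypothetical per-region pass: a short prompt plus a soft mask."""
    prompt: str
    mask: Grid  # 1.0 = take this pass's output, 0.0 = keep what is already there


def blend(base: Grid, layer: Grid, mask: Grid) -> Grid:
    """Element-wise mask blend: mask * layer + (1 - mask) * base."""
    return [
        [m * lay + (1.0 - m) * b for b, lay, m in zip(b_row, l_row, m_row)]
        for b_row, l_row, m_row in zip(base, layer, mask)
    ]


def composite(
    base: Grid,
    passes: List[RegionPass],
    render: Callable[[str, Grid], Grid],  # hypothetical model call, not a real API
) -> Grid:
    """Run one pass per region; each pass sees the result of the previous ones."""
    out = base
    for p in passes:
        layer = render(p.prompt, out)     # placeholder for e.g. guided inpainting
        out = blend(out, layer, p.mask)   # only the masked region changes
    return out


if __name__ == "__main__":
    # Toy renderer: ignores the input and fills the grid with one value per prompt.
    values = {"red barn": 1.0, "oak tree": 2.0}

    def toy_render(prompt: str, current: Grid) -> Grid:
        return [[values[prompt]] * len(row) for row in current]

    scene = [[0.0, 0.0], [0.0, 0.0]]
    passes = [
        RegionPass("red barn", [[1.0, 0.0], [0.0, 0.0]]),
        RegionPass("oak tree", [[0.0, 0.0], [0.0, 0.5]]),
    ]
    print(composite(scene, passes, toy_render))  # [[1.0, 0.0], [0.0, 1.0]]
```

The passes run in sequence so each one is conditioned on what earlier passes produced, which is the point of splitting regions instead of cramming every detail into one prompt.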

As long as your tool has all that in one place, it's a breeze, regardless of the UI paradigm (admittedly auto1111's overloaded gradio looks straight out of a trash compactor nowadays). I expect 2D/3D software integrations to be the most successful in the future, as they already offer proven UIs and the most desirable side features. The problem is that in its current state SD can't do much in a production setting; it's not a finished product, so there's not a lot of interest in software integrations just yet.

logicallee 1037 days ago

Thanks for sharing this detailed guide. Can you share an example of the type of resulting image you’ve generated using the above approach?

I’ve only used DALL-E or SD with basic prompts, sometimes touching up the results in Photoshop afterward. I’m curious what you’ve been able to come up with using your more complex pipeline.

kadokaelan 1037 days ago

vizcom.ai ;)

samstave 1036 days ago

Wow, that is awesome... I'd kill my $30/mo sub to Midjourney if this thing were $30/mo for individuals...