Opendream: A layer-based UI for Stable Diffusion

112233 1036 days ago

Gimp is so well established that it has almost fossilized...

Also, "normal" layered non-destructive operations are a couple of orders of magnitude faster and do not require 8 GB of VRAM per 512x512 patch, or work only with a fixed set of buffer sizes, or any of the other strange things SD comes with. Like, how would a non-destructive controlnet layer look in Gimp?

j-a-a-p 1036 days ago

Depends on where the 'opposite approach' is aimed to end. If the result is a totally new creative workflow then what is the point of carrying all the ballast of a legacy tool?

_6atf 1036 days ago

I got briefly very excited for non-destructive editing in GIMP, but the website still says this is slated for 3.2. Which functionality were you referring to?

tavavex 1037 days ago

Very exciting. The "first-generation" Stable Diffusion frontends seem to have settled on a specific design philosophy, so it's interesting to see new tools (like this or ComfyUI) shake up the way people work with this tool. I hope that in a few years, we'll know which philosophy works best.

TillE 1037 days ago

Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.

It's specifically why I've avoided diving too deep into "prompt engineering", because the kind of incantations required today just aren't going to be the way most people interact with this stuff for very long.

orbital-decay 1037 days ago

> Out of all the AI-related tools, generative art frontends are probably the thing most likely to radically change and improve in the next few years.

The difference between UIs is actually not very relevant today; by now the generic workflow for complex scenes is more or less obvious to anyone who spent time with SD.

- Draw basic composition guides. Use them with controlnets or any other generic guidance method to enforce the environment composition you want. Train your own controlnet if you need something specific. (lots of untapped potential here)

- Finetune the checkpoint on your reference pictures or use other style transfer methods to enforce the consistent style.

- Use manual brush masking, manually guided segmentation (e.g. SAM), or prompted segmentation (e.g. CLIPSeg) to select the parts to be replaced with other objects. The choice depends on your case and on whether you need to do it procedurally.

- Photobash and add detail to the elements of your scene using any composition methods you have (noisy latent composition, inpainting etc) with the masks you created in the previous step. Use advanced guidance (controlnets, t2i adapters etc)

- Don't bother with any prompts beyond very basic descriptions, as "prompt engineering" is slow and unreliable. Don't overwhelm the model by trying to fit lots of detail in one pass; use separate passes for separate objects or regions.

- Alternative 3D version: build a primitive 3D scene from basic props (shapes, rigs). Render the backdrop and separate objects into separate layers as guides. Use them with controlnets & co to render the scene in a guided manner, combining the objects by latent composition, inpainting, or any other means. This can be used for procedural scenes and animation (although current models lack temporal stability).

As long as your tool has all that in one place, it's a breeze, regardless of the UI paradigm (admittedly auto1111's overloaded gradio looks straight out of a trash compactor nowadays). I expect 2D/3D software integrations to be the most successful in the future, as they already offer proven UIs and the most desirable side features. The problem is that in its current state SD can't do much in a production setting, it's not a finished product - so there's not a lot of interest in software integrations just yet.

logicallee 1036 days ago

Thanks for sharing this detailed guide. Can you share an example of the type of resulting image you’ve generated using the above approach?

I’ve only just used Dall-E or SD with basic prompts, or sometimes using photoshop afterward. I’m curious what you’ve been able to come up with using your more complex pipeline.

kadokaelan 1036 days ago

vizcom.ai ;)

samstave 1036 days ago

Wow that is awesome... I'd kill my $30/mo sub to midjourney if this thing were $30/mo for individuals...

chefandy 1036 days ago

As a commercial artist who's worked in several professional creative industries, I find the current textual methods of interacting with generative image AI to be unusable for the vast majority of professional tasks. I think they're great for a lot of laypeople because they abstract away things that laypeople don't want to have to think about, but in professional workflows, you need specificity at pixel-level granularity, predictability, and repeatability. Those things are all difficult with purpose-built tools and impossible through text prompts. I haven't spoken to a single colleague outside the high-volume, low-effort end of their disciplines/fields who disagrees. Most commercial artists' selling point is deciding exactly what should go into a piece; implementing it is the easy part.

The pro tools that have incorporated generative AI into their workflows are not at all textual. The environment that popularizes this among the general public will look a lot more like canva or maybe Instagram than what's popular now.

fnordpiglet 1036 days ago

At some level I agree that the prompt engineering done today to break ChatGPT guard rails barely rises to “interesting hack” levels, but I think that manipulating language to induce specific behavior by an LLM is a powerful skill, and requires a very facile understanding of language in the semantic context of the training corpus. By varying the tone, vocabulary, style, pacing, and obviously the semantics of the original inducing language you can dramatically change the behavior of the LLM. This is less about prompt engineering and more about being a masterful manipulator of language - and it's why I don’t fear that LLMs make language skill irrelevant. Those with the most language skill will produce the most compelling and tailored LLM output for a purpose.

greggsy 1037 days ago

It’s entirely likely that there’s much more effort going into generative text - any perceived advancement of generative images is going to be disproportionately skewed due to the richness of information that they hold.

bobboies 1037 days ago

Incantations are fun!

fassssst 1036 days ago

Photoshop Beta does it best. The generative features are just new tools that work as you’d expect with all the existing tools. For example, if you want to do outpainting, just make your canvas bigger and you get a contextual menu where you can (optionally) type a prompt. Inpainting, just make a selection however you want and type a prompt.

DrSiemer 1036 days ago

The control that offers is extremely limited versus SD in A1111 with all its different models, LoRAs, embeddings, extensions and ControlNet types.

I wrote a typescript API generator for ComfyUI, works great - hopefully will have time to release it soon.

I think there's so much unexplored potential in UI and workflows around generative AI, we've barely scratched the surface. Very exciting times ahead!

ssalka 1037 days ago

I bet this will be available as an Automatic1111 extension by end of month.

tavavex 1036 days ago

I'm doubtful about that. A1111 is what I called a "first-generation frontend". Both it and all of its extensions follow a specific model for its usage - in general, every tool is contained on its own tab, with each tab having buttons to transfer the outputs into other tools. Radically changing this model would require rewriting so much that it'd just make sense to use a different frontend in the first place.

tomalaci 1037 days ago

I haven't followed diffusion image generation development for a while. Where do you find information on what models you can use in the model_ckpt field? Do I need to import them from somewhere? What are the main differences between them and which are more modern or better?

smusamashah 1036 days ago

Civitai.com is currently the most popular resource for models. Also, the ckpt format is discouraged due to security concerns, and safetensors is now used instead.
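For anyone wondering why ckpt files raise security concerns: they are typically pickle-based, and unpickling can call a callable chosen by whoever wrote the file. Here's a minimal stdlib-only sketch of that behavior. The `Payload` class and the expression are made up for illustration, and no real model file is involved.

```python
import pickle

class Payload:
    """Hypothetical object whose pickled form asks the loader to run a callable."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). A malicious file could name
        # any importable callable here instead of a harmless expression.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# Loading the bytes executes the embedded call; we never get a Payload back.
loaded = pickle.loads(blob)
print(loaded)
```

A weights-only format avoids this whole class of problem because loading it just parses data rather than running callables named by the file.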

nickstinemates 1037 days ago

You can find them on huggingface, or you can reverse engineer which ckpt you want to use based on an image you've seen generated (like at majin[1] - beware, there's a lot of NSFW/controversial stuff here.)

1: https://majinai.art/

CSSer 1037 days ago

Some of this is straight up soft-core child porn. This is fucked up.

jay-barronville 1036 days ago

Agreed! I just clicked the link and did a double take. I don't care if it's AI. This is child porn material, and in my opinion, it should be shut down.

samstave 1036 days ago

Yep.

There needs to be a REALLY FUCKING STRONG effort to kill all CP AI anything. Full stop.

AI should automagically report any attempt at CP.

idiotsecant 1036 days ago

Serious question - Why? Assuming no actual CP was used in the training of the model who is being harmed? Ickiness should not imply illegality unless the ickiness is at the expense of someone else. Swing your fist as much as you want so long as you avoid my nose, and all.

_6atf 1036 days ago

> Serious question - Why? Assuming no actual CP was used in the training of the model who is being harmed

I don't think an AI model could generate realistic CP without being trained on examples, which would mean there is literally no way for this assumption to be true.

obvious_thrwawy 1036 days ago

Should they check the IDs of the models to verify?

Imagine getting reported because you generated an image of an anime girl deemed to be only 17.

I'd personally rather live in a world where people generate distasteful images with an AI and have that AI unconstrained than the inevitable one where everything gets locked down and run by large corporations who will ultimately create more harm than someone generating some lolicon.

colechristensen 1036 days ago

If you’re running a service you should have automatic filtering, detection, and such.

A model by itself though… you might as well ask a pencil to report someone for drawing graffiti. It does not make sense.

eVeechu7 1036 days ago

Models have been trained on something though. They are not analogous to pencils or brushes.

nomel 1036 days ago

I've always assumed this is what will be used to justify the regulation of AI.

SV_BubbleTime 1036 days ago

The four horsemen of internet censorship.

Money laundering, CSAM, terrorism, drugs.

greggsy 1037 days ago

I believe illustrations have been deemed to be abuse material, so I wouldn’t be surprised if LE have started looking into it.

kleiba 1037 days ago

Who exactly is being abused here?

I for one would much rather give pedophiles an opportunity to fulfil their sexual desires through AI-generated pictures than real ones.

Of course, we can talk about the training material. Are there actual child porn images in there? I seriously doubt it but who knows?

And perhaps a case could be made that AI-generated child porn could be a gateway to invite people who then seek out non-generated material.

But I think these are separate discussions to be had.

gochi 1037 days ago

They aren't separate discussions, they're directly tied to determining abuse material. Revenge porn is an example of abusive material, despite the subject not being abused in the material usually, they're considered abusive material due to the intent to cause abuse through distribution.

So if either case applies, whether it's training based on certain images, or it becomes a gateway, these are discussions to be had directly relating to whether or not it should be classified as abuse material.

Additionally, I'm not sure if the recommended help methods by professionals who deal with pedophiles is to let them fulfill their specific fantasies without a care.

There are lots of really important discussions to be had, but they're all tied to each other basically. We can't separate them out, nor should we aim to.

CSSer 1037 days ago

Geez that’s disturbing. I clicked having no qualms with nudes, artistic or otherwise. I’m not a prude. I’ve seen my fair share of anime girls and AI nudes. Hell, I was raised on the internet before parental settings were a thing, but I didn’t expect that. It’s so gross how it toes a line too.

dingnuts 1037 days ago

the Fediverse has a big problem with this, too, and I never hear anyone talking seriously about it

GaggiX 1037 days ago

Illustrations are not a problem under the law in the United States, but it remains to be seen how that applies to generated images that are indistinguishable, or nearly so, from reality.

jvanderbot 1036 days ago

What can someone do about it?

Also CivitAI but beware the NSFW

https://civitai.com/

Zuiii 1036 days ago

> https://majinai.art/

Thank you so much for sharing this. Civitai keeps bugging me to create an account. This doesn't seem to suffer from the same flaw.

lopatin 1036 days ago

Controversial is one way to put it

Zuiii 1036 days ago

It always amuses me when people who think they're the center of the world discover that there are other people with moral takes different than theirs.

If no real children were harmed to produce this stuff then it should be treated like any other extreme works of fiction (e.g. violence in video games, graphical descriptions in certain books).

Being disgusted is not grounds for banning something lol.

orbital-decay 1037 days ago

>Where do you find information on what models you can use in the model_ckpt field? Do I need to import them from somewhere?

You can train (finetune) your own on your reference material.

cwkoss 1037 days ago

Very cool. Would be interesting to train a model on images with alpha channels so outputs would be automatically masked and more easily composable. But maybe masking is so good these days that would be futile?

When a user does img-2-img on a layer does it use the context from other visible layers in the generation?

https://multidiffusion.github.io/

dheera 1037 days ago

For composing this approach works pretty well, maybe the author should consider making a UI for it

mottiden 1037 days ago

Thanks for posting. Really interesting

Zetobal 1037 days ago

Segmentation is solved... https://github.com/RockeyCoss/Prompt-Segment-Anything

michaelt 1037 days ago

Segment Anything is neat, but segmentation is far from solved.

If the user generates a picture of a horse and rider to add onto another composition - they probably want to include the saddle.

GaggiX 1037 days ago

SAM is also conditioned on points, if it's ambiguous what you want to mask you can add a point on the saddle and the model will add it without a problem, segmentation is pretty much solved, I agree with the parent post.

IME I haven't gotten great results using SAM, maybe it was just the images I was using? They weren't great quality and it seemed to struggle with low contrast areas

Zetobal 1036 days ago

If it's audio, images, CG or video it's almost always GIGO.

mdp2021 1037 days ago

> Would be interesting to train a model on images with alpha channels

Would be even more interesting to get an ANN middle system of ontology of the (finally) represented content in order to change the single items.

An internal representation of qualified structured items in space as part of the chain. Prompt > accessible internal representation > render.

brianjking 1037 days ago

Is it possible to add SD XL support for this?

I'd love a colab notebook if anyone has the skill and time to do so.

If anyone wants to add SDXL support, all you have to do is create a new extension with the correct SDXL logic (loading from HF diffusers, etc.). You could parameterize `num_inference_steps`, for example, to delegate decisions to the user of the extension.

If anyone gets to making one before me, please leave a PR!

antman 1037 days ago

Can you add a layer with e.g. an image of yourself?

ttul 1037 days ago

Pretty sure you can do this. Diffusion models by default start with noise, but you can start with any data, including an existing image. For instance, you could import a photo of yourself, mask the eyes and then ask the model to make them green.
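A toy sketch of the two ideas above, assuming the standard forward-noising formula x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise and a simple per-pixel mask blend. The four-value "photo" and the `fake_model_output` list are placeholders, not what Stable Diffusion or Opendream actually run.

```python
import math
import random

def noise_image(pixels, alpha_bar, rng):
    """Forward-diffuse: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    keep = math.sqrt(alpha_bar)
    add = math.sqrt(1.0 - alpha_bar)
    return [keep * p + add * rng.gauss(0.0, 1.0) for p in pixels]

def blend_with_mask(original, generated, mask):
    """Keep the original where mask is 0, take generated values where mask is 1."""
    return [m * g + (1 - m) * o for o, g, m in zip(original, generated, mask)]

rng = random.Random(0)
photo = [0.2, 0.4, 0.6, 0.8]   # stand-in for image (or latent) values
mask = [0, 1, 1, 0]            # 1 = region to regenerate, e.g. the eyes

# alpha_bar = 1.0 adds no noise; lower values start further from the photo.
untouched = noise_image(photo, alpha_bar=1.0, rng=rng)
noisy_start = noise_image(photo, alpha_bar=0.5, rng=rng)

# Placeholder for what a denoiser might return for the whole frame.
fake_model_output = [0.0, 0.9, 0.9, 0.0]
result = blend_with_mask(photo, fake_model_output, mask)
```

Only the masked positions change in `result`; everything outside the mask comes straight from the imported photo.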

asynchronous 1037 days ago

Very cool honestly, seems like a much needed improvement over Automatic. Does it support LoRA/will it support it in the near future?

You can write an extension to support LoRA (~10 lines of Python HF Diffusers code).

If you get to this before me, please create a PR!

denvrede 1036 days ago

That looks pretty nice but I guess the HW required or the time you have to wait to iterate on these things (if you don't use external services) is quite high. Is there an estimation / idea when a "normal" person can play around with these things without a lot of operational or capital investment?

smrtinsert 1037 days ago

There's great articles on how layered uis are a lot easier to use than node based uis. Really excited to see a layered approach to SD. It's definitely time to break out of gradio.

TeMPOraL 1037 days ago

Maybe if they're talking about layered UIs with layer groups, which turn a flat stack into something resembling a tree. But even these UIs don't give you proper non-destructive editing - anything more complex requires you to duplicate parts of the layer stack to feed as inputs, which is a destructive operation with respect to structure (those pasted layers won't update if you make changes to the copied source). Doing this properly requires a DAG, at which point you're at node-based UIs (or some idiosyncratic mess of a UI that pretends it's not modelling a DAG).

It's all moot though, because as far as I know, there is no proper 2D graphics editing software that uses DAGs and nodes. Everyone just copies Photoshop. Especially Affinity, which is grating, given their recent focus on non-destructive editing. For some reason, node-based UIs ended up being a mainstay of VFX, 3D graphics, and VFX & gamedev support tooling. But general 2D graphics - photo editing, raster and vector creation? Nodes are surprisingly absent.
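To make the copy-vs-reference point concrete, here is a minimal sketch of a DAG where two operations read the same source node, next to a pasted duplicate of that source. The class names and the toy pixel lists are invented for illustration and are not taken from any real editor.

```python
class Source:
    """Leaf node holding raw pixel values."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

class Node:
    """An operation applied to the outputs of its upstream nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Re-evaluating pulls fresh values from every upstream node.
        return self.op(*(n.evaluate() for n in self.inputs))

source = Source([1, 2, 3])
brightened = Node(lambda px: [p + 10 for p in px], source)
doubled = Node(lambda px: [p * 2 for p in px], source)
composite = Node(lambda a, b: [x + y for x, y in zip(a, b)], brightened, doubled)

pasted_copy = list(source.value)  # layer-stack style duplicate of the source
before = composite.evaluate()

source.value = [5, 5, 5]          # edit the original layer
after = composite.evaluate()      # the graph follows the edit
```

After the edit, `after` reflects the new source through both branches, while `pasted_copy` still holds the old pixels.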

orbital-decay 1037 days ago

> For some reason, node-based UIs ended up being a mainstay of VFX, 3D graphics, and VFX & gamedev support tooling. But general 2D graphics - photo editing, raster and vector creation? Nodes are surprisingly absent.

That's because non-destructive editing is mostly useful for animation, image series/sequences, and asset reuse, which are the most common in these fields. 2D artists have a different mental model, which is additionally set in stone by Photoshop and other software imitating it. Photographers use non-destructive editing, but mostly in simple cases because advanced things (retouching, creative compositing) can't and don't need to be done procedurally anyway.

danwills 1036 days ago

What about ancient 'Illusion', old 'Shake' or current 'Nuke', VFX compositing software that has supported node-based (i.e. DAG-based) comp workflows since the early 2000s? Guess this is just a very different (much smaller) realm than your usual Photoshops and so on?

dragonwriter 1037 days ago

> There's great articles on how layered uis are a lot easier to use than node based uis

I can see that being sensible for simple linear flows from one step to the next, with no branching, merging, or connections that skip steps.

Seems to me that with any of those other things, a layered UI is going to start to break down a lot faster.

rytill 1036 days ago

Can you share such articles?

magic_hamster 1036 days ago

While automatic1111 is cumbersome and takes a while to learn, it seems far more capable. The layers here are just inpainting (as noted in the repository readme as well).

adventured 1037 days ago

Not a bad start. One quick suggestion: avoid the temptation to make it overly complex.

Stable Diffusion needs to go out to the masses to a greater degree. The unnecessary garbage complexity (eg Comfy's ridiculous noodlescape) that developers keep including into the UIs is holding Stable Diffusion back significantly from a greater mass adoption.

Node based workflows with little DRY capability (i.e. ComfyUI) do get painful as the workflow grows. That said, an HTTP server capable of executing ML DAGs is extremely useful and a great building block for other tools and UIs to be built upon.

I wrote a typescript API generator for ComfyUI recently and having programmatic access to let you build and send the execution graphs is a game changer. Hoping to have time to release it soon. Same can easily be done for any other language. Exciting stuff!
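A rough idea of what programmatic graph building can look like, sketched in Python rather than TypeScript. The `GraphBuilder` class and `encode_prompt` helper are hypothetical; the node class names, the `[node_id, output_index]` link style and the `{"prompt": ...}` envelope are modeled on ComfyUI's API format as I understand it, so treat them as assumptions and check them against a real workflow export.

```python
import json

class GraphBuilder:
    """Hypothetical helper that collects nodes into a JSON-serializable graph."""

    def __init__(self):
        self.nodes = {}

    def add(self, class_type, **inputs):
        # Node ids are sequential strings; inputs may reference other nodes.
        node_id = str(len(self.nodes) + 1)
        self.nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        return node_id

    @staticmethod
    def output(node_id, index=0):
        # A link is expressed as [source node id, output slot index].
        return [node_id, index]

    def to_json(self):
        return json.dumps({"prompt": self.nodes})

def encode_prompt(graph, clip_ref, text):
    """Reusable step: the DRY part that hand-wired node graphs tend to lack."""
    return graph.add("CLIPTextEncode", text=text, clip=clip_ref)

g = GraphBuilder()
ckpt = g.add("CheckpointLoaderSimple", ckpt_name="model.safetensors")  # placeholder file name
positive = encode_prompt(g, g.output(ckpt, 1), "a lighthouse at dusk")
negative = encode_prompt(g, g.output(ckpt, 1), "blurry")
payload = g.to_json()
```

The resulting `payload` string is what a client would POST to the server; the sending step is left out here since the endpoint details depend on the installation.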

toenail 1037 days ago

First thoughts, how do I bind to an ip, and where can I install models?

ryukoposting 1037 days ago

If it can handle LoRAs, I'll be sure to try it out this weekend.

LoRAs can be handled as a straight-forward Python extension!

_sys49152 1036 days ago

its gonna be breathtaking when this technology gets close enough to make legit cartoons and animations. layers is a step closer to getting there.

[1] https://www.youtube.com/watch?v=tWZOEFvczzA

etra0 1036 days ago

Corridor Crew did some sort of anime using this technique [1] and then they did two videos [2, 3] explaining the technology behind it. Quite interesting if you ask me!

There still are some issues with the eyes and a bit of flickering but at the speed everything is moving I wouldn't be surprised if this improves in a year or two.

Needless to say, there's still a lot of artistry involved in such a process so anything is yet to be completely automated.

[2] https://www.youtube.com/watch?v=FQ6z90MuURM

[3] https://www.youtube.com/watch?v=mUFlOynaUyk

https://www.youtube.com/watch?v=CgKNTAjQpkk

synapticpaint 1036 days ago

This technology is already close to making animation. Check out some of my experiments with text to video here:

https://youtu.be/X0AhqMhEe-c

gatane 1037 days ago

Is this related to Melondream?

HeartStrings 1036 days ago

How is this better than A1111?

1. https://github.com/OpenDreamProject/OpenDream 2. https://opendream.ai/

Hamcha 1037 days ago

What's up with names nowadays? Not only is there already an OpenDream[1] on GitHub, but there's also a Stable Diffusion service called OpenDream[2]!

smallerfish 1037 days ago

Slap a virtualenv setup into that install script please. A system wide pip install is a bad pattern.
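For reference, a minimal sketch of the idea using Python's stdlib `venv` module. The `create_env` helper and the `.venv` directory name are my own choices for illustration, not what the project's install script does.

```python
import sys
import tempfile
import venv
from pathlib import Path

def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"

# Demo in a throwaway directory; with_pip=False keeps the demo fast and offline.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    env_python = create_env(env_dir, with_pip=False)
    config_written = (env_dir / "pyvenv.cfg").exists()
```

A real install script would keep the default `with_pip=True` and then run the returned interpreter with `-m pip install ...`, so packages land inside the environment instead of system-wide.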

done :)

noman-land 1037 days ago

Now that's agile.