| HN Mirror


	by ttul 865 days ago
	I’ve been playing with diffusion a ton for the past few months, writing a new sampler that implements an iterative blending technique described in a recent paper. The latent space is rich in semantic information, so it can be a great place to apply various transformations rather than operating on the image directly. Yet it still has a significant spatial component, so things you do in one spatial area will affect that same area of the image. Stable Diffusion 1.5 may be quite old now, but it is an incredibly rich model that still yields shockingly good results. SDXL is newer and more high tech, but it’s not a revolutionary improvement. It can be less malleable than the older model and harder to work with to achieve a given desired result.
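
To make the "spatial component" point concrete, here is a minimal sketch of a masked blend applied directly to latents. It is not the sampler or the paper's technique mentioned above; `blend_latents`, the random stand-in latents, and the mask are all hypothetical. It assumes the SD 1.5 layout of 4 latent channels at 1/8 of the image resolution, so a 512x512 image corresponds to a 4x64x64 latent.

```python
import numpy as np


def blend_latents(base, other, mask, strength=0.5):
    """Blend `other` into `base` where `mask` is 1; leave the rest untouched.

    base, other: arrays of shape (channels, height, width)
    mask:        array of shape (height, width) with values in [0, 1]
    """
    if base.shape != other.shape:
        raise ValueError("latents must have the same shape")
    # The (H, W) mask broadcasts across the channel axis.
    weight = strength * mask
    return (1.0 - weight) * base + weight * other


# Random stand-ins for two latents; real ones would come from a VAE encoder
# or from an intermediate sampler step.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 64, 64)).astype(np.float32)
other = rng.standard_normal((4, 64, 64)).astype(np.float32)

# Only touch the top-left quadrant of the latent grid.
mask = np.zeros((64, 64), dtype=np.float32)
mask[:32, :32] = 1.0

blended = blend_latents(base, other, mask, strength=0.5)
```

Because each latent position maps to an 8x8 pixel patch, the masked quadrant here corresponds to roughly the top-left 256x256 pixels of the decoded image, while the rest of the latent is left exactly as it was.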

3 comments

fpgaminer 865 days ago

> It can be less malleable than the older model and harder to work with to achieve a given desired result.

That has been my experience as well. It's frustrating because SDXL can be exquisite, but SD 1.5 is more "fun" to work with and more creative. I can throw random ideas into a mish-mash of a prompt and SD 1.5 will output an array of interesting things while SDXL will just seem to fall back to something "reasonable", ignoring anything "weird" in the prompt. SDXL also seems to have a lot more position bias in the prompt. SD 1.5 had a bit of that, paying more attention to words earlier in the prompt, but SDXL takes that to a new level.

But SDXL can draw hands consistently, so ... it's a tough choice.

designium 865 days ago

With ComfyUI, you can do SDXL > SD1.5 or SD1.5 > SDXL. It makes more sense to generate a basic image in SDXL Turbo and apply the effects of a checkpoint later.

SV_BubbleTime 865 days ago

Kind of blowing my mind here.

Coming from Auto1111 for a year, I thought Comfy was mostly like always using img2img, then I figured out it wasn’t that but latent2latent… which is cool, but using XL to get the better prompting and 1.5 to get the checkpoints and LoRAs I want is making it all click now.

ttul 863 days ago

ComfyUI is insanely amazing. The learning curve is well worth the effort.

countWSS 864 days ago

SDXL (and possibly 2.1) switched to a different CLIP implementation that is geared toward sentence-level understanding. SD1.5 uses the old CLIP, which works with tag-cloud-type prompts.

ttul 863 days ago

SDXL actually takes conditioning from either the old or the new CLIP, or both. SDXL's reduced malleability is not just down to the choice of the new CLIP; the UNet itself is more opinionated.
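
As a rough illustration of the "both" case: SDXL's UNet cross-attends to a context built by concatenating the per-token outputs of the two text encoders along the feature axis. The sketch below only demonstrates the shapes, using random arrays as stand-ins for real encoder outputs; `sdxl_context` is a hypothetical helper, and the dimensions (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG, 77 tokens) are the commonly documented ones.

```python
import numpy as np

TOKENS = 77            # CLIP context length
DIM_CLIP_L = 768       # "old" CLIP ViT-L/14, the encoder SD 1.5 uses
DIM_OPENCLIP_G = 1280  # "new" OpenCLIP ViT-bigG/14


def sdxl_context(emb_l, emb_g):
    """Concatenate per-token embeddings from both encoders on the feature axis."""
    if emb_l.shape[0] != emb_g.shape[0]:
        raise ValueError("both encoders must produce the same number of tokens")
    return np.concatenate([emb_l, emb_g], axis=-1)


# Random stand-ins; in a real pipeline each array would come from its own
# text encoder, possibly fed a different prompt.
rng = np.random.default_rng(0)
emb_l = rng.standard_normal((TOKENS, DIM_CLIP_L)).astype(np.float32)
emb_g = rng.standard_normal((TOKENS, DIM_OPENCLIP_G)).astype(np.float32)

context = sdxl_context(emb_l, emb_g)
```

SD 1.5, by comparison, conditions on the (77, 768) output of the single CLIP ViT-L encoder, so the first 768 features of the SDXL context occupy the same slot that SD 1.5's whole conditioning does.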

lrasinen 864 days ago

> But SDXL can draw hands consistently, so ... it's a tough choice.

Looking at the article photos, it still has some way to go. I counted 3 cases of missing fingers, two cases of extra fingers (on the cartoon girl), and a few arm poses that in real life would need medical attention.

wruza 864 days ago

Entered this thread to write your comment. I find SDXL inferior to 1.5 and yes, much harder to work with.

Another issue of mine is that SDXL images you see on the web always have that “from a movie/ad”-ish coating. Can’t explain it, but it feels even more uncanny than 1.5.

SDXL is too resource-hungry for what it produces: model sizes are 3x+ larger, 12GB of VRAM is barely enough for it, 40 steps is the minimum, and I don’t think training LoRAs will turn out to be feasible at all. I can’t lower the resolution without distortions, and even proportions are hard to deal with. It feels much less flexible than 1.5 in this regard.
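
A back-of-the-envelope check of the size claim, as a sketch only. The parameter counts are the approximate UNet figures reported for the two models (about 860M for SD 1.5 and about 2.6B for SDXL) and should be treated as assumptions. The calculation covers fp16 weights alone and ignores text encoders, the VAE, and activations, so it is not a VRAM estimate.

```python
# Approximate UNet parameter counts (assumed figures, not measured here).
UNET_PARAMS = {"SD 1.5": 860e6, "SDXL": 2.6e9}
BYTES_PER_PARAM_FP16 = 2


def fp16_gib(params):
    """Size of the weights alone in GiB when stored as fp16."""
    return params * BYTES_PER_PARAM_FP16 / 2**30


def latent_positions(height, width, factor=8):
    """Number of spatial positions in the latent for a given image size."""
    return (height // factor) * (width // factor)


sizes = {name: fp16_gib(p) for name, p in UNET_PARAMS.items()}
param_ratio = UNET_PARAMS["SDXL"] / UNET_PARAMS["SD 1.5"]

# Native resolutions: 512x512 for SD 1.5, 1024x1024 for SDXL.
position_ratio = latent_positions(1024, 1024) / latent_positions(512, 512)

for name, gib in sizes.items():
    print(f"{name}: ~{gib:.1f} GiB of fp16 UNet weights")
print(f"parameter ratio: ~{param_ratio:.1f}x, latent positions: {position_ratio:.0f}x")
```

Under these assumptions the UNet alone is roughly 3x larger, and each denoising step also works on four times as many latent positions at the native resolution, which is consistent with SDXL feeling much heavier per image.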

I’m sticking with 1.5, no SDXL plans.

ttul 863 days ago

Particularly considering the rich world of SD1.5 fine-tunes, SDXL leaves so much to be desired. I'm sure it will all be sorted out eventually, but right now, the momentum in the community just isn't there with SDXL the way it is with 1.5.

3abiton 864 days ago

I am curious: what do you mean by "high tech"?

ttul 863 days ago

SDXL is a more sophisticated model architecture. It has more layers. The CLIP model is bigger.
