Hacker News
by aronowb14 1393 days ago
It's funny how hyped up stable diffusion is on HN right now: it reminds me of when style transfer first started making its rounds in 2017. https://news.ycombinator.com/item?id=13958366

I think as technologists we want to think that code can "solve" some of the problems in the art world... but I think we still have a really, really long way to go. I tried to get style transfer adopted at work (I worked at a creative technology firm in NY), but frankly I think deep learning methods for art generation tend to be really unpredictable, which makes them pretty hard to use for professional applications. Imagine deploying production code that only worked 85% of the time... it would be a nightmare. I felt, and still feel, similarly about deep learning approaches to art. They're just so finicky and unpredictable: for example, add a single extra pixel to the example in this article and the output would look completely different.

Either way, cynicism aside, stable diffusion is awesome :).

2 comments

> Imagine deploying production code that only worked 85% of the time... it would be a nightmare. I felt, and still feel, similarly about deep learning approaches to art. They're just so finicky and unpredictable: for example, add a single extra pixel to the example in this article and the output would look completely different.

I don't think the metaphor works. Code that only works 85% of the time is obviously broken, but art is subjective, so an 85% solution to a creative problem could be more than enough for most consumers.

It takes 3 seconds to generate 1 image with my GPU.

I can find a good prompt within 30 minutes to 1 hour.

My GPU can generate 100 images in 5 minutes.

Out of those 100 images, 10 are very close to exactly what I meant, at a professional concept-artist level.

So, in this case Stable Diffusion only working 10% of the time is fine.

The future is already here: I'm already incorporating Stable Diffusion-generated images into my professional work.
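To make the arithmetic in this comment explicit, here is a rough sketch using only the figures quoted above (3 seconds per image, batches of 100, about 10 keepers per batch). The function names are made up for illustration, and the numbers are one commenter's experience, not a benchmark.

```python
# Figures taken from the comment above; the function names are illustrative.
SECONDS_PER_IMAGE = 3
BATCH_SIZE = 100
KEEPERS_PER_BATCH = 10


def batch_minutes(images: int, seconds_per_image: float) -> float:
    """Wall-clock minutes needed to generate `images` images one after another."""
    return images * seconds_per_image / 60


def seconds_per_keeper(images: int, keepers: int, seconds_per_image: float) -> float:
    """Average GPU time spent for each image that is good enough to keep."""
    return images * seconds_per_image / keepers


if __name__ == "__main__":
    minutes = batch_minutes(BATCH_SIZE, SECONDS_PER_IMAGE)
    per_keeper = seconds_per_keeper(BATCH_SIZE, KEEPERS_PER_BATCH, SECONDS_PER_IMAGE)
    print(f"batch time: {minutes:.0f} min")                     # 100 * 3 s = 300 s
    print(f"hit rate: {KEEPERS_PER_BATCH / BATCH_SIZE:.0%}")
    print(f"GPU time per keeper: {per_keeper:.0f} s")
```

So a 10% hit rate still works out to roughly half a minute of GPU time per usable image, not counting the 30 to 60 minutes spent finding the prompt.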

What kind of GPU are you running this on? My 3080 seems to take about 30 seconds per image with 50 passes. I'm wondering if I'm missing out on some optimizations. It could just be the quality of the Linux NVIDIA drivers.
I'd recommend trying a different fork. Perhaps you're using the official one. I believe that one still "ramps up the system" on every image generation. Other repos do the ramp-up only once.
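For anyone wondering what "ramp up only once" looks like in practice, here is a minimal sketch of the general pattern, assuming the cost is dominated by loading the model. This is not the code of any particular fork: `load_model_slow`, `get_model` and `generate` are hypothetical stand-ins, and the sleep just simulates the setup cost.

```python
import functools
import time

LOAD_LOG = []  # one entry per time the expensive setup actually runs


def load_model_slow() -> dict:
    """Hypothetical stand-in for loading model weights onto the GPU."""
    LOAD_LOG.append(time.time())
    time.sleep(0.05)  # pretend this is the multi-second "ramp up"
    return {"name": "placeholder-model"}


@functools.lru_cache(maxsize=1)
def get_model() -> dict:
    """Load the model on first use, then hand back the cached instance."""
    return load_model_slow()


def generate(prompt: str, seed: int) -> str:
    """Hypothetical generation call; only the model reuse matters here."""
    model = get_model()
    return f"{model['name']}:{prompt}:{seed}"


if __name__ == "__main__":
    for seed in range(5):
        generate("a castle at dusk", seed)
    print(f"model loaded {len(LOAD_LOG)} time(s) for 5 images")
```

A script that is started fresh for every image pays the load cost each time; a long-running process (a web UI, a REPL, a loop like the one above) pays it once.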
Yeah, this might be the problem. I was on the main fork, but going to try switching over to this: https://github.com/hlky/stable-diffusion
That's weird. I have an RTX 3070 on Windows.

Are you using 512x512 images or larger ones?

Best workflow is to keep images close to 512x512, record the seed and then upscale.
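A small sketch of the "record the seed, then upscale" bookkeeping, under loud assumptions: `RenderSettings`, `fake_sample` and `upscaled_size` are invented for illustration, and `random.Random` only stands in for the sampler's noise source. The point is just that saving the prompt, seed and size next to each image lets you regenerate the same base image later before upscaling it.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RenderSettings:
    """Everything needed to regenerate one base image (hypothetical schema)."""
    prompt: str
    seed: int
    width: int = 512
    height: int = 512


def fake_sample(settings: RenderSettings, n: int = 4) -> list:
    """Stand-in for the sampler: the same seed always yields the same numbers."""
    rng = random.Random(settings.seed)
    return [round(rng.random(), 6) for _ in range(n)]


def upscaled_size(settings: RenderSettings, factor: int) -> tuple:
    """Output dimensions after upscaling by an integer factor."""
    return settings.width * factor, settings.height * factor


if __name__ == "__main__":
    settings = RenderSettings(prompt="a castle at dusk", seed=1234)
    record = json.dumps(asdict(settings))           # save this next to the image
    restored = RenderSettings(**json.loads(record))
    print(fake_sample(settings) == fake_sample(restored))  # same seed, same result
    print(upscaled_size(restored, 4))
```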

I'm using 512x768 as the default, but a quick test shows only a marginal difference in speed between the two. I'll have to give Windows a try to see if it's the driver holding me back. Do you have any tips or resources for upscaling the image afterwards?
Currently this library can generate multiple images and upscale them through RealESRGAN: https://github.com/hlky/stable-diffusion

If you are not using this library already, give it a shot.

Also, I'm using the NVIDIA Studio drivers, though I'm not sure if that would make a difference.

I've been using the main fork. This even has GFPGAN built in! Looks very useful, thanks.