Hacker News
by filterfiber 915 days ago
This project is a fun POC but it's not very practical for that type of application.

A 4090 can generate over 100 images a second with turbo+LCM and a few other techniques, so you can make 2 days' worth of images in 1 second. You could make a year's worth in roughly 3 minutes and put them on the SD card.
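To make the arithmetic above explicit, here is a rough sketch. The display rate of 2 images per hour is an assumption (it matches the "2 images/hr" figure mentioned further down the thread), and `seconds_to_pregenerate` is just an illustrative helper name.

```python
def seconds_to_pregenerate(display_days, displayed_per_hour, gpu_images_per_second):
    """GPU seconds needed to pre-generate every image shown over `display_days`."""
    images_needed = display_days * 24 * displayed_per_hour
    return images_needed / gpu_images_per_second

# Assumed: the display shows 2 images per hour; the GPU makes 100 images per second.
two_days = seconds_to_pregenerate(2, 2, 100)    # 96 images
one_year = seconds_to_pregenerate(365, 2, 100)  # 17520 images

print(f"2 days: {two_days:.2f} s, 1 year: {one_year / 60:.1f} min")
```

With those assumed numbers, 2 days comes out just under a second and a year just under 3 minutes.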

Do you have references for that?

I found this claiming an A100 can generate 1 image/s.

https://oneflow2020.medium.com/text-to-image-in-less-than-1-...

> I found this claiming an A100 can generate 1 image/s.

The article you linked is over a year old. Needless to say there have been a LOT of optimizations in the last year.

Back then it was common to use 50+ steps for many of the common samplers. Current methods use only a few steps, sometimes just 1. OnnxStream is using SDXL-Turbo, and you can combine LCM and a few other methods to go very fast.

The reason it's so much faster now is that OnnxStream is only using a single step.

This repo claims 149 images/s on a 4090 https://github.com/aifartist/ArtSpew

However even if you only get 1 image/s with whatever GPU you have I stand by my original statement that unless you want to do it for the cool factor (which is very valid), pre-calculating them makes more sense.
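A minimal sketch of what the pre-calculated approach could look like on the display side, assuming the images were already generated elsewhere and copied onto the card as PNG files. The function names, the PNG glob, and the 30-minute interval are all hypothetical choices, not anything from the linked project.

```python
from pathlib import Path

def list_images(directory):
    """Return the pre-generated PNG files in a stable, sorted order."""
    return sorted(str(p) for p in Path(directory).glob("*.png"))

def pick_image(paths, now_seconds, interval_seconds=1800):
    """Pick the image for the current time slot, wrapping around at the end."""
    if not paths:
        raise ValueError("no pre-generated images found")
    slot = int(now_seconds // interval_seconds)
    return paths[slot % len(paths)]

# Example with made-up file names: a new image every 30 minutes.
demo = ["a.png", "b.png", "c.png"]
print(pick_image(demo, 0), pick_image(demo, 1800), pick_image(demo, 5400))
```

In real use you would pass `time.time()` as `now_seconds` and hand the chosen path to whatever draws on the display.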

> This repo claims 149 images/s on a 4090

I actually get around 100 imgs/s on my 3080Ti. Three things to note: 1) you gotta run the max perf code to get the high throughput, 2) the images in this setting are absolute garbage, 3) you don't save the images so you're going to have to edit the code to extract them.

Definitely agree that this project is much more about the cool factor. I suggested a GAN in another comment for similar reasoning (because it's a Pi...) but if you want quality images, well, I'm not sure why anyone would expect to get those out of a Pi. High quality images take time and skill. But it's also HN, I'm all for doing things for the cool factor (as long as we don't sell them as things they aren't. ML is cool enough that it doesn't need all the hype and garbage)

> Back then it was common to use 50+ steps for many of the common samplers. Current methods use a few steps like 1.

The "look how fast we can go" method (turbo model with 1 step and without CFG) is blindingly fast, but the quality is... nothing close to what was being done with normal 50+ step gens at normal settings.

Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.
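A back-of-envelope sketch of what that does to throughput, under a loudly simplified assumption: time per image scales with the number of denoiser evaluations, and CFG doubles the evaluations per step. It ignores text encoding, VAE decode, and batching effects, so treat the output as a rough ceiling rather than a benchmark. The 149 images/s baseline is the ArtSpew claim quoted earlier in the thread.

```python
def estimated_images_per_second(baseline_ips, steps, use_cfg):
    """Scale a 1-step, no-CFG throughput figure to a given step count.

    Assumption: cost is proportional to denoiser evaluations, and CFG
    needs two evaluations per step (conditional and unconditional).
    """
    evaluations = steps * (2 if use_cfg else 1)
    return baseline_ips / evaluations

baseline = 149  # claimed 1-step, no-CFG images/s on a 4090
four_steps = estimated_images_per_second(baseline, 4, use_cfg=True)
eight_steps = estimated_images_per_second(baseline, 8, use_cfg=True)
print(f"4 steps + CFG: ~{four_steps:.1f}/s, 8 steps + CFG: ~{eight_steps:.1f}/s")
```

Even under this crude model the result is still many images per second, which fits the "still a big improvement" point.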

Which is still a big improvement in speed.

> Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.

For sure the only reason I considered comparing it that way was because the linked repo appears to also be going for a similar approach with 1 step/image on the pi.

From my own experience I've had a hard time ever getting a decent image below 6~8 steps, but this repo seems more focused on getting it to run in a reasonable amount of time at all, which understandably requires the minimal "maybe passable" settings.

They might be talking about this[0] as it has been popular recently. It can definitely do >60 imgs/s on my 3080Ti, but you're not going to want any of those images. They are absolute garbage.[1] I can do a little under an image a second and some may be quite usable, but nowhere near what you're going to get from the standard model.

[0] https://github.com/aifartist/ArtSpew/

[1] but the project is still cool, just context...

But that's not the point, obviously. Sometimes, being slow is a feature. Besides, a 4090 costs more than a small car.
> But that's not the point, obviously.

If you want to say the zero2-w is what's making it then sure.

> Besides, a 4090 costs more than a car.

They only cost ~0.70USD for 1 hr. In fact you could put this on an A100 for 1$/hr. Renting would make the most sense for this type of thing.

It depends on what you're using the images for.

If there's a human in the loop, 100 images/s is likely too much volume, especially if prompt engineering is needed.

At the same time, 2 images/hr is way too slow.

The whole point was that you’d be getting random pictures just-in-time, at a leisurely rate suitable for background image rotation, without other interaction.
I mean, if you need a human in the loop to verify the image quality then you HAVE to pre-compute the images.

> 100 images/s is likely too much volume

You can always generate less

I think you’re just missing the point, which most certainly isn’t buying compute to generate zillions of images ahead-of-time and then replaying them at a rate of one every half hour or whatever. Anyone can do this. The idea of having a tiny instance of SD crammed on a tiny computer taking its time to compute the images just-in-time (so you don’t even in theory know what you’re going to get next) is simply much more fun and original, never mind way more aligned with the hacker ethos.
$1600 is more than a car?

I feel like you can't even find driveable cars that will last 100 miles at that price point anymore.

You probably can, but it'll take some time. The supply of reasonable reliable $500-$1000 beaters is a lot less than it used to be.
hyperbole

/hʌɪˈpəːbəli/

noun

exaggerated statements or claims not meant to be taken literally.

It's so nice of you to offer to buy 4090 cards for people who can only otherwise afford Raspberry Pis ;)
I was just using that as a reference. Stable diffusion will run well with almost any relatively modern gpu.

You don't have to use a 4090, you'll still get double digit performance with a 3060 or whatnot.

> for people who can only otherwise afford Raspberry Pis ;)

You can rent a 4090 for 0.7USD/1hr, or get an A100 for 1.1USD/hr. And if your project is a display + raspberry pi then those costs will dwarf the rental cost.
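For a sense of scale, a rough cost sketch using the hourly rate quoted above. The image count (a year at an assumed 2 images per hour) and the throughput figures are illustrative, and real providers may bill in minimum increments, which this ignores.

```python
def rental_cost_usd(images_needed, images_per_second, usd_per_hour):
    """Cost of renting a GPU just long enough to generate `images_needed` images."""
    hours = images_needed / images_per_second / 3600
    return hours * usd_per_hour

year_of_images = 365 * 24 * 2  # assumed display rate: 2 images per hour

fast = rental_cost_usd(year_of_images, 100, 0.70)  # optimistic 100 images/s
slow = rental_cost_usd(year_of_images, 1, 0.70)    # pessimistic 1 image/s
print(f"at 100 img/s: ${fast:.2f}, at 1 img/s: ${slow:.2f}")
```

Even at the pessimistic 1 image/s, a year's worth of images comes out to a few dollars of rental time under these assumptions.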

You can use Google Colab for free as well.
I tried to load up atomic yesterday to see how it has changed since early in the year.

It failed to load. They reported that it was detected as against their terms of service.