by omgwtfbyobbq 915 days ago
Do you have references for that?

I found this claiming an A100 can generate 1 image/s.

https://oneflow2020.medium.com/text-to-image-in-less-than-1-...

2 comments

> I found this claiming an A100 can generate 1 image/s.

The article you linked is over a year old. Needless to say there have been a LOT of optimizations in the last year.

Back then it was common to use 50+ steps for many of the common samplers. Current methods use a few steps like 1. OnnxStream is using SDXL-Turbo, and you can combine LCM and a few other methods to go very fast.

The reason it's so much faster now is that OnnxStream is only using a single step.

This repo claims 149 images/s on a 4090 https://github.com/aifartist/ArtSpew

However, even if you only get 1 image/s with whatever GPU you have, I stand by my original statement: unless you want to do it for the cool factor (which is very valid), pre-calculating them makes more sense.
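To put rough numbers on the pre-calculation argument, here is a minimal sketch. The library size of 10,000 images is a made-up figure for illustration, and the rates are just the ones quoted in this thread (1 image/s and the 149 images/s the ArtSpew repo claims on a 4090), not anything I measured.

```python
def seconds_to_pregenerate(num_images: int, images_per_second: float) -> float:
    """Wall-clock seconds to generate a fixed library at a steady throughput."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return num_images / images_per_second


# Hypothetical library size, chosen only for illustration.
LIBRARY_SIZE = 10_000

# Throughput figures quoted in the thread, not independently verified.
rates = [
    ("1 image/s", 1.0),
    ("149 images/s (ArtSpew claim, 4090)", 149.0),
]

for label, rate in rates:
    minutes = seconds_to_pregenerate(LIBRARY_SIZE, rate) / 60
    print(f"{label}: about {minutes:.1f} minutes for {LIBRARY_SIZE} images")
```

Either way it is a one-time cost on a desktop GPU, which is the point of pre-calculating instead of generating on the device.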

> This repo claims 149 images/s on a 4090

I actually get around 100 imgs/s on my 3080Ti. Three things to note: 1) you gotta run the max perf code to get the high throughput, 2) the images in this setting are absolute garbage, 3) you don't save the images so you're going to have to edit the code to extract them.

Definitely agree that this project is much more about the cool factor. I suggested a GAN in another comment for similar reasoning (because it's a pi...), but if you want quality images, well, I'm not sure why anyone would expect to get those out of a pi. High quality images take time and skill. But it's also HN, and I'm all for doing things for the cool factor (as long as we don't sell them as things they aren't. ML is cool enough that it doesn't need all the hype and garbage).

> Back then it was common to use 50+ steps for many of the common samplers. Current methods use a few steps like 1.

The "look how fast we can go" method (turbo model with 1 step and without CFG) is blindingly fast, but the quality is...nothing close to what was being done in normal 50+ step gens with normal settings.

Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.

Which is still a big improvement in speed.
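A back-of-the-envelope way to see the gap, as a sketch under some loud assumptions: per-image cost is dominated by denoiser (UNet) forward passes, text encoding and VAE decode are ignored, and CFG is counted as two passes per step (conditional plus unconditional). In practice those two passes are often batched together, so the real wall-clock ratio can differ from these counts.

```python
def denoiser_passes(steps: int, use_cfg: bool) -> int:
    """Rough count of denoiser forward passes for one image.

    Assumption: classifier-free guidance costs two passes per step
    (conditional and unconditional); without CFG it is one pass per step.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return steps * (2 if use_cfg else 1)


def relative_speedup(baseline: int, candidate: int) -> float:
    """How many times fewer passes the candidate needs than the baseline."""
    return baseline / candidate


# Settings discussed in the thread; pass counts are estimates, not benchmarks.
old_style = denoiser_passes(50, use_cfg=True)      # 50 steps with CFG
turbo_1_step = denoiser_passes(1, use_cfg=False)   # 1 step, no CFG
turbo_4_step = denoiser_passes(4, use_cfg=True)    # 4 steps with CFG
turbo_8_step = denoiser_passes(8, use_cfg=True)    # 8 steps with CFG

for name, passes in [
    ("1 step, no CFG", turbo_1_step),
    ("4 steps, CFG", turbo_4_step),
    ("8 steps, CFG", turbo_8_step),
]:
    ratio = relative_speedup(old_style, passes)
    print(f"{name}: {passes} passes, about {ratio:.2f}x fewer than 50 steps with CFG")
```

By this crude count the 1-step setting needs 100x fewer passes, while the 4 to 8 step settings with CFG land around 6x to 12x fewer, which is still a big win but a very different number from the headline benchmarks.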

> Realistically, even with Turbo+LCM, you're still going to 4+ steps (often 8+), with CFG, for reasonable one-generation quality anywhere close to the images people generated at 50+ steps without Turbo/LCM.

For sure. The only reason I considered comparing it that way was because the linked repo appears to also be going for a similar approach with 1 step/image on the pi.

From my own experience, I've had a hard time ever getting a decent image below 6~8 steps, but this repo seems more focused on getting it to run in a reasonable amount of time at all, which understandably requires the minimal "maybe passable" settings.

They might be talking about this[0], as it has been popular recently. It can definitely do >60 imgs/s on my 3080Ti, but you're not going to want any of those images. They are absolute garbage.[1] I can do a little under an image a second and some may be quite usable, but nowhere near what you're going to get from the standard model.

[0] https://github.com/aifartist/ArtSpew/

[1] but the project is still cool, just context...