by TuringNYC 1279 days ago
>> Monetizing models is tricky because it’s so cheap to run locally but so expensive in the cloud.

Can you expand on this a bit? The way I'm thinking, that is only the case if you need low latency. And in that case, it seems you just need to charge enough to cover compute.

We're running Stable Diffusion on an EKS cluster, which evens out the load across calls and prevents over-resourcing.

If latency isn't an issue, it can be run on non-GPU machines. If you're looking for something under $300 or $400/mo, then I agree it may be an issue.

On that note, I haven't checked whether there are Lambda/Fargate-style options that provide GPU power, to achieve consumption-based pricing tied to usage, but that might be a route. Can anyone speak to this?
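
To make the consumption-based comparison concrete, here is a rough sketch of the break-even point between an always-on GPU node and a pay-per-call service. The function name and both prices are hypothetical placeholders, not quotes from any provider.

```python
def breakeven_calls_per_month(always_on_monthly: float, price_per_call: float) -> float:
    """Monthly call volume at which pay-per-call costs the same as an always-on node."""
    if price_per_call <= 0:
        raise ValueError("price_per_call must be positive")
    return always_on_monthly / price_per_call


if __name__ == "__main__":
    # Hypothetical numbers: a $400/mo always-on node vs. $0.002 per call.
    calls = breakeven_calls_per_month(400.0, 0.002)
    print(f"Pay-per-call is cheaper below roughly {calls:,.0f} calls/month")
```

Below that volume, per-call pricing wins; above it, the always-on node does.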

4 comments

>On that note, I haven't checked whether there are Lambda/Fargate-style options that provide GPU power, to achieve consumption-based pricing tied to usage, but that might be a route. Can anyone speak to this?

https://lambdalabs.com/service/gpu-cloud

Thanks for this. This is nice and the prices are great, but I was specifically curious about something where consumption can be tied to cost (e.g. Lambda/Fargate style, where you pay by the call).

It's not quite Lambda, but GKE Autopilot supports GPU workloads, so it could be a relatively easy way to do this.

You could have a REST service putting incoming requests into a queue, and then a processor deployment pulling jobs off the queue using GPU resource requests and spot instances. You'd probably also want something scaling the processor deployment replicas based on the queue depth and your budget.
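
As a minimal sketch of that last scaling decision (not tied to any particular autoscaler), the replica count could be derived from queue depth and capped by budget. The function name, parameters, and numbers below are all hypothetical.

```python
import math


def desired_replicas(
    queue_depth: int,
    jobs_per_replica: int,
    hourly_cost_per_replica: float,
    hourly_budget: float,
    min_replicas: int = 0,
) -> int:
    """Pick a replica count from queue depth, never exceeding the hourly budget."""
    if jobs_per_replica <= 0 or hourly_cost_per_replica <= 0:
        raise ValueError("jobs_per_replica and hourly_cost_per_replica must be positive")
    # Replicas needed so each one handles at most jobs_per_replica queued jobs.
    wanted = math.ceil(queue_depth / jobs_per_replica)
    # Largest replica count the budget can pay for.
    affordable = int(hourly_budget // hourly_cost_per_replica)
    # The budget is treated as a hard cap, even over min_replicas.
    return min(max(wanted, min_replicas), affordable)


if __name__ == "__main__":
    # Hypothetical: 45 queued jobs, 10 jobs per replica, $0.50/hr each, $2/hr budget.
    print(desired_replicas(45, 10, 0.5, 2.0))
```

A loop that polls the queue and applies this number to the deployment would be the "something" doing the scaling; an empty queue scales the processors to zero.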

I haven't compared the pricing to EKS, so I'm unsure whether it would really be better financially, but it would avoid having to manage scaling GPU nodes up and down explicitly.

https://cloud.google.com/kubernetes-engine/docs/how-to/autop...

https://www.banana.dev/ have been working on the Lambda-style thing. I haven't tried it, but it looks very impressive.

> If you're looking for something under $300 or $400/mo, then I agree it may be an issue.

Yeah. These models don’t need special resources to run. As a consumer I would prefer to buy a 4090 and then run everything locally. I don’t want to pay $10 or $20 monthly subs to a half dozen different AI services. All professional software turning into subscription services sucks.

Midjourney charges $30/mo for unlimited “relax” time and 15 hours of fast GPU time. That’s not too bad. But multiply that by 6 services and a 4090 pays for itself in a few months.
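
A quick sketch of that break-even arithmetic. The $30/mo and 6 services come from the comment above; the card price of about $1,600 is an assumption, and electricity and the rest of the PC are ignored.

```python
def months_to_break_even(hardware_cost: float, monthly_sub: float, num_services: int) -> float:
    """Months of subscription fees needed to equal a one-time hardware cost."""
    monthly_total = monthly_sub * num_services
    if monthly_total <= 0:
        raise ValueError("total monthly subscription cost must be positive")
    return hardware_cost / monthly_total


if __name__ == "__main__":
    # Assumed card price: $1,600. Subscriptions: 6 services at $30/mo each.
    months = months_to_break_even(1600.0, 30.0, 6)
    print(f"Break-even after about {months:.1f} months")
```

With these assumed numbers, the break-even lands at roughly nine months; cheaper $10 or $20 subscriptions push it further out.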

Midjourney is completely different from SD on a computational level. SD is optimized for speed: it takes 5 seconds to generate a 512x512 image, and their internal optimizations are bringing it down to 0.5 seconds (as stated on their Twitter). To achieve this, they do one-shot generation straight to 512x512, without upscaling slowly from 64x64 -> 256x256 -> 512x512.

Midjourney is optimized for quality. It actually does do the gradual upscaling, which is the approach the Imagen and eDiff-I papers demonstrated. This results in far better quality, but it is extremely taxing and slow. Even in 'fast' mode it runs like a snail compared to SD. I don't think it'll work on anything below an A100.