The problem is (as always) the "bad user" case. You get some users who run at 100% utilization full time (or more, since depending on your model they might be able to run multiple instances). They'll be the ones running a Discord bot in a popular server, reselling the image generation, or something like that.
Usage isn't evenly distributed either: many users will hit the service at the same time, so you might want to overprovision. That's basically what you don't want to do, since profitability comes from underprovisioning.
I think for an AI generation service this problem is actually more solvable than usual. You can slow down how fast results are returned, which slows down demand, and charge more for a higher tier that gets prioritized. People are going to be somewhat bothered if a result takes 10 seconds instead of 1 second, but it’s not the end of the world if it’s a rare event. Compare that to Netflix: if it can’t keep up with demand and your video spends half the time buffering, that’s a much worse failure mode.
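To make the "higher tier gets prioritized" idea concrete, here is a minimal sketch of a two-tier job queue. Everything in it is an assumption for illustration: the tier names `paid` and `free`, the `TieredQueue` class, and the strict-priority rule are hypothetical, not how any particular service does it.

```python
import heapq
import itertools

# Hypothetical tier names; a lower number is served first.
TIER_PRIORITY = {"paid": 0, "free": 1}


class TieredQueue:
    """Serve higher tiers first, first-come-first-served within a tier."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so arrival order is kept.
        self._arrival = itertools.count()

    def submit(self, job_id: str, tier: str) -> None:
        heapq.heappush(self._heap, (TIER_PRIORITY[tier], next(self._arrival), job_id))

    def next_job(self):
        """Return the next job to run on a free GPU, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = TieredQueue()
for job_id, tier in [("a", "free"), ("b", "paid"), ("c", "free"), ("d", "paid")]:
    queue.submit(job_id, tier)

order = [queue.next_job() for _ in range(4)]
print(order)  # ['b', 'd', 'a', 'c']
```

Under load the free tier just waits longer, which is the "10 seconds instead of 1 second" case. Strict priority like this can starve free users if paid demand never lets up, so a real scheduler would probably want some aging or a reserved share for the lower tier.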
Some random stats for successful web services (unit is average minutes of use per day per user):
YouTube - 19 minutes
Spotify - 140 minutes
TikTok - 95 minutes
Netflix - 71 minutes
So we’re looking at roughly a 1% - 10% utilization range (19 to 140 minutes out of the 1,440 in a day), depending on where your game streaming or AI inference app falls. You need to factor that in when figuring out the pricing, because your competition certainly will.
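The arithmetic behind that range, as a quick sketch. The minutes-per-day figures are the ones listed above; `users_per_gpu` and the `peak_to_average` factor are assumptions added for illustration (a crude way to account for usage bunching up at certain hours), not measured values.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

# Average minutes of use per day per user, from the list above.
avg_minutes_per_day = {
    "YouTube": 19,
    "Spotify": 140,
    "TikTok": 95,
    "Netflix": 71,
}


def utilization(minutes_per_day: float) -> float:
    """Fraction of the day an average user is actively using the service."""
    return minutes_per_day / MINUTES_PER_DAY


def users_per_gpu(minutes_per_day: float, peak_to_average: float = 1.0) -> float:
    """Rough number of users one GPU can carry if each active user needs a whole GPU.

    peak_to_average is an assumed fudge factor: 1.0 means perfectly even
    usage around the clock, larger values mean demand bunches up at peak.
    """
    return 1.0 / (utilization(minutes_per_day) * peak_to_average)


for service, minutes in avg_minutes_per_day.items():
    print(
        f"{service}: {utilization(minutes):.1%} utilization, "
        f"~{users_per_gpu(minutes, peak_to_average=3.0):.1f} users per GPU at 3x peak"
    )
```

This gives about 1.3% for YouTube-like usage up to about 9.7% for Spotify-like usage. The 3x peak factor is a made-up number; the point is only that the users-per-GPU figure you can price against shrinks fast once usage isn't evenly spread.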
My intuition tells me GPU utilization is very different. Those services are egress bound. Egress is super elastic and can be scaled to stupefyingly large numbers.
GPU utilization is less scalable. No GPU cloud service is particularly popular. I don’t think any of them are profitable. Having a 1:1 ratio of GPUs to users is tough.
Gaming is especially difficult because it’s super latency sensitive, which means you need racks of expensive GPUs sitting in hundreds of edge nodes. I’m still super bearish on cloud gaming.
ML tools aren’t that latency sensitive. They’ll exist and they’ll be profitable, but I think the economics are tough. And as a consumer I’d still greatly prefer to run locally, which means there’s a tension between “good for the customer” and “good for the business”.
This kills your margin.