by spullara 1807 days ago
Does it need to reinitialize for each request or is there a warm start / cold start model like lambda? I don't really understand how you can charge per request.
3 comments

The pricing appears to be static per model with a ceiling on the monthly request count, not charged per request.

Edit: Actually, I didn't spot the free tier of 1000 requests. I wonder how you avoid the problem of a lot of users leaving defunct/disused models running while still keeping them hot - presumably some kind of limit to the model count?

There is no cold start! We keep your service hot all the time.
Well, I guess I know where I am going to host GPT-J-6B then. I don't think it is sustainable.
How are you planning to put a GPT-whatever on it when the service clearly has a model size limit?!
The size limit is very close to allowing it (a 12GB model vs a 10GB limit). I imagine you can reduce the model somewhat further and get it to fit.
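
For a rough sense of the numbers, here is a back-of-the-envelope sketch of weight size at different precisions. It assumes GPT-J-6B has roughly 6.05 billion parameters, takes the 10GB limit from the comment above, counts 1 GB as 10^9 bytes, and ignores activations and runtime overhead. The helper names are made up for illustration.

```python
# Weight-only memory estimate; real deployments also need room for
# activations, framework overhead, etc. (not modeled here).
PARAMS = 6.05e9  # assumption: approximate GPT-J-6B parameter count
LIMIT_GB = 10    # size limit as quoted in the thread

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_gb(params, bytes_per_param):
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params, bytes_per_param, limit_gb=LIMIT_GB):
    """True if the raw weights alone are within the size limit."""
    return weight_size_gb(params, bytes_per_param) <= limit_gb


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_size_gb(PARAMS, nbytes)
    print(f"{name}: {size:.1f} GB, fits under {LIMIT_GB} GB: {fits(PARAMS, nbytes)}")
```

Under these assumptions, fp16 weights come out at about 12GB (over the limit), while 8-bit weights would be around 6GB. Whether quality holds up after quantizing is a separate question.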
I guess GPU loading is quick? Like ~10 seconds?