
Hacker News


	by spullara 1807 days ago
	Does it need to reinitialize for each request, or is there a warm start / cold start model like Lambda? I don't really understand how you can charge per request.

3 comments

maldeh 1807 days ago

The pricing appears to be static per model with a ceiling on the monthly request count, not charged per request.

Edit: Actually, I missed the free tier of 1000 requests. I wonder how you avoid the problem of many users leaving defunct or disused models running while still keeping them all hot. Presumably there is some kind of limit on the number of models?


theo31 1807 days ago

There is no cold start! We keep your service hot all the time.
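To make the cold-start vs. warm-start distinction from this exchange concrete, here is a minimal sketch. It assumes a hypothetical service where "loading the model" is the expensive step. It does not describe the provider's actual architecture. A cold service pays the load on every request, while a warm ("hot") service pays it once at startup and reuses the loaded model. `FakeLoader`, `ColdService`, and `WarmService` are illustrative names, and the toy model just doubles its input.

```python
from typing import Callable

Model = Callable[[int], int]


class FakeLoader:
    """Hypothetical stand-in for an expensive model load (e.g. reading weights onto a GPU)."""

    def __init__(self) -> None:
        self.load_count = 0

    def __call__(self) -> Model:
        self.load_count += 1
        return lambda x: x * 2  # toy "model": doubles its input


class ColdService:
    """Reinitializes the model on every request (no warm pool)."""

    def __init__(self, load: Callable[[], Model]) -> None:
        self._load = load

    def handle(self, x: int) -> int:
        model = self._load()  # load cost is paid on every request
        return model(x)


class WarmService:
    """Loads the model once at startup and keeps it hot for all requests."""

    def __init__(self, load: Callable[[], Model]) -> None:
        self._model = load()  # load cost is paid once, before the first request

    def handle(self, x: int) -> int:
        return self._model(x)


if __name__ == "__main__":
    cold_loader, warm_loader = FakeLoader(), FakeLoader()
    cold, warm = ColdService(cold_loader), WarmService(warm_loader)
    for i in range(5):
        assert cold.handle(i) == warm.handle(i) == 2 * i
    print(cold_loader.load_count, warm_loader.load_count)  # 5 1
```

The trade-off this illustrates is the one raised in the thread. Keeping every service warm removes per-request load latency, but the provider holds memory (and possibly a GPU) for each model even when it receives no traffic.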


spullara 1807 days ago

Well, I guess I know where I am going to host GPT-J-6B then. I don't think it is sustainable.


marcooliv 1807 days ago

How are you planning to host a GPT-whatever when the service clearly has a model size limit?!


spullara 1806 days ago

The size limit is very close to allowing it (12 GB vs. the 10 GB limit). I imagine you could shrink it a bit further and get it to fit.
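The 12 GB figure is consistent with storing roughly 6 billion parameters at 2 bytes each (fp16). The rough sketch below works through that arithmetic. It assumes a round 6.0e9 parameters, counts weights only (no activations or runtime overhead), and uses 1 GB = 1e9 bytes. Lowering precision to int8 is shown as one possible way to "reduce it further". It is an assumption here, not something the commenter specified.

```python
# Approximate weight memory for a ~6B-parameter model at different precisions.
# Assumptions: 6.0e9 parameters (a round figure), weights only, 1 GB = 1e9 bytes.
PARAMS = 6.0e9
LIMIT_GB = 10.0  # the size limit quoted in the thread

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_gb(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GB: parameter count times bytes per parameter."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        size = weight_size_gb(PARAMS, nbytes)
        print(f"{dtype}: {size:.0f} GB, fits under {LIMIT_GB:.0f} GB: {size <= LIMIT_GB}")
```

Under these assumptions, fp16 weights come to about 12 GB, which is just over the limit. An int8 copy would be about 6 GB, though quantizing a model can affect output quality.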


rubatuga 1807 days ago

I guess GPU loading is quick? Like ~10 seconds?
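A back-of-envelope way to sanity-check a "~10 seconds" guess is to divide the model size by the effective throughput of moving the weights from storage into GPU memory. The sketch below assumes a checkpoint of about the 12 GB mentioned in the thread. The throughput values are hypothetical placeholders, not measurements of any provider's hardware.

```python
def load_time_s(size_gb: float, throughput_gb_per_s: float) -> float:
    """Rough load time: data to move divided by effective end-to-end throughput."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_s


if __name__ == "__main__":
    SIZE_GB = 12.0  # assumed checkpoint size (the figure quoted in the thread)
    # Hypothetical effective throughputs in GB/s, chosen only for illustration.
    for throughput in (0.5, 1.2, 3.0):
        print(f"{throughput} GB/s -> {load_time_s(SIZE_GB, throughput):.1f} s")
```

By this estimate, ~10 seconds corresponds to an effective throughput of roughly 1.2 GB/s for a 12 GB model. Real load times also depend on deserialization and framework initialization, which this estimate ignores.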
