by kristoo 1910 days ago
Thanks! Great question. The seconds used per call vary a lot depending on the task, model and input.

For example, something quick like text vectorisation on a couple of sentences takes less than 100ms per call. That would be at least 10,000, 50,000 and 200,000 calls (at most 0.0005, 0.0002 and 0.0001 per additional call) for the three tiers respectively.

At the other end, using GPT-2 Large to generate around 40 words takes around 2500ms per call, giving 400, 2,000 and 8,000 calls (0.0125, 0.005 and 0.0025 per additional call) respectively.
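
For anyone who wants to check the arithmetic for their own workload, here is a rough sketch. The tier budgets (1,000, 5,000 and 20,000 seconds) and the per-second overage rates are not stated above; they are back-calculated from the call counts and per-call prices (for example 400 calls x 2.5s = 1,000s), so treat them as assumptions. The function names are made up for illustration, and no currency is implied.

```python
# Assumed tier budgets in compute-seconds, inferred from the figures above
# (e.g. 400 calls * 2.5 s = 1000 s). They are not stated in the comment.
TIER_SECONDS = [1000, 5000, 20000]

# Assumed overage rate per extra second for each tier, also inferred
# (e.g. 0.0125 per call / 2.5 s = 0.005 per second). Currency unknown.
RATE_PER_EXTRA_SECOND = [0.005, 0.002, 0.001]


def calls_per_tier(ms_per_call):
    """Number of whole calls that fit in each tier's budget of seconds."""
    return [seconds * 1000 // ms_per_call for seconds in TIER_SECONDS]


def cost_per_extra_call(ms_per_call):
    """Overage cost of one additional call in each tier."""
    return [round(rate * ms_per_call / 1000, 6) for rate in RATE_PER_EXTRA_SECOND]


if __name__ == "__main__":
    # 100 ms ~ quick text vectorisation, 2500 ms ~ GPT-2 Large, ~40 words
    for ms in (100, 2500):
        print(ms, "ms:", calls_per_tier(ms), cost_per_extra_call(ms))
```

Plugging in your own measured latency in milliseconds gives a ballpark for how far each tier would stretch.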

1 comment

Completely understandable, and I appreciate the realistic estimates you've provided. The usage per tier definitely seems fair. I'll keep an eye on this project and hopefully circle back soon. Thanks for the reply!