by kristoo
1910 days ago
Thanks! Great question.
The number of seconds used varies a lot depending on the task, model and input. For example, something quick like text vectorisation on a couple of sentences takes less than 100ms per call.
At 100ms per call, that would be 10,000, 50,000 and 200,000 calls (0.0005, 0.0002 and 0.0001 per additional call) respectively. At the other end, using GPT-2 Large to generate around 40 words takes around 2500ms per call, giving 400, 2,000 and 8,000 calls (0.0125, 0.005 and 0.0025 per additional call) respectively.
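For anyone who wants to plug in their own latency, here is a rough sketch of that arithmetic in Python. The included seconds (1,000, 5,000 and 20,000) and the per-second overage prices (0.005, 0.002 and 0.001) are not stated above; they are back-calculated from the call counts and per-call prices, so treat them as assumptions. The function and constant names are made up for illustration.

```python
# Assumed values, inferred from the figures in this comment:
# 10,000 calls at 100ms each implies 1,000 included seconds, and so on.
INCLUDED_SECONDS = [1_000, 5_000, 20_000]
# 0.0005 per extra 100ms call implies 0.005 per extra second, and so on.
PRICE_PER_EXTRA_SECOND = [0.005, 0.002, 0.001]


def calls_per_tier(ms_per_call):
    """Number of calls each tier's included seconds would cover."""
    return [round(seconds * 1000 / ms_per_call) for seconds in INCLUDED_SECONDS]


def price_per_extra_call(ms_per_call):
    """Cost of one additional call on each tier, given its duration."""
    return [round(price * ms_per_call / 1000, 6) for price in PRICE_PER_EXTRA_SECOND]


if __name__ == "__main__":
    for label, ms in [("text vectorisation", 100), ("GPT-2 Large, ~40 words", 2500)]:
        print(label, calls_per_tier(ms), price_per_extra_call(ms))
```

Swapping in the measured latency for your own model and input should give a ballpark for how far each tier goes.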