by chillee
932 days ago
Surprisingly, no. Part of the reason is that text generation is really expensive. Unlike traditional ML inference (e.g. with ResNets), you don't just pass your data through your model once. You need to pass it through over and over again (once for each token you generate). So, in practice, a full "text completion request" can often take on the order of seconds, which dwarfs the client <-> server roundtrip.
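To make the "once per token" point concrete, here's a rough sketch of the generation loop. Everything in it is hypothetical: `toy_forward` is a stand-in for a real model's forward pass, and the per-pass and roundtrip timings are made-up numbers chosen only to illustrate the arithmetic, not measurements of any real system.

```python
def toy_forward(tokens):
    """Hypothetical stand-in for one full forward pass; returns a next token id."""
    return (sum(tokens) + len(tokens)) % 50


def generate(prompt_tokens, num_new_tokens, forward):
    """Autoregressive loop: one forward pass per generated token."""
    tokens = list(prompt_tokens)
    passes = 0
    for _ in range(num_new_tokens):
        next_token = forward(tokens)  # the whole model runs again here
        passes += 1
        tokens.append(next_token)     # the new token feeds into the next pass
    return tokens, passes


def estimated_latency_ms(num_new_tokens, per_pass_ms, roundtrip_ms):
    """Back-of-the-envelope total: generation time plus one network roundtrip."""
    return num_new_tokens * per_pass_ms + roundtrip_ms


tokens, passes = generate([1, 2, 3], num_new_tokens=5, forward=toy_forward)
print(passes)  # number of forward passes equals the number of generated tokens

# Assumed numbers: 200 tokens, 20 ms per pass, 50 ms client <-> server roundtrip.
print(estimated_latency_ms(200, per_pass_ms=20.0, roundtrip_ms=50.0))
```

With those assumed numbers the generation time is 4000 ms against a 50 ms roundtrip, which is the "dwarfs" part. A ResNet-style classifier would be the `num_new_tokens = 1` case of the same formula.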