by dnnssl2
927 days ago
If you were to serve this from a datacenter server, would the client-to-server network round trip be the slowest part of inference? Curious whether it would be faster to run this on cloud GPUs (better hardware, but farther away) or locally on worse hardware.
So, in practice, a full "text completion request" can often take on the order of seconds, which dwarfs the client <-> server roundtrip.
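To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (the 80 ms round trip, the 200-token completion, the tokens-per-second rates) is a made-up assumption for illustration, not a measurement of any real model or hardware, and `completion_latency_s` is a hypothetical helper:

```python
def completion_latency_s(rtt_ms, n_tokens, tokens_per_s):
    """Rough end-to-end latency: one network round trip plus generation time."""
    return rtt_ms / 1000.0 + n_tokens / tokens_per_s


# Hypothetical numbers, chosen only to illustrate the comparison.
cloud = completion_latency_s(rtt_ms=80, n_tokens=200, tokens_per_s=50)  # remote, faster GPU
local = completion_latency_s(rtt_ms=0, n_tokens=200, tokens_per_s=10)   # local, slower hardware

# Fraction of the cloud request spent on the network round trip.
network_share = (80 / 1000.0) / cloud

print(f"cloud: {cloud:.2f} s, local: {local:.2f} s")
print(f"network share of cloud request: {network_share:.1%}")
```

Under these assumptions the round trip is a small slice of the total, so generation speed decides which setup wins. With a very short completion or a much slower link, the balance could shift.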