by Traubenfuchs
374 days ago
I bet a small set of repetitive one- or two-question user requests makes up a sizable share of all requests. The models are so expensive to run that serving even 1% of requests from a cache would be worth it, probably much less than 1%. To make it less obvious, they probably keep a big set of response variants. I don't see how they would not do this. They probably also have cheap code or cheap models that normalize requests to increase the cache hit rate.
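To make the idea concrete, here is a rough Python sketch of what such a cache could look like. It is pure speculation, not any provider's actual setup: `normalize`, `VariantCache`, `answer`, and the `expensive_model` callable are all hypothetical names made up for illustration, and the normalization is just cheap string cleanup rather than a small model.

```python
import random
import re


def normalize(prompt: str) -> str:
    """Cheap normalization (assumed): lowercase, collapse whitespace,
    drop trailing sentence punctuation so near-duplicates share a key."""
    text = re.sub(r"\s+", " ", prompt.lower()).strip()
    return text.rstrip("?!.").strip()


class VariantCache:
    """Maps a normalized prompt to a list of stored response variants."""

    def __init__(self, rng=None):
        self._store = {}
        self._rng = rng or random.Random()

    def put(self, prompt: str, response: str) -> None:
        # Append, so one key can accumulate several variants.
        self._store.setdefault(normalize(prompt), []).append(response)

    def get(self, prompt: str):
        # Return a randomly chosen variant, or None on a cache miss.
        variants = self._store.get(normalize(prompt))
        return self._rng.choice(variants) if variants else None


def answer(prompt: str, cache: VariantCache, expensive_model) -> str:
    """Serve from the cache when possible; otherwise call the model."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    response = expensive_model(prompt)  # hypothetical costly call
    cache.put(prompt, response)
    return response
```

In this sketch a key only gets a second variant if someone calls `put` again for it, so the "big set of response variants" would have to be filled in offline for the most common prompts. Picking one at random on each hit is what would make the caching harder to notice.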