by Traubenfuchs 374 days ago
I bet there is a set of repetitive one- or two-question user requests that makes up a sizeable share of all requests. The models are so expensive to run that caching would pay off even if it only covered 1% of requests, or much less than 1%. To make it less obvious, they probably keep a big set of response variants for each cached request. I don't see how they would not do this.

They probably also have cheap code or cheap models that normalize requests to increase cache hit rate.
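
To make the guess concrete, here is a rough sketch of what "cheap code" normalization plus a variant cache could look like. This is purely my assumption about the shape of such a thing, not anything a provider has confirmed; `normalize`, `cache_key`, and `VariantCache` are hypothetical names, and real normalization would likely be smarter than lowercasing and stripping punctuation.

```python
import hashlib
import random
import re
import string

# Hypothetical normalizer: lowercase, drop ASCII punctuation, collapse whitespace.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(request: str) -> str:
    text = request.lower().translate(_PUNCT)
    return re.sub(r"\s+", " ", text).strip()


def cache_key(request: str) -> str:
    # Hash the normalized text so near-identical requests share one key.
    return hashlib.sha256(normalize(request).encode("utf-8")).hexdigest()


class VariantCache:
    """Hypothetical cache holding several stored responses per normalized request."""

    def __init__(self, rng=None):
        self._store = {}
        self._rng = rng or random.Random()

    def add(self, request: str, response: str) -> None:
        self._store.setdefault(cache_key(request), []).append(response)

    def get(self, request: str):
        # Return a random stored variant on a hit, or None on a miss.
        variants = self._store.get(cache_key(request))
        if not variants:
            return None
        return self._rng.choice(variants)


cache = VariantCache()
cache.add("What is the capital of France?", "Paris.")
cache.add("what is the capital of france", "The capital of France is Paris.")
print(cache.get("WHAT IS   THE CAPITAL OF FRANCE??"))  # one of the two variants
print(cache.get("What is the capital of Spain?"))      # miss: None
```

On a miss you would fall through to the expensive model and optionally store the result. Picking a random variant is what would keep repeated answers from looking identical.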