| HN Mirror

It really depends on the application. In content recommendation, for example, predictions for popular items are requested frequently, so we maintain a prediction cache and serve those repeated requests from it without running the model. We also use the cache for model selection: we join each original prediction with the feedback it receives. Because feedback usually arrives soon after the prediction, the prediction is often still in the cache when the feedback comes in, so even a unique query can benefit from caching. Also, most prediction models these days are not deep learning; most companies are using classical machine learning. In our case, with an SVM trained in scikit-learn, we got 1.8x feedback throughput. For eviction we use plain LRU, a standard cache eviction policy.
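A minimal sketch of the setup described above, assuming a single in-process cache keyed by the query. Every name here (LRUPredictionCache, PredictionService, record_feedback, accuracy_by_model) is hypothetical and not the commenter's actual code. A "model" is any callable; a wrapper around a fitted scikit-learn estimator's predict would fit, but plain lambdas are used so the sketch has no dependencies. Each cache entry stores which model produced the prediction, so late-arriving feedback can be joined with it and attributed to that model for model selection.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# A cached entry records which model produced the prediction, so feedback
# can later be attributed to that model.
Entry = Tuple[str, object]


class LRUPredictionCache:
    """Fixed-size prediction cache with least-recently-used (LRU) eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._store: "OrderedDict[Hashable, Entry]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Entry]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: Hashable, entry: Entry) -> None:
        self._store[key] = entry
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used key


class PredictionService:
    """Hypothetical serving layer: check the cache first, call the model on a miss."""

    def __init__(self, models: Dict[str, Callable[[Hashable], object]],
                 active_model: str, capacity: int) -> None:
        self.models = models
        self.active_model = active_model
        self.cache = LRUPredictionCache(capacity)
        self.model_calls = 0
        self.joined: List[Tuple[str, object, object]] = []  # (model, prediction, feedback)

    def predict(self, query: Hashable) -> object:
        entry = self.cache.get(query)
        # Only reuse the entry if it came from the currently active model.
        if entry is not None and entry[0] == self.active_model:
            return entry[1]  # cache hit: the model is not called
        self.model_calls += 1
        prediction = self.models[self.active_model](query)
        self.cache.put(query, (self.active_model, prediction))
        return prediction

    def record_feedback(self, query: Hashable, outcome: object) -> bool:
        """Join feedback with the cached prediction; False if it was already evicted."""
        entry = self.cache.get(query)  # note: this lookup also refreshes recency
        if entry is None:
            return False
        model_name, prediction = entry
        self.joined.append((model_name, prediction, outcome))
        return True

    def accuracy_by_model(self) -> Dict[str, float]:
        """Fraction of joined predictions that matched their feedback, per model."""
        counts: Dict[str, Tuple[int, int]] = {}
        for model_name, prediction, outcome in self.joined:
            correct, seen = counts.get(model_name, (0, 0))
            counts[model_name] = (correct + int(prediction == outcome), seen + 1)
        return {name: correct / seen for name, (correct, seen) in counts.items()}
```

OrderedDict makes both the recency update (move_to_end) and the eviction (popitem(last=False)) O(1), which is the usual way to get LRU behaviour in Python without a hand-rolled linked list. The join only works while the prediction is still cached, which is why feedback arriving soon after the prediction matters: with a small capacity, slow feedback finds its prediction already evicted and record_feedback returns False.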