Hacker News
by avin_regmi 2655 days ago
1. Caching the input will save a lot of time. Inputs are not unique each time: in a production environment, many inputs are the same. Many platforms in fact do caching, such as Algorithmia, TF Serving, and SageMaker. If a lookup in a Redis database is faster than a forward pass, caching will reduce latency dramatically for repeated inputs. Watch my YouTube video where I give an example.

2. It's up to you whether you run it on GPU or CPU. The benchmark was done on a CPU, but you're free to download panini via Helm and use a GPU in your private Kubernetes cluster.

3. For now, during beta testing, we're offering free inference, with the limit that model size cannot exceed 2GB.
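
To make the trade-off in point 1 concrete, here is a rough sketch of the arithmetic, assuming every request checks the cache first and only misses run the model. The function names and the example latencies (1 ms lookup, 50 ms forward pass) are made up for illustration and are not measurements from panini or Redis.

```python
def expected_latency_ms(hit_rate, lookup_ms, forward_ms):
    """Mean per-request latency when every request checks the cache first."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Every request pays the lookup; only cache misses pay the forward pass.
    return lookup_ms + (1.0 - hit_rate) * forward_ms


def break_even_hit_rate(lookup_ms, forward_ms):
    """Hit rate above which caching beats always running the model."""
    # Caching wins when lookup + (1 - h) * forward < forward, i.e. h > lookup / forward.
    return lookup_ms / forward_ms


# Hypothetical numbers: 1 ms cache lookup, 50 ms forward pass.
no_repeats = expected_latency_ms(0.0, 1.0, 50.0)    # every request misses
some_repeats = expected_latency_ms(0.3, 1.0, 50.0)  # 30% of requests repeat
threshold = break_even_hit_rate(1.0, 50.0)
```

With these assumed numbers the cache pays off once more than about 2% of requests are repeats, and it adds a small overhead if inputs never repeat. So the answer depends mostly on the hit rate of the actual workload.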

Hope this was helpful.

1 comments

I don’t know what sort of production you’ve been exposed to, but the inputs to a Deep Net are almost never the same.

We have hundreds of models across many domains: real estate, energy prediction, time-series crypto, video analytics, molecular modelling.

I would bet money that across the millions of predictions that we make weekly, over all of the models, no two inputs are the same.

That’s kind of the point of Deep Learning: high-dimensional, noisy input.

Caching will not help you here.

It really depends on the application. In content recommendation, for example, predictions for popular items are requested frequently. We maintain a prediction cache so we can serve frequent requests from the cache without passing them through the model. We also use the cache for selecting a model. To do this, we join the original prediction with the feedback it receives. Because feedback arrives soon after the prediction, even a unique query can benefit from the cache. Most prediction models are not deep learning these days; most companies are using classical machine learning. In our case, with an SVM trained in scikit-learn, we saw a feedback throughput of 1.8x. The cache uses simple LRU eviction, a standard cache eviction algorithm.
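
For anyone curious what a prediction cache with LRU eviction might look like, here is a minimal in-memory sketch. It is not the actual implementation described above: the `PredictionCache` class, the SHA-256 keying of raw input bytes, and the `model_fn` callback are all assumptions made for illustration, and a production setup would more likely sit in Redis or a similar store.

```python
import hashlib
from collections import OrderedDict


class PredictionCache:
    """Tiny in-memory LRU cache keyed by a hash of the raw input bytes."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._store = OrderedDict()  # least recently used entries sit at the front
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(raw_input):
        # Identical byte strings map to the same key; near-duplicates do not.
        return hashlib.sha256(raw_input).hexdigest()

    def predict(self, raw_input, model_fn):
        key = self._key(raw_input)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = model_fn(raw_input)  # cache miss: run the model
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result
```

Usage would be something like `cache.predict(payload, model_fn)` in front of the model call. Note that exact-match keys only help when byte-identical inputs recur, as in the popular-item case; high-dimensional noisy inputs will almost always miss.
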
The point we are trying to make is that we don't think caching adds much value to the product. It is very easy to implement and doesn't help much.
I'll take your feedback into consideration :)