by avin_regmi
2655 days ago
1. Caching the input can save a lot of time. Inputs are not unique every time; in a production environment, many of them repeat. Many platforms in fact do caching, such as Algorithmia, TF Serving, and SageMaker. If a lookup in a Redis database is faster than a forward pass, caching will reduce latency dramatically. Watch my YouTube video where I give an example.
2. It's up to you whether to run it on GPU or CPU. The benchmark was done on a CPU, but you're free to download panini via Helm and use a GPU in your private Kubernetes cluster.
3. For now, during beta testing, we're offering free inference, and model size cannot exceed 2GB.

Hope this was helpful.
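
To make point 1 concrete, here is a minimal sketch of exact-match input caching. It is an illustration under assumptions, not how panini or any of the platforms above implement it: `CachedModel` and `slow_model` are hypothetical names, and a plain dict stands in for the Redis database.

```python
import hashlib
import json


class CachedModel:
    """Wraps a predict function with an exact-match input cache."""

    def __init__(self, predict_fn):
        self.predict_fn = predict_fn
        self.cache = {}  # assumption: a dict standing in for Redis
        self.hits = 0
        self.misses = 0

    def _key(self, x):
        # Hash the serialized input so large inputs map to small keys.
        return hashlib.sha256(json.dumps(x).encode()).hexdigest()

    def predict(self, x):
        key = self._key(x)
        if key in self.cache:
            # Same input seen before: skip the forward pass.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        y = self.predict_fn(x)
        self.cache[key] = y
        return y


def slow_model(x):
    # Hypothetical placeholder for an expensive forward pass.
    return sum(v * v for v in x)


model = CachedModel(slow_model)
model.predict([1.0, 2.0, 3.0])  # miss: runs the model
model.predict([1.0, 2.0, 3.0])  # hit: served from the cache
model.predict([1.0, 2.0, 3.1])  # miss: input differs
print(model.hits, model.misses)  # prints: 1 2
```

The cache only pays off when the lookup is cheaper than the forward pass and identical inputs actually recur.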
|
We have hundreds of models across many domains: real estate, energy prediction, time-series crypto, video analytics, molecular modelling.
I would bet money that across the millions of predictions that we make weekly, over all of the models, no two inputs are the same.
That’s kind of the point of deep learning: high-dimensional, noisy input.
Caching will not help you here.
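
A rough way to see this, as a sketch with synthetic data rather than anything from our production traffic: measure the hit rate an exact-match cache would get on a stream of inputs. The names `cache_key` and `hit_rate`, the 16-dimensional vectors, and the noise scale are all assumptions made for illustration.

```python
import hashlib
import json
import random


def cache_key(x):
    # Exact-match key: any change in any float changes the hash.
    return hashlib.sha256(json.dumps(x).encode()).hexdigest()


def hit_rate(inputs):
    """Fraction of inputs whose key was already seen earlier in the stream."""
    seen = set()
    hits = 0
    for x in inputs:
        key = cache_key(x)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(inputs)


rng = random.Random(0)
base = [rng.random() for _ in range(16)]

# The same input sent 1000 times: everything after the first is a hit.
repeated = [base for _ in range(1000)]

# The same signal with tiny sensor-style noise added to each dimension.
noisy = [[v + rng.gauss(0.0, 1e-6) for v in base] for _ in range(1000)]

print(hit_rate(repeated))
print(hit_rate(noisy))
```

Even noise far too small to change a prediction makes every key distinct, so an exact-match cache never gets a hit on the noisy stream.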