Hacker News
by avin_regmi 2659 days ago
Hey, predictions for both TF Serving and Panini were run in a single thread on a machine with the same specification. We used a simple image classification model trained on the CIFAR dataset. Roughly 500 predictions were made with Panini and 200 with TF Serving.
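
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a single-threaded throughput measurement. It is not the harness used for the numbers above; `measure_throughput` and `predict_fn` are hypothetical names, and `predict_fn` stands in for whatever client call sends one input to TF Serving or Panini.

```python
import time
from typing import Any, Callable, Iterable


def measure_throughput(predict_fn: Callable[[Any], Any], inputs: Iterable[Any]) -> float:
    """Return predictions per second for sequential calls in the current thread.

    predict_fn is a hypothetical stand-in for one blocking prediction request.
    """
    count = 0
    start = time.perf_counter()
    for x in inputs:
        predict_fn(x)  # one request at a time, no concurrency
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast no-op calls.
    return count / elapsed if elapsed > 0 else float("inf")
```

Feeding each server the same sequence of distinct inputs keeps the comparison fair and avoids measuring repeated requests for the same image.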

You can always download all of Panini, run it on your own private server, and not pay anything, e.g. by using Helm to install it in your own Kubernetes cluster, or by getting it from Docker Hub. For now, we're making it free for models under 2 GB. Our main goal is to make it usable, and we don't want cost to be a factor.

1 comment

So you claim that TF Serving (written in C++ I believe) has over double the overhead compared to Panini?

This seems surprising. What makes it so much faster?

Edit: Unless of course you are hitting the cache for a lot of the predictions?

An optimized TF Serving build would perform similarly to Panini; however, it's really hard to find good documentation on optimizing TF Serving's compilation parameters. Panini automatically finds the batch size that maximizes throughput and adjusts it adaptively. We also have a technique to reduce and bound tail latency. I would love for you to try it and give me some feedback. Thanks
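
To illustrate what adaptive batch sizing can look like in general, here is a minimal sketch of one common approach: additive increase, multiplicative decrease (AIMD) against a latency target. This is an assumption chosen for illustration, not Panini's actual algorithm; the class name, parameters, and `predict_fn` are all hypothetical.

```python
import time
from typing import Any, Callable, List, Sequence


class AdaptiveBatcher:
    """AIMD batch-size controller (illustrative sketch, not Panini's implementation)."""

    def __init__(self, latency_slo_s: float, min_batch: int = 1,
                 max_batch: int = 256, step: int = 2, backoff: float = 0.5):
        self.latency_slo_s = latency_slo_s  # target latency per batch, in seconds
        self.min_batch = min_batch
        self.max_batch = max_batch
        self.step = step                    # additive increase per good batch
        self.backoff = backoff              # multiplicative decrease factor
        self.batch_size = min_batch

    def update(self, observed_latency_s: float) -> int:
        """Adjust the batch size from the latency of the last batch."""
        if observed_latency_s > self.latency_slo_s:
            # Too slow: shrink quickly to pull latency back under the target.
            self.batch_size = max(self.min_batch, int(self.batch_size * self.backoff))
        else:
            # Within the target: probe for more throughput with a larger batch.
            self.batch_size = min(self.max_batch, self.batch_size + self.step)
        return self.batch_size

    def run_batch(self, predict_fn: Callable[[Sequence[Any]], Sequence[Any]],
                  queue: List[Any]) -> List[Any]:
        """Take up to batch_size items off the queue, predict, then adapt."""
        batch = queue[:self.batch_size]
        del queue[:len(batch)]
        start = time.perf_counter()
        outputs = predict_fn(batch)
        self.update(time.perf_counter() - start)
        return list(outputs)
```

The multiplicative decrease is what keeps latency bounded: a single slow batch halves the batch size, while growth back toward the throughput optimum is gradual.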