Hacker News
by nl 2659 days ago
So you claim that TF Serving (written in C++ I believe) has over double the overhead compared to Panini?

This seems surprising. What makes it so much faster?

Edit: Unless of course you are hitting the cache for a lot of the predictions?

1 comment

An optimized TF Serving would perform similarly to Panini; however, it's really hard to find good documentation on optimizing TF Serving's compilation parameters. Panini automatically finds the batch size that maximizes throughput and adjusts it adaptively. We also have a technique to reduce and bound tail latency. I would love for you to try it and give me some feedback. Thanks.
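
To make the batch-size point concrete, here is a rough sketch of what "finding the batch size that maximizes throughput" can look like. This is not Panini's actual algorithm (the comment doesn't describe it); it is a plain doubling search, and `tune_batch_size`, `simulated_throughput`, and the latency constants are all made up for illustration.

```python
from typing import Callable


def simulated_throughput(batch_size: int) -> float:
    """Hypothetical cost model: items per millisecond for one batch.

    Assumed latency = fixed overhead + per-item cost + a quadratic term
    standing in for memory pressure at large batch sizes.
    """
    latency_ms = 5.0 + 0.5 * batch_size + 0.002 * batch_size ** 2
    return batch_size / latency_ms


def tune_batch_size(measure: Callable[[int], float],
                    start: int = 1,
                    max_batch: int = 1024) -> int:
    """Double the batch size until measured throughput stops improving."""
    best_size = start
    best_throughput = measure(start)
    size = start * 2
    while size <= max_batch:
        throughput = measure(size)
        if throughput <= best_throughput:
            break  # past the peak, keep the previous size
        best_size, best_throughput = size, throughput
        size *= 2
    return best_size


if __name__ == "__main__":
    print(tune_batch_size(simulated_throughput))
```

In a real server `measure` would time actual inference batches, and the search would be rerun periodically so the batch size follows changes in load. That is one simple way to get the "adaptive" part.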