by bwasti 2648 days ago
You may want to check out Intel's optimized version of TensorFlow Serving[1] for further improvements (on the order of 2x for ResNet-50[2]).

As an aside, I took into account the resource allocation in the parent comment. The c5.2xlarge has 8 vCPUs and 16 GiB RAM [3] and does a single fp32 inference in ~17ms. If we cut that down to 4 vCPUs and assume linear scaling, we can expect ResNet-50 to run in ~35ms, compared to the ~500ms achieved here. I'd recommend comparing to a known baseline rather than a "vanilla setup" to ensure you aren't missing any simple changes that may dramatically improve performance.
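
For anyone who wants to redo the arithmetic, here is a rough sketch in Python. It assumes latency scales inversely with core count, which is the same linear-scaling assumption as above and is optimistic in practice. The function name `scaled_latency_ms` is made up for illustration; the numbers are the approximate figures quoted in this comment.

```python
def scaled_latency_ms(baseline_ms: float, baseline_cores: int, target_cores: int) -> float:
    """Estimate latency on target_cores, assuming perfectly linear scaling."""
    return baseline_ms * baseline_cores / target_cores


baseline_ms = 17.0    # ~17 ms per fp32 inference on 8 cores (figure quoted above)
observed_ms = 500.0   # ~500 ms reported for the setup being discussed

estimate_ms = scaled_latency_ms(baseline_ms, baseline_cores=8, target_cores=4)
gap = observed_ms / estimate_ms  # how far the observed latency is from the estimate

print(f"estimated: {estimate_ms:.0f} ms, gap: {gap:.1f}x")
```

Under these assumptions the estimate is about 34ms, so the ~500ms result is more than an order of magnitude slower than the scaled baseline.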

[1] https://github.com/IntelAI/models/blob/master/docs/general/t...

[2] https://www.intel.ai/improving-tensorflow-inference-performa...

[3] https://aws.amazon.com/ec2/instance-types/c5/

1 comment

@bwasti, really good points - this is something we look forward to evaluating! Our post does indeed outline optimizations from tensorflow/serving to tensorflow/serving:*-devel [1]. The next logical improvement (given the Intel architecture and the docs you linked) is to start building on top of the *-devel-mkl image.

-masroor(author)

[1] https://github.com/tensorflow/serving/tree/master/tensorflow...