by human_afterall 2648 days ago
Hi bwasti, the host's CPU platform is Intel Broadwell. While the CPU architecture of our production hosts is the same, the resources allocated are much higher than 4 cores. This post gives an overview of the relative improvements that can be made over a vanilla setup :)

-masroor (author)


You may want to check out Intel's optimized version of TensorFlow Serving[1] for further improvements (on the order of 2x for ResNet-50[2]).

As an aside, I took into account the resource allocation in the parent comment. The c5.2xlarge has 8 cores and 16GB RAM [3] and does a single fp32 inference in ~17ms. If we chop that down to 4 cores and assume linear scaling, we can expect ResNet-50 to run in ~35ms, compared to the ~500ms achieved here. I'd recommend comparing against a known baseline rather than a "vanilla setup" to ensure you aren't missing any simple changes that may dramatically improve performance.
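
The back-of-the-envelope estimate above can be written out as a rough sketch. It only uses the figures quoted in this comment (~17ms on 8 cores, ~500ms observed on 4 cores) and assumes perfectly linear scaling with core count, which real workloads rarely achieve; the function name `scaled_latency_ms` is made up for illustration.

```python
def scaled_latency_ms(measured_ms: float, measured_cores: int, target_cores: int) -> float:
    """Estimate latency on target_cores, assuming perfectly linear scaling with core count."""
    return measured_ms * measured_cores / target_cores


# Figures quoted in the comment: ~17 ms per fp32 inference on 8 cores.
baseline_ms = scaled_latency_ms(17.0, measured_cores=8, target_cores=4)

# Latency reported for the 4-core setup under discussion.
observed_ms = 500.0

# How far the observed latency is from the scaled baseline.
gap = observed_ms / baseline_ms

print(f"estimated 4-core baseline: {baseline_ms:.0f} ms, observed is {gap:.1f}x slower")
```

Under these assumptions the scaled baseline comes out at 34ms (the ~35ms quoted above), leaving the observed ~500ms roughly an order of magnitude slower.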

[1] https://github.com/IntelAI/models/blob/master/docs/general/t...

[2] https://www.intel.ai/improving-tensorflow-inference-performa...

[3] https://aws.amazon.com/ec2/instance-types/c5/

@bwasti, really good points - this is something we look forward to evaluating! Our post does indeed outline optimizations from tensorflow/serving to tensorflow/serving:*-devel [1]. The next logical improvement (given the Intel architecture and the docs linked) is to start building on top of the *-devel-mkl image.

-masroor (author)

[1] https://github.com/tensorflow/serving/tree/master/tensorflow...