You may want to check out Intel's optimized version of TensorFlow Serving[1] for further improvements (on the order of 2x for ResNet-50[2]).

As an aside, I took into account the resource allocation in the parent comment. The c5.2xlarge has 8 vCPUs and 16 GiB RAM [3] and does a single fp32 inference in ~17 ms. If we halve that to 4 cores and assume linear scaling, we could expect to run ResNet-50 in ~35 ms, compared to the ~500 ms achieved here. I'd recommend comparing against a known baseline rather than a "vanilla setup" so you don't miss simple changes that could dramatically improve performance.
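The back-of-the-envelope math, as a small sketch. It assumes latency scales inversely with core count (the same linear-scaling assumption as above), uses the 8 vCPUs of the c5.2xlarge as the baseline core count, and takes the ~17 ms and ~500 ms figures from this thread. The function name is just illustrative.

```python
def scaled_latency_ms(baseline_ms: float, baseline_cores: int, target_cores: int) -> float:
    """Rough latency estimate on a different core count.

    Assumes perfectly linear scaling, i.e. latency is inversely
    proportional to the number of cores. Real workloads rarely scale
    perfectly, so this is only a ballpark figure.
    """
    return baseline_ms * baseline_cores / target_cores


# ~17 ms on the 8-vCPU c5.2xlarge, scaled down to the parent's 4 cores
estimate = scaled_latency_ms(17.0, 8, 4)
observed = 500.0  # ms, the figure reported in the parent comment

print(f"estimated: {estimate:.0f} ms, observed: {observed:.0f} ms, gap: {observed / estimate:.1f}x")
```

Even if the linear-scaling assumption is off by a fair margin, an order-of-magnitude gap like this suggests something in the setup is worth investigating.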

[1] https://github.com/IntelAI/models/blob/master/docs/general/t...

[2] https://www.intel.ai/improving-tensorflow-inference-performa...

[3] https://aws.amazon.com/ec2/instance-types/c5/