|
|
|
|
|
by human_afterall
2648 days ago
|
|
Hi bwasti, the host's CPU platform is Intel Broadwell. While the CPU architecture of our production hosts are the same, the resources allocated are much higher than 4 cores. This post details an overview of the relative improvements that can be made from a vanilla setup :) -masroor (author) |
|
As an aside, I took into account the resource allocation in the parent comment. The c5.2xlarge has 8 cores, 8GB RAM [3] and does a single fp32 inference in ~17ms. If we chop that down to 4 cores and assume linear scaling we can fathom running ResNet-50 in ~35ms compared to the ~500ms achieved here. I'd recommend comparing to a known baseline rather than a "vanilla setup" to ensure you aren't missing any simple changes that may dramatically improve performance.
[1] https://github.com/IntelAI/models/blob/master/docs/general/t...
[2] https://www.intel.ai/improving-tensorflow-inference-performa...
[3] https://aws.amazon.com/ec2/instance-types/c5/