The performance charts in the article are for training CNNs on CPU. Are there non-educational use cases for that? How does CPU CNN inference speed compare?
Fast training already requires us to optimize the inference code paths, since the forward pass is part of every training step. We are currently adding optimizations that target inference-only use cases (some PRs are already up). We will post data as soon as that is done. Stay tuned!