Hacker News new | ask | show | jobs
by markurtz 1744 days ago
Disclosure: I work for Neural Magic.

Hi 37ef_ced3, AVX-512 has certainly helped close the gap for CPUs vs GPUs. Even then, though, most inference engines on CPUs are still very compute bound for networks. Unstructured sparsity enables us to cut the compute requirements down for CPUs leading to most layers becoming memory bound. This then allows us to focus on the unique cache architectures within the CPUs to remove the memory bottlenecks through proprietary techniques. The combination is what's truly unique for Neural Magic and enables the DeepSparse engine to compete with GPUs while leveraging the flexibility and deployability of highly available CPU servers.

Also as boulos said, the numbers quoted here are a bit lopsided. There are more affordable GPUs and GCP has enabled some very cost effective T4 instances. Our performance blog on YOLOv3 walks through a direct comparison to T4s and V100s on GCP for performance and cost: https://neuralmagic.com/blog/benchmark-yolov3-on-cpus-with-d...

Additionally, I'd like to clarify that our open sourced products do not communicate with any of our servers currently. These are stand alone products that do not require Neural Magic to be in the loop for use.