Hacker News new | ask | show | jobs
by kahnjw 2502 days ago
Latency in training literally does not matter. You care about throughput. In serving, where latency matters, most DL frameworks allow you to serve the model from a highly optimized C++ binary, no python needed.

The poster you are replying to is 100% correct.