| I was in the same boat and this is good advice! I stopped using GPUs too. "Vectorized inference isn’t bad at all!" This, so much. I was blinded by GPU speed, but TensorFlow builds with AVX optimizations are actually pretty fast. What I found: + Stopped using expensive GPUs for inference and switched to AVX-optimized TensorFlow builds. + Cleaned up the inference pipeline and reduced its complexity. + Committing to a compute instance for a year or more gets you a discount. - I never got pruning to work without a significant increase in loss. - Tried GPU spot instances, which are cheaper. Between the random kills and new instances taking too long to spin up and load my code, it didn't work out. The discount is large, but I couldn't keep the service up reliably, and users were getting more timeouts. I bailed and just used CPU inference. The GPU was being underutilized anyway, and going CPU-only increased inference time to around 2-3 seconds. Given the price trade-off, it was the simpler, cheaper, and easier solution. |
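If you want to check whether CPU-only latency stays under your timeout before dropping the GPU, something like the rough sketch below might help. It is not what I actually ran: `measure_latency`, `fits_budget`, `predict`, and `timeout_s` are all made-up names for illustration, and `predict` stands in for whatever callable wraps your model.

```python
import math
import statistics
import time


def measure_latency(predict, sample, runs=20, warmup=3):
    """Time predict(sample) and return latency stats in seconds.

    `predict` is a hypothetical stand-in for your model's inference call.
    """
    # Warm-up calls are not timed, so one-off setup cost does not skew results.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    timings.sort()
    # Nearest-rank 95th percentile over the sorted timings.
    p95_index = max(0, math.ceil(0.95 * len(timings)) - 1)
    return {
        "median": statistics.median(timings),
        "p95": timings[p95_index],
        "max": timings[-1],
    }


def fits_budget(stats, timeout_s):
    """Return True if the 95th-percentile latency is below the timeout."""
    return stats["p95"] < timeout_s
```

Run it on the CPU box with a representative input and compare the p95 against your client-side timeout. Looking at p95 rather than the average is a judgment call: the slow requests are the ones that turn into timeouts for users.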