Hacker News new | ask | show | jobs
by rasmi 3134 days ago
You can deploy TensorFlow model binaries as serverless APIs on Google Cloud ML Engine [1]. But I would also be interested in seeing a TensorFlow Lite implementation.

[1] https://cloud.google.com/ml-engine/docs/deploying-models

Disclaimer: I work for Google Cloud.

1 comments

Thanks, @rasmi. I have a feedback for you guys. The pricing for predictions inference in GCP is not very fair. If I deploy a small model (like a SqueezeNet or Mobilenet) I pay almost the same price of someone deploying large models (like Resnet or VGG). That’s why I’m deploying my models on serverless environments and paying about 5 dollars for 1 million inferences.

The pricing of GCP is: $0.10 per thousand predictions, plus $0.40 per hour. That’s more than 100 dollars for 1 million inferences.

I see what you mean. To some companies, ML Engine's cost as a managed service may be worth it. To others, spinning up a VM with TensorFlow Serving on it is worth the cost savings. If you've taken other approaches to serving TensorFlow models to get around ML Engine's per-prediction cost, I'm curious to hear about them.