Hacker News
by angerbot 3550 days ago
Very cool. I've been toying with the idea of using something like this, or perhaps the Cloud Vision API, to automatically generate image captions for screen readers (e.g. through a browser extension). But the cost of running something like an EC2 GPU instance is prohibitive for a project like that, which I wouldn't want to charge for.

Training it locally on the user's machine would take far too long, especially since you would have to use the CPU in the majority of cases, as many people don't have a separate GPU.

3 comments

While you would never do this kind of training on your users' machines (it takes multiple weeks even with a powerful GPU), you should be able to apply the trained model to a single photo nearly instantaneously. So the real roadblock is mostly that they don't appear to have included a fully pre-trained model with this release, and it will take you as a developer a lot of GPU time to train one. But your users would not necessarily have a problem captioning images on their machines.
I hadn't considered that (this is really out of my depth). Any ideas on what the actual size of a trained model would be to distribute? Taking up 150 GB on the user's hard drive is probably out as well.
Depends on the model and dataset. Inception v3 trained on ImageNet is about 150 MB, but you can quantise the weights to 8 bits and prune it to make it much smaller without affecting performance much.
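To make the 8-bit idea concrete, here is a rough sketch of linear quantisation in plain Python. It is only an illustration, not how TensorFlow does it: the function names and the sample weights are made up, and it assumes the original weights are stored as 32-bit floats, in which case one byte per weight gives roughly a quarter of the size.

```python
import array

def quantize_8bit(weights):
    """Map floats linearly onto the integer codes 0..255 (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    # Step between adjacent codes; fall back to 1.0 if all weights are equal.
    scale = (hi - lo) / 255 or 1.0
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale

def dequantize_8bit(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]

# Made-up weights, for illustration only.
weights = [-0.5, -0.1, 0.0, 0.2, 0.75]
codes, lo, scale = quantize_8bit(weights)
restored = dequantize_8bit(codes, lo, scale)

# Assumption: the unquantised model stores each weight as a 32-bit float.
float32_bytes = len(array.array("f", weights).tobytes())
int8_bytes = len(codes)

# Rounding to the nearest code means the error is at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(float32_bytes, int8_bytes, max_error)
```

Each weight ends up off by at most half a quantisation step, which is why accuracy usually doesn't suffer much. You also need to ship `lo` and `scale` alongside the codes, but that overhead is tiny next to the weights themselves.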
Here's a complete model for image recognition that works fine on a notebook: https://www.tensorflow.org/versions/r0.10/tutorials/image_re...
You can run this model on a Raspberry Pi.

Training is another matter.