Hacker News new | ask | show | jobs
by AndrewKemendo 3556 days ago
Actually the tensorflow implementation he uses does both segmentation and classification and returns a probabilistic graph of objects. For his application, it's only returning the top result, so it looks more basic than it is.
1 comments

No, it doesn't and there is no graph returned whatsoever. It's just a list of the top classification labels for the image (see example at the tutorial he cited https://github.com/tensorflow/tensorflow/tree/master/tensorf...). This is not the result of a segmentation but is rather a list of the top labels the model believes this could be. If you look at the top results you'll see they're usually similar/in the same family (again, refer to the example in the linked tutorial, the top 3 labels are: military uniform, suit, academic gown). This is literally the normalized output of the nodes of the last layer in the neural network (where each node corresponds to one category). If you added all probabilities together it'd sum to 1.
That's my point. With these OTS modules they are only returning on known classifiers.

The system has to segment before it classifies. That isn't returned to the user, but gradient descent is happening in the background. Like I said, it's a nitpick but important if you're trying to really build novel CV applications.

One of my gripes with people implementing pre-built modules from TF is that you don't really build any of the hard stuff, and it's pre-trained so not much learning is happening. You can't for example build RL systems with off the shelf TF implementations.

Do you understand how convolutional neural networks work? There is no segmentation involved here at all. The input are the raw pixels of the image. The output is the probability this image belongs to one of the categories the network is capable of predicting.

Also gradient descent has nothing to do with segmentation at all, I don't understand what you're talking about. Gradient descent is used to find the set of weights that minimizes the error. This is standard in training neural networks of any kind using backpropagation.