by visarga 3952 days ago
> Today’s artificial intelligence, you see, requires at least some human training. If you want a system to automatically identify cats in YouTube videos, humans must first show it what a cat looks like.

The article is written by someone who doesn't know what he's talking about. The "cat videos" story from a while back ostensibly used unsupervised training, which means the Google team didn't have to tell the deep neural net what a cat looks like. It discovered the concept of "catness" by itself (there was a "cat" neuron in the top layer).

I'm wondering who writes all the AI articles I read every day. Such a detail was crucial for the cat story. It's easy to make a cat/non-cat classifier with a few thousand labeled images for each category. The hard thing to do is to take raw photos with no labels and still discover cats.

2 comments

Unsupervised training may isolate the defining features of a cat picture, but it won't know that that's what we call "cat", so no unsupervised system will be able to identify cats in videos unless you show it at least one labeled image ("show it what a cat looks like").

In fact, that very network also produced millions of other "concepts", that is, classes of images that have no direct interpretation in human terms. The "cat neuron" was a fun gimmick, but you're reading way too much into it.

That's a semantic argument more than anything. A small furry mammal with four legs, a long tail, whiskers, and pointy ears is what we'd call a cat, no matter what word you assign to it.
The thing is, the network didn't learn the features you described. Take a simple example: neural networks confuse leopard-print couches with leopards. Why? Because the network learns discriminative features based on the data it has. There's no shared concept saying "oh, this is an animal with four legs".
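
A toy sketch of that point, not the actual network: the two features, the numbers, and the nearest-centroid classifier are all made up for illustration. Every training example has four legs, so "has legs" cannot separate the classes, and the only thing the classifier ends up using is texture.

```python
import math

# Hypothetical hand-made features: (spotted_texture, has_four_legs).
# Every training example is an animal, so the second feature never varies.
training = {
    "leopard": [(1.0, 1.0), (0.9, 1.0), (1.0, 1.0)],
    "house cat": [(0.0, 1.0), (0.1, 1.0), (0.0, 1.0)],
}

# Nearest-centroid classifier: one mean feature vector per class.
centroids = {
    label: tuple(sum(column) / len(rows) for column in zip(*rows))
    for label, rows in training.items()
}

def classify(features):
    """Return the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

couch = (1.0, 0.0)  # spotted texture, no legs
prediction = classify(couch)
print(prediction)
```

Because both centroids share the same "legs" value, the decision boundary depends on texture alone, and the couch lands on the leopard side.
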
Okay, so then ANNs can learn the "concept" of leopard print. I still think that's interesting.
It's not a merely semantic argument. Google's system did not learn that the sound 'cat' is associated with that particular concept. You need some kind of supervised learner to make that association.
My point is that you can still identify that it's a separate concept, even if you don't know what to call it. Even simple unsupervised learners (clustering) can do this.

It reminds me of a story Feynman told: https://haveabit.com/feynman/knowing-the-name-of-something/
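
To make both sides of this concrete, here is a minimal sketch assuming made-up 2-D feature vectors and plain k-means (not what Google's system used). Clustering separates the two groups with no labels at all, but the groups only have arbitrary ids until a single labeled example attaches a name to one of them.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm); no labels are used anywhere."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        assignments = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return [nearest(p, centroids) for p in points], centroids

# Hypothetical feature vectors for six unlabeled images.
points = [(0.9, 1.1), (1.0, 0.8), (1.2, 1.0), (4.9, 5.2), (5.1, 4.8), (5.0, 5.0)]
assignments, centroids = kmeans(points, [points[0], points[3]])

# One labeled example is enough to name the cluster it falls into.
labeled_point, labeled_name = (1.0, 0.8), "cat"
names = {nearest(labeled_point, centroids): labeled_name}

tags = [names.get(a, f"concept {a}") for a in assignments]
print(tags)
```

The cluster that never gets a labeled example stays "concept 1": still a distinct group, just without a human name.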

Merely identifying it as a separate concept is not especially useful. Tagging an image with the 'cat' tag is useful; tagging it with the 'concept 50765' tag, not so much.
Well... sure, not as useful, but I still think it's interesting. For instance, in English we have multiple words (goat, sheep) for what in Chinese is a single word (yang2). If an unsupervised model split our mammals which have fur and bleat into two categories, 'concept 19281' and 'concept 19282', we might think that it's done well to separate the goats and the sheep, but the Chinese speaker might think that it's failed to group the same animal together.

Now imagine that reversed: what we consider one thing could be split into two or more by the model, and we had just never thought of them as separate because we had no words to describe them.

There are many of these examples, where one language has one concept that's split among others in another language, and the speakers of the first language might never know the difference unless those words exist.

A good example is colors: https://eagereyes.org/blog/2011/you-only-see-colors-you-can-...

It may have used tags on YouTube videos to identify which had cats. Not sure if that counts as completely unsupervised.
Nope. Here's the link to the paper the Google team published in 2012.

Building high-level features using large scale unsupervised learning http://arxiv.org/abs/1112.6209

From the abstract: Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.

The cat detection thing was just a side product of learning to identify features of things in an unsupervised manner, but the news outlets latched on to that with titles such as "How Many Computers to Identify a Cat? 16,000" in the NY Times.

Wasn't it amazing that they could distill the concept of cat from images with no help from external labels (human intervention)? They missed the core of the discovery by not understanding that.

The deep learning method is an unsupervised way to process raw input and transform it into useable features. This used to be done by a combination of domain knowledge and supervised training, but they could build an automated way to extract relevant features from images.

This opened a window of hope that one day neural networks could be easily applied to any new domain, given sufficient raw data to build a deep network for it. In the past, this required a large investment in human-based data labeling and in working out how to extract the best features from raw data (the same researchers also described it as voodoo magic: it was hard, domain-locked, and expensive).