by absherwin 3548 days ago
Supervised versus unsupervised refers to whether or not the model is being trained against known target values. Think about the difference between "How many people will view this webpage?" and "Divide these pages into 20 clusters." The first is supervised. The second isn't.
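
To make the distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `inbound_links` and `views` numbers are made up, `fit_line` and `kmeans_1d` are hypothetical helper names, and it uses 2 clusters rather than 20 to keep the data tiny. The supervised function gets the targets (`views`); the unsupervised one only ever sees the inputs.

```python
import random

def fit_line(xs, ys):
    """Supervised: least-squares fit of y = a*x + b using the known targets ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def kmeans_1d(xs, k, iters=20, seed=0):
    """Unsupervised: group xs into k clusters. No targets are passed in."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # start from k distinct data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # keep the old center if a cluster ends up empty
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Hypothetical data: one feature per page, plus observed views for the supervised case.
inbound_links = [1, 2, 3, 4, 5]
views = [12, 22, 32, 42, 52]

slope, intercept = fit_line(inbound_links, views)           # needs targets
cluster_centers = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], k=2)  # needs none
print(slope, intercept, cluster_centers)
```

The only structural difference is the second argument: `fit_line` cannot run without `ys`, while `kmeans_1d` has nothing to compare its output against.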

Deep learning refers to a particular type of a particular learning technique: specifically, a neural network that has many hidden (intermediate) layers. Deep learning can be used for either supervised or unsupervised learning.
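
As a rough illustration of "many hidden layers", here is a sketch of a forward pass through a small fully connected network in plain Python. The layer sizes, the ReLU activation, and the helper names (`make_mlp`, `forward`, `hidden_layer_count`) are all assumptions chosen for the example, and the weights are random and untrained, so the output is meaningless; the point is only the stacked structure.

```python
import random

def relu(v):
    """Elementwise max(0, x)."""
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """One fully connected layer: weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def make_mlp(sizes, seed=0):
    """Random (untrained) parameters for layer sizes such as [4, 8, 8, 8, 1]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, v):
    """Apply ReLU after every hidden layer; the output layer stays linear."""
    for weights, biases in layers[:-1]:
        v = relu(dense(v, weights, biases))
    weights, biases = layers[-1]
    return dense(v, weights, biases)

def hidden_layer_count(layers):
    """Every layer except the last one produces a hidden representation."""
    return len(layers) - 1

# 4 inputs, three hidden layers of 8 units, 1 output.
net = make_mlp([4, 8, 8, 8, 1])
print(hidden_layer_count(net), forward(net, [0.5, -1.0, 2.0, 0.0]))
```

A "shallow" network in this sketch would be `make_mlp([4, 8, 1])`, with a single hidden layer; "deep" just means the list of sizes has more entries in the middle.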


Thanks for the explanation! I was remembering what flavors of snake oil were being peddled to people in the humanities in the early 2000s, and didn't mean to muddy the waters around technical terms of art.
I agree with your sentiment. It feels out of place because deep learning, AI and big data are buzzwords, whereas unsupervised learning is a rather technical term in machine learning referring to a very specific class of problems.
Are they really buzzwords? To me, they have rather particular meanings (although I guess others may feel differently):

Deep learning: a particular type of artificial neural network with many hidden layers (and the associated tech to make this work/trainable)

AI: The field of computer science which aims to make computers smarter. Like most fields, there is much overlap with others, for example, statistics.

Big Data: A buzzword. About the best definition I can find is anything which has the 3 V's: Volume, Velocity & Variety. In general, outside its use as a buzzword, I think big data is generally thought of as "when you need a distributed system to process your data", be it because of volume, velocity or variety.

Supervised and unsupervised learning: whether or not you require labeled example data for training

Machine learning: some people say it's the subset of AI that deals with statistical methods, other people say it's just another word for AI.

Specific only in that the categories aren't supervised.

Furthermore, suppose you have labels for some but not all points in your data (i.e. your model is designed to be robust in the face of things it hasn't been trained for). There are a nontrivial number of people who work on either side of the "semi-supervised" divide, e.g. clustering with exemplars or pulling out the generative model for a discriminative task. Personally I like these better, as they're more akin to what people seem to actually do (encounter new things and try to make sense of them).
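
One way to picture the "labels for some but not all points" setting is a seeded-clustering sketch like the one below. This is only an illustration under stated assumptions, not necessarily the method the comment has in mind: the data is made up, `seeded_clustering` is a hypothetical helper, points are one-dimensional, and every class is assumed to have at least one labeled example. The labeled points act as exemplars that fix where each cluster starts, and unlabeled points (marked `None`) are assigned to the nearest class center.

```python
def seeded_clustering(points, labels, iters=10):
    """points: list of floats. labels: class name, or None when unlabeled.

    Labeled points keep their labels; unlabeled points take the class of
    the nearest center, and centers are recomputed from all assignments.
    """
    classes = sorted({l for l in labels if l is not None})

    def center_of(cls, assigned):
        members = [p for p, a in zip(points, assigned) if a == cls]
        return sum(members) / len(members)

    # Initial centers come from the labeled exemplars only.
    centers = {c: center_of(c, labels) for c in classes}
    assigned = list(labels)
    for _ in range(iters):
        assigned = [
            l if l is not None
            else min(classes, key=lambda c: abs(p - centers[c]))
            for p, l in zip(points, labels)
        ]
        centers = {c: center_of(c, assigned) for c in classes}
    return assigned

# Hypothetical data: two labeled exemplars, four unlabeled points.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
labels = ["low", None, None, "high", None, None]
print(seeded_clustering(points, labels))
```

With no labels at all this would collapse into ordinary clustering, and with every point labeled there would be nothing left to infer, which is why it sits between the two.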

Anyways. If you look at the delta in performance between "old" techniques like random forests or gradient boosting vs. deep convolutional networks, it tends to be quite small until your datasets grow to very large sizes. For things like images that's not much of a problem. For things like rare diseases it's a huge problem.

Deep learning isn't a buzzword; it has a fairly precise definition. It describes a particular class of algorithms that happens to be the state of the art for many problems.