Hacker News
by apathy 3548 days ago
Specific only in that the categories aren't supervised.

Furthermore, suppose you have labels for some but not all of the points in your data (so your model has to cope with things it hasn't been trained on). A nontrivial number of people work on either side of the "semi-supervised" divide, e.g. clustering with exemplars or pulling out the generative model for a discriminative task. Personally I like these approaches better, as they're closer to what people seem to actually do (encounter new things and try to make sense of them).
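As a rough illustration of the "clustering with exemplars" side, here's a minimal sketch of a seeded, constrained k-means in Python. The few labeled points seed one centroid per label and keep their labels, and the unlabeled points get assigned to whichever centroid is closest, after which the centroids are re-estimated. The function name, the 2-D tuple representation, and the iteration cap are assumptions for the example, not anyone's specific method.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # assumption: plain 2-D points for illustration


def _mean(ps: List[Point]) -> Point:
    # Component-wise mean of a non-empty list of points.
    n = len(ps)
    return (sum(p[0] for p in ps) / n, sum(p[1] for p in ps) / n)


def seeded_kmeans(points: List[Point], labels: List[Optional[str]],
                  iters: int = 10) -> List[str]:
    """Hypothetical helper: labeled points (label not None) act as exemplars."""
    # Seed one centroid per known label from the mean of its labeled points.
    groups: Dict[str, List[Point]] = {}
    for p, y in zip(points, labels):
        if y is not None:
            groups.setdefault(y, []).append(p)
    centroids = {y: _mean(ps) for y, ps in groups.items()}

    assignment: List[str] = []
    for _ in range(iters):
        # Assign: labeled points keep their label (the "constrained" part);
        # unlabeled points go to the nearest current centroid.
        assignment = [
            y if y is not None
            else min(centroids, key=lambda c: math.dist(p, centroids[c]))
            for p, y in zip(points, labels)
        ]
        # Update: recompute each centroid from everything assigned to it.
        # Every cluster keeps at least its labeled seeds, so none goes empty.
        new = {c: _mean([p for p, a in zip(points, assignment) if a == c])
               for c in centroids}
        if new == centroids:
            break  # converged: centroids stopped moving
        centroids = new
    return assignment


pts = [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5), (9.5, 10)]
labs = ["a", None, "b", None, None, None]
print(seeded_kmeans(pts, labs))  # ['a', 'a', 'b', 'b', 'a', 'b']
```

Keeping the seeds' labels fixed is a design choice; letting them move after initialization is the other common variant.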

Anyway. If you look at the delta in performance between "old" techniques like random forests or gradient boosting and deep convolutional networks, it tends to be quite small until your datasets grow very large. For things like images that's not much of a problem. For things like rare diseases it's a huge problem.