by rkaplan 3298 days ago
This post doesn't even mention the easiest way to use deep learning without a lot of data: download a pretrained model and fine-tune the last few layers on your small dataset. In many domains (like image classification, the task in this blog post) fine-tuning works extremely well, because the pretrained model has learned generic features in the early layers that are useful for many datasets, not just the one it was trained on.

Even the best skin cancer classifier [1] was pretrained on ImageNet.

[1]: http://www.nature.com/articles/nature21056

9 comments

This is how the great fast.ai course begins - download VGG16, finetune the top layer with a single dense layer, get amazing results. The second or third class shows how to make the top layers a bit more complex to get even better accuracy.
Saving someone a Google.

http://course.fast.ai

Can't recommend the course highly enough!
I'm skimming through the content and it seems really great! I'm interested in the last lesson (7-Exotic CNN Arch), but I'm afraid of missing other cool stuff in past lessons.

What do you suggest for someone who has experience with Deep Learning?

EDIT: found this wiki with the course notes: http://wiki.fast.ai/index.php/Main_Page

One can use it as a guide to avoid missing anything.

I think one of the strengths of the course is that Jeremy shows parts of the process of working on a ML problem. If you have time, I recommend watching earlier lessons, even if you know the theoretical aspects of the content covered.
Same!! On lesson 2 and already feel like I know so much! The top down approach to learning is great! Will actually read the book "Making Learning Whole" that inspired them to follow this approach.
The 30 minute overview has me excited about going through it. I really like how they have structured it. Thanks to all for recommending it!
Totally agree.

Similarly for word embeddings like word2vec, GloVe, fastText, etc. in the case of NLP.

I think this is fundamental - if you teach a human how to recognize street signs, you don't need to show them millions of examples - just one or a few of each is enough because we build on reference experiences of past objects seen through life experience to encode the new images as memories.

Would you care to elaborate?

My main doubt comes from the fact that the meaning of language may vary between contexts, but I am no expert and am earnestly curious about using NLP and ML with not-that-big data.

Vector representations are useful for many natural language processing tasks. In word embeddings like word2vec, GloVe, and fastText, the algorithm learns an associated n-dimensional vector (x1, x2, ..., xn) for each word. A word may be close to another in one dimension (or a subspace) but far away in another. Moreover, a good representation allows meaningful vector space arithmetic: King - Man + Woman ≈ Queen. Word representations are typically trained on very large unlabeled data, but once the algorithm has learned the features you can use them for your small dataset. EDIT: Added more explanation.
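
To make the vector arithmetic concrete, here is a minimal sketch in plain Python. The vectors in `TOY_VECTORS` are hand-made for illustration (their three dimensions loosely stand for royalty, maleness, and femaleness) and are not real word2vec, GloVe, or fastText embeddings; the helper names `cosine` and `analogy` are also made up for this example.

```python
import math

# Hand-made toy vectors, NOT real embeddings (assumption for illustration).
# Dimensions loosely stand for (royalty, maleness, femaleness).
TOY_VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# king - man + woman lands nearest to "queen" in this toy space.
nearest = analogy("king", "man", "woman", TOY_VECTORS)
```

With real pretrained embeddings the lookup works the same way, only over a vocabulary of many thousands of words and a few hundred dimensions.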
This blog post[1] is a great example of positive unintended consequences of deep learning:

> We were very surprised that our model learned an interpretable feature, and that simply predicting the next character in Amazon reviews resulted in discovering the concept of sentiment.

[1]: https://blog.openai.com/unsupervised-sentiment-neuron/

It is mentioned at the end of the post:

> You don’t need Google-scale data to use deep learning. Using all of the above means that even your average person with only a 100-1000 samples can see some benefit from deep learning. With all of these techniques you can mitigate the variance issue, while still benefitting from the flexibility. You can even build on others work through things like transfer learning.

To be fair to the original article, his assertion was more along the lines of "you can't train a deep net without lots of data." As the second article shows, that isn't true in the general case. However, it is certainly true for creating any of the interesting models you think of when you think of deep nets (e.g., Inception, word2vec). You just can't get the richness of these without a lot of data to train them.
Deep learning can do pretty well when it's not pre-trained, too.

I have a dataset of word counts for the top 5k words, with 5,000 training observations and 5,000 held out. I consider this data pretty small.

An SVM with an RBF kernel can get around 87-88% accuracy, but a histogram kernel can get around 89.7% accuracy with a little feature engineering.

TensorFlow, after tuning some parameters, can also get around 89.7% accuracy.
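
The comment doesn't say which histogram kernel was used. Assuming it means the histogram intersection kernel, K(x, y) = sum_i min(x_i, y_i), a minimal sketch on word-count vectors looks like this. The tiny `docs` matrix and the function names are made up for illustration.

```python
def histogram_intersection(x, y):
    """K(x, y) = sum_i min(x_i, y_i) for non-negative count vectors."""
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(rows_a, rows_b):
    """Kernel value for every pair (row of rows_a, row of rows_b)."""
    return [[histogram_intersection(a, b) for b in rows_b] for a in rows_a]


# Three toy "documents" as word counts over a 3-word vocabulary (made up).
docs = [
    [2, 0, 1],
    [1, 1, 0],
    [0, 3, 1],
]
gram = gram_matrix(docs, docs)
```

A precomputed Gram matrix like this can be handed to an SVM implementation that accepts one, for example scikit-learn's `SVC(kernel="precomputed")`.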

Transfer learning is efficient (minimal training time) and useful for most classification tasks across various domains. Some of the first models I've used were built on Inception/ImageNet and I recall being thoroughly impressed by the performance.
This only works, though, if the pretraining data and your own training data come from the same domain. Even then, you will have issues if the data is from the same domain but differs in representation, e.g. 2D versus 3D image datasets.
What would you consider the best resource for learning how to do this in Python? I have a small set of image data in which I'd like to identify components (e.g. 'house', 'car').
Watch the first one or two videos from http://fast.ai . They show how to do this in about seven lines of Python and five minutes' worth of model training.
Does transfer learning apply to seq2seq as well?