Hacker News
by bowlesbe 3507 days ago
Great point! I considered using fastText as a baseline, but in practice it didn't work well at all on this small dataset; it did much worse than the TF-IDF baseline. I think fastText's classification approach may simply not suit such a small dataset. I'm not sure, but I suspect it's because it tries to learn word embeddings, and there isn't anywhere near enough data for that. I'd love an outside perspective on this.
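The thread doesn't say which classifier sat on top of the TF-IDF features. The sketch below is a stdlib-only illustration that assumes a swapped-in nearest-centroid (cosine) classifier over smoothed TF-IDF weights; the function names and toy data are hypothetical. Unlike fastText, it learns no embeddings: the term weights are fixed counts, and "training" only builds one average vector per class.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would do more.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit L2 length (empty vectors are returned unchanged).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def vectorize(text, idf):
    # Term counts weighted by IDF; terms never seen in training are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def train_centroids(docs, labels, idf):
    # Sum each class's unit TF-IDF vectors and renormalize (same direction as their mean).
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in vectorize(doc, idf).items():
            sums[label][t] += v
    return {label: normalize(dict(vec)) for label, vec in sums.items()}


def predict(text, centroids, idf):
    # Pick the class whose centroid has the highest cosine similarity.
    vec = vectorize(text, idf)

    def cosine(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy data, standing in for a small labeled dataset.
train_docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

idf = fit_idf(train_docs)
centroids = train_centroids(train_docs, train_labels, idf)
print(predict("buy pills cheap", centroids, idf))  # spam
```

With only a handful of parameters per class (one weight per vocabulary term, fixed by counting), there is little for a tiny dataset to overfit, which is consistent with the TF-IDF baseline holding up better here.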

Fair enough. That's a useful comparison to know about.

But I'm wondering how you get around that with the neural net. In the post, you said there are only a few hundred labeled examples, right? How can a neural net with hundreds of parameters set those parameters to anything reasonable, and not overfit, when there are about as many parameters as examples?
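The question is essentially a parameter-counting argument. A minimal sketch of that arithmetic for a fully connected network follows; the layer widths and the example count are hypothetical, not the architecture or dataset from the post.

```python
def count_dense_params(layer_sizes):
    # Each fully connected layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical widths: 50 input features, 8 hidden units, 2 output classes.
n_params = count_dense_params([50, 8, 2])
n_examples = 300  # hypothetical "few hundred" labeled examples
print(n_params, round(n_params / n_examples, 2))  # 426 1.42
```

Even this small network ends up with more parameters than examples, so without some constraint on the fit, overfitting is a reasonable worry.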

Great question, and I share your intuition, but I think it all comes down to properly regularizing the model. For neural networks, dropout works really darn well as a regularization strategy. I could have checked whether performance dropped significantly without dropout.
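For readers unfamiliar with the technique, here is a minimal sketch of inverted dropout over a plain Python list standing in for one layer's activations; it is an illustration, not the model from the post. Each unit is zeroed with probability `p` during training and the survivors are scaled by 1/(1 - p), so expected activations match at inference, when dropout is switched off.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Inverted dropout over a list of activations.

    During training each unit is zeroed with probability p and survivors are
    scaled by 1 / (1 - p), so the expected activation is unchanged; at
    inference the input passes through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng))  # each entry is 0.0 or 2.0
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, training=False))  # [1.0, 1.0, 1.0, 1.0]
```

Setting `p=0.0` turns the layer into an identity, so the ablation mentioned above amounts to training once with `p=0.0` and once with a nonzero rate, then comparing held-out performance.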