| HN Mirror



	by bowlesbe 3507 days ago
	Great point! I considered using fastText as a baseline, but in practice fastText didn't work well at all with the small data set; it did much worse than the TF-IDF baseline. I think fastText's classification approach may not suit such a small dataset. I'm not sure, but I suspect it's because it tries to learn embeddings, and there just isn't anywhere near enough data for that. I'd love an outside perspective on this.
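For anyone who wants to try the kind of TF-IDF baseline described above, here is a minimal plain-Python sketch. The comment doesn't say which classifier sat on top of the TF-IDF features, so the nearest-centroid (cosine similarity) classifier, the whitespace tokenizer, and the toy data are all assumptions for illustration. A real baseline would more likely pair a library vectorizer with a linear model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real baselines usually do more).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1, the default form in scikit-learn's TfidfVectorizer.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def l2_normalize(vec):
    # Scale a sparse vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def tfidf(doc, idf):
    # Raw term counts weighted by IDF; terms never seen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in counts.items()})


def dot(a, b):
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_centroids(docs, labels, idf):
    # One unit-length centroid per label: the normalised sum of that label's TF-IDF vectors.
    sums = {}
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {label: l2_normalize(acc) for label, acc in sums.items()}


def predict(doc, idf, centroids):
    # The dot product equals cosine similarity because both vectors are unit length.
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))


# Hypothetical toy data standing in for "a few hundred" labelled examples.
docs = ["great product works well", "terrible broke after a day",
        "works great love it", "awful terrible waste"]
labels = ["pos", "neg", "pos", "neg"]
idf = fit_idf(docs)
centroids = train_centroids(docs, labels, idf)
print(predict("love how well it works", idf, centroids))   # pos
print(predict("terrible waste of money", idf, centroids))  # neg
```

A baseline like this has no learned embeddings at all, only per-term counts reweighted by document frequency, which is one plausible reason it holds up better than an embedding-based model when labelled data is scarce.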


rspeer 3507 days ago

Fair enough. That's a useful comparison to know about.

But I'm wondering how you get around that with the neural net. In the post, you said there are only a few hundred labeled examples, right? How can a neural net with hundreds of parameters set those parameters to anything reasonable, and not overfit, when there are about as many parameters as examples?
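To make the parameters-versus-examples comparison concrete, the sketch below counts the weights and biases in a fully connected network. The thread doesn't give the architecture, so the layer sizes and the example count are hypothetical. Even a very small network like this one has several times more parameters than a few hundred examples.

```python
def dense_param_count(layer_sizes):
    # Each fully connected layer has an n_in x n_out weight matrix plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architecture: 300 input features, 16 hidden units, 2 output classes.
params = dense_param_count([300, 16, 2])
n_examples = 500  # hypothetical size of "a few hundred" labelled examples
print(params)               # 4850
print(params / n_examples)  # 9.7
```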


bowlesbe 3507 days ago

Great question, and I share your intuition, but I think it all comes down to properly regularizing your model. For neural networks, I guess dropout works really darn well as a regularization strategy. I could have checked whether performance dropped significantly without dropout.
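Dropout itself is simple to state. The sketch below shows the common "inverted" variant in plain Python. During training, each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation is unchanged and nothing needs rescaling at evaluation time. The function name, rate, and inputs are illustrative assumptions; the thread doesn't say which framework or dropout rate was actually used.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p); at evaluation time, return the input unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so each unit is dropped with probability p.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
hidden = [0.5, 1.0, -2.0, 3.0]
print(dropout(hidden, 0.5, rng))                  # a random mask; survivors are doubled
print(dropout(hidden, 0.5, rng, training=False))  # [0.5, 1.0, -2.0, 3.0]
```

Because a different random subset of units is silenced on every training step, the network can't rely on any single unit, which is the intuition behind dropout working as a regularizer when parameters outnumber examples.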
