


	by imh 3924 days ago
	Wouldn't this require larger datasets? That isn't always an option. I'm imagining that a smaller, more computationally efficient network could learn nearly as well with fewer data points given these heavily engineered features. Is that off base?

1 comments

He gets pretty amazing results with a corpus size around 10M.

But that takes ages to train!

For example, Jason Weston's state-of-the-art, attention-NN-based sentence summarizer took ~4 days to train.

You'd easily spend that time doing manual feature engineering just to build a baseline system.