	by anxrn 1581 days ago
	Very much agree with the simplicity and power of separating training, validation and test sets. Is this really a 'big data' era notion though? This was fairly standard in '90s-era language and speech work.


datastoat 1581 days ago

Big enough data that you can afford not to use some of it for training! Different disciplines hit this threshold at different times -- language and speech much earlier, as you say; clinical trials not there yet.

Maybe we could talk about two cardinalities of "big" data. The first is when you can afford not to use all of your data for training. The second is when you can usefully fit highly overparameterized models.
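
For anyone who wants the first threshold spelled out, here is a minimal sketch of a three-way split in plain Python. The function name `three_way_split`, the 10% fractions and the fixed seed are illustrative assumptions, not something taken from the thread or from any particular library.

```python
import random


def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into (train, validation, test) lists.

    Hypothetical helper for illustration; the fractions and seed are
    arbitrary defaults.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)

    test = shuffled[:n_test]                 # touched once, for the final estimate
    val = shuffled[n_test:n_test + n_val]    # used for model selection and tuning
    train = shuffled[n_test + n_val:]        # everything else is fit on
    return train, val, test


train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))
```

The point of the sketch is only that the held-out portions are never used for fitting, which is affordable once the remaining training portion is still large enough to be useful.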


disgruntledphd2 1581 days ago

To be fair, there's a psychology paper from the late fifties that suggests this approach. Much like the early days of double descent, this didn't attract the attention it deserved at the time.
