Hacker News
by datastoat 1583 days ago
Big enough data that you can afford not to use some of it for training! Different disciplines hit this threshold at different times -- language and speech much earlier, as you say; clinical trials not there yet.

Maybe we could talk about two cardinalities of "big" data. The first is when you can afford not to use all of your data for training. The second is when you can usefully fit highly overparameterized models.

1 comment

To be fair, there's a psychology paper from the late fifties that suggests this approach. Much like the early days of double descent, this didn't attract the attention it deserved at the time.