by aledalgrande 1710 days ago
Yeah, but depending on the data you might get even worse results. Selecting a subset that's actually representative is really important.

Would a random sample be representative? Statistically this seems to be the case for any large N. In fact it's not clear to me that any other sample would be more representative.
Many public datasets have skewed classes, so if you take a purely random sample the rare classes can end up under-represented or missing entirely, and you're not gonna get a good result. And N might not be big enough anyway.
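
To make the skewed-classes point concrete, here's a rough sketch comparing a plain random sample with a per-class (stratified) one. The dataset is made up (995 "neg", 5 "pos"), the function names are just for illustration, and forcing at least one item per class is my own assumption about what "representative" should mean here, not something from the comments above.

```python
import random
from collections import Counter, defaultdict

def random_sample(labels, k, rng):
    """Plain uniform sample of k indices, ignoring class labels."""
    return rng.sample(range(len(labels)), k)

def stratified_sample(labels, k, rng):
    """Sample roughly k indices, keeping each class's share of the data
    and forcing at least one item per class (an assumption of this sketch)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    picked = []
    for idxs in by_class.values():
        n = max(1, round(k * len(idxs) / len(labels)))
        picked.extend(rng.sample(idxs, min(n, len(idxs))))
    return picked

# Hypothetical skewed dataset: 995 "neg" labels and 5 "pos" labels.
labels = ["neg"] * 995 + ["pos"] * 5
rng = random.Random(0)  # fixed seed so runs are repeatable

rand_counts = Counter(labels[i] for i in random_sample(labels, 50, rng))
strat_counts = Counter(labels[i] for i in stratified_sample(labels, 50, rng))

print("random:    ", dict(rand_counts))
print("stratified:", dict(strat_counts))
```

With these numbers, a uniform 50-item draw contains no "pos" items at all roughly three times out of four, while the stratified version always keeps at least one. The catch is that the stratified sample can come out slightly larger than k because of the one-per-class floor.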