by teraflop
3345 days ago
Yeah, I think filtering is a big part of it. If you want to answer a statistical question about the entire dataset, then a random sample is probably good enough. If you want to drill down and do an analysis that only looks at a particular narrow slice of the data, then it's likely that the corresponding subset of your sample isn't big enough to be meaningful. (You can pre-filter or pre-aggregate before sampling, but that assumes you know a priori what types of queries you'll want to do.)
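
To make that concrete, here's a rough simulation sketch. All the numbers are made up for illustration (1M rows, a 1% uniform sample, a slice matching about 0.1% of rows), and `slice_counts` is just a hypothetical helper, not anything from a real tool:

```python
import random

def slice_counts(n_rows, sample_rate, slice_rate, seed=0):
    """Count rows in a narrow slice, in the full data and in a uniform sample.

    Each row independently falls in the slice with probability `slice_rate`
    and is kept by the sampler with probability `sample_rate`.
    """
    rng = random.Random(seed)
    in_full = 0
    in_sample = 0
    for _ in range(n_rows):
        matches_filter = rng.random() < slice_rate
        kept_by_sampler = rng.random() < sample_rate
        if matches_filter:
            in_full += 1
            if kept_by_sampler:
                in_sample += 1
    return in_full, in_sample

# Illustrative parameters only: 1M rows, 1% sample, slice covers ~0.1% of rows.
full, sampled = slice_counts(n_rows=1_000_000, sample_rate=0.01, slice_rate=0.001)
print(f"slice rows in full data: {full}")
print(f"slice rows in 1% sample: {sampled}")
```

With these parameters you'd expect roughly 1,000 matching rows in the full data but only around 10 in the sample. The sample as a whole still has about 10,000 rows, which is plenty for whole-dataset statistics, but an estimate built on ~10 rows is going to be very noisy.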