|
|
|
|
|
by sanderjd
1031 days ago
|
|
Super hard to say, so ... 80th or 90th? With very low confidence. But I do have very high confidence that the 99th percentile is much larger than petabytes (think: what's next after "exa"), and I believe that many companies these days crack into "peta" territory. But as I saw another comment mention, I think another, probably more important, consideration besides size in bytes is cardinality and structure. So maybe this whole classification we're doing is kind of beside the point :) |
|
I also agree that the variety of data plays a big part in its complexity. If you have a few petabytes of data, but it's really only a handful of tables you can real hone in on the relationships. If it's a wide array of sources with many tables between them then you have some nasty problems like entity resolution.
All happy data sets are alike; each unhappy data set is unhappy in its own way.