by agent281 1030 days ago
Yeah, it's hard to say with any certainty. I agree that the far end of the curve probably looks nothing like the "neighborhood" a couple percent away, relatively speaking.

I also agree that the variety of data plays a big part in its complexity. If you have a few petabytes of data but it's really only a handful of tables, you can really hone in on the relationships. If it's a wide array of sources with many tables between them, then you have some nasty problems like entity resolution.

All happy data sets are alike; each unhappy data set is unhappy in its own way.

1 comments

> All happy data sets are alike; each unhappy data set is unhappy in its own way.

Ha, gonna steal that for some doc I write someday :)

That's only fair: I stole it from Anna Karenina. :]

https://en.m.wikipedia.org/wiki/Anna_Karenina_principle#:~:t....

Ha I know, I love that opener, despite it being super cliche to love it. Things are usually cliches for a good reason :)