| HN Mirror

	by sanderjd 1031 days ago
	I guess I'll say what I think. I do think it is targeted at the smallest 80% of companies with some digital footprint, and also at most of the top 20%. Or more specifically, I think maybe it's targeted at something like the 5th percentile to the 99th percentile. That bottom 5% probably just needs Excel, and that top 1% is probably writing or heavily modifying all their own tools. But I'm not sure the advice is very good from the 5th percentile up to ... maybe that top 20%? A lot of the stuff in the article assumes the availability of sophisticated data architects and mature infrastructure groups that I really don't think the median company has.

agent281 1031 days ago

I agree. Really seasoned data people are not common enough. Small companies need to buy services to lighten the load.

We both seem to have a sense of the size of companies at different percentiles. At what percentile would you put your company, with petabytes of data?

sanderjd 1031 days ago

Super hard to say, so ... 80th or 90th? With very low confidence.

But I do have very high confidence that the 99th percentile is much larger than petabytes (think: what's next after "exa"), and I believe that many companies these days crack into "peta" territory.

But as I saw another comment mention, I think another, probably more important, consideration besides size in bytes is cardinality and structure. So maybe this whole classification we're doing is kind of beside the point :)

agent281 1030 days ago

Yeah, it's hard to say with any certainty. I agree that the far end of the curve probably looks nothing like the "neighborhood" a couple percent away, relatively speaking.

I also agree that the variety of data plays a big part in its complexity. If you have a few petabytes of data but it's really only a handful of tables, you can really hone in on the relationships. If it's a wide array of sources with many tables between them, then you have some nasty problems like entity resolution.

All happy data sets are alike; each unhappy data set is unhappy in its own way.

sanderjd 1030 days ago

> All happy data sets are alike; each unhappy data set is unhappy in its own way.

Ha, gonna steal that for some doc I write someday :)

agent281 1030 days ago

That's only fair: I stole it from Anna Karenina. :]

https://en.m.wikipedia.org/wiki/Anna_Karenina_principle#:~:t....

sanderjd 1030 days ago

Ha, I know. I love that opener, despite it being super cliche to love it. Things are usually cliches for a good reason :)
