by shawnhermans 3330 days ago
This whole conversation is frustrating because it is boiling down to a stupid semantic debate. The author is claiming that people don't get paid to explore data, they get paid to find things. IMHO, this statement doesn't even make sense. When I explore data, I almost always find something. This something might not be useful to an "end user," but it is almost always useful and necessary.

Sometimes, the only thing I find by doing exploration is that a particular dataset is absolute garbage and shouldn't be used for any purpose. The only way I find stuff like that out is if I explore the dataset.

2 comments

>The author is claiming that people don't get paid to explore data, they get paid to find things. IMHO, this statement doesn't even make sense. When I explore data, I almost always find something.

People are not paid to find "something", they are paid to find specific things.

Hence, the following makes even less sense than TFA:

>This something might not be useful to an "end user," but it is almost always useful and necessary.

In reasonably sized datasets, you'll typically find a lot of interesting information and relationships that are only loosely or not at all related to what the analyst is actually paid to do at the time.

Analysts who only find the specific thing and end their work there are a dime a dozen, and need to be micromanaged. Good analysts will find all the other interesting stuff on their own and inform the business about it. Those good analysts are the explorers, and banning those people from exploring during training seems like an effective way to take talented budding analysts and turn them into mediocre ones.

In reasonably sized datasets, you'll also find a lot of spurious correlations simply by chance. That's one reason in science you're supposed to write down your hypothesis and methods of analyzing data before touching the data. Otherwise you risk finding some random noise and thinking it's important.
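
To make the "spurious correlations by chance" point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption and not something from the thread: the dataset shape (50 rows, 200 columns), the seed, and the rough |r| > 2/sqrt(n) cutoff, which only approximates a p < 0.05 test. Every column is pure noise, so any column that passes the cutoff is a false discovery.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_rows=50, n_features=200, seed=0):
    """Count pure-noise columns that look 'correlated' with a pure-noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Rule of thumb (assumption): |r| > 2 / sqrt(n) is roughly the p < 0.05 cutoff.
    threshold = 2 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_features):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(column, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 200 noise columns pass the threshold")
```

With a cutoff near the 5% level you would expect somewhere around 5% of the noise columns to pass, which is why a hypothesis written down before touching the data is much easier to trust than one picked out afterwards.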
You probably do find something, and then presumably go on to dig out the interesting bits and present them. But it would not surprise me for one second if students on an assignment stopped as soon as they had an exploration tool with some dataset loaded. That would be problematic. Exploration is a means, a starting point, not the final product.