Hacker News
by inverba 3960 days ago
> I am convinced (although I do not yet have enough data to prove it) ... If anyone had any actual evidence that might support this conclusion, I would be grateful.

This is the opposite of how useful research and/or data science works. Data should be taken as is and then learned from. It certainly should not be gathered in an effort to directly prove a conclusion that you are already "convinced" of.

It's very disheartening to see this as a comment on a data science article on Hacker News... unless you're being sarcastic/ironic? This has to be sarcastic/ironic, right? Right? :(

3 comments

On the other hand, here we get special insight into the attitude/understanding of statistics common in university psychology programs, and hence a particularly damning look into the field itself. This is how the research works; you come up with a 'just so' gut feeling, and you look and look and look until invariably you come up with /some/ evidence for it and then publish.
> This is the opposite of how useful research and/or data science works. Data should be taken as is and then learned from. It certainly should not be gathered in an effort to directly prove a conclusion that you are already "convinced" of.

Get down off the pulpit. If the data said the opposite, I would instantly change my worldview.

But you are correct, in that my wording or attitude was incorrect. I should have said "I suspect" instead of "I am convinced" and instead of "might support this conclusion" I should have said "might support or refute this hypothesis".

Which is to say, don't scientists at least have a hypothesis in mind before they collect data related to it? Otherwise, why would you be testing at all, and for what exactly? You can't just take millions of data points, put them into a blender and get proven theories out of it!

Anyway, that's what I'd have at this point, a hypothesis. I should have used that wording, my bad.

And something that you suspect (or even of which you are convinced) but can't prove is a decent starting place for trying to find the data to prove or refute it.
Evidence doesn't fall from the sky. People create a hypothesis and they seek evidence to test it. Also, your snark is useless.
That's not really true -- there are multiple ways of approaching this.

One camp says "collect any data that might be relevant, and then begin looking at the data to try to figure out what the hypotheses should be"

The other camp says "formulate a hypothesis, and then find the data you need to test that hypothesis".

The problem with the latter approach in the social sciences -- or any setting with lots of unknown latent variables -- is that it's often possible to find some data set for which a given hypothesis holds with p < 0.05. So whenever there are a lot of latent variables, it makes a lot more sense to construct a high quality data set first, and then start hypothesis testing.

The problem with the former approach is that you really need to know "this set of data is probably really interesting / representative for an entire range of hypotheses about topic X", but that's often not clear from the outset. And it's often the case that for any particular hypothesis, there are lots of other data sets that could also be relevant.

In any case, whenever there are lots of unknown latent variables, cherry-picking data sets that confirm your hypothesis is a really good way to lead yourself astray.
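
To put a rough number on the "some data set with p < 0.05" point, here is a minimal sketch, not anyone's actual analysis. It assumes the hypothesis is false (both groups come from the same distribution), that the data sets are independent, and that a simple two-sided z-test with known unit variance is used. The function names, sample size, and trial count are all made up for illustration.

```python
import math
import random


def two_sided_p(a, b):
    """Two-sided z-test p-value for equal means.

    Assumes both groups have the same size and known unit variance.
    """
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def chance_of_false_hit(num_datasets, n=30, trials=2000, alpha=0.05, seed=0):
    """Estimate how often at least one of `num_datasets` null data sets
    shows p < alpha, even though there is no real effect in any of them."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(num_datasets):
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            if two_sided_p(a, b) < alpha:
                hits += 1
                break  # one "confirming" data set is enough to publish
    return hits / trials


def expected_rate(num_datasets, alpha=0.05):
    """Analytic rate for independent tests: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** num_datasets


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(expected_rate(k), 3), round(chance_of_false_hit(k), 3))
```

Under those assumptions, looking at 20 independent data sets gives roughly a 64% chance (1 - 0.95^20) that at least one of them "confirms" a hypothesis that is simply false. Real data sets are rarely independent, so the exact figure will differ, but the direction of the problem is the same.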

My solution is to just avoid working in fields with lots of latent variables, but that has limitations as well :-)