Hacker News
by searine 3988 days ago
Moving data around is just grunt work.

Real science requires a creative and critical mind, which takes years to mold.

Sounds like you've also spent years molding professional disdain for everyone who's not a Real Scientist.

No, I've just seen too many people spin their wheels on "analysis" that is not hypothesis-driven.

You've got to start with questions to get answers. The hard part of science isn't crunching data; it is asking the right question!

And how does the Right Question appear if not through exploration and manipulation of the data?

Theory can obviously be very useful, but much of this stress on advanced statistics and PhDs is just a smokescreen for academics who suck at programming.

If you can't program and manipulate data, statistics won't save you because you won't have the ability to dig deep enough to find valuable insights. On the other side, if you know how to slice and dice data quickly and reliably, you can learn a huge amount by applying only the simplest statistical techniques. Generally the simple techniques are better anyway because they make mistakes less likely and your findings are easier to communicate.

>And how does the Right Question appear if not through exploration and manipulation of the data?

Questions don't magically come out of a data set. Trying to pull them out that way is called a fishing expedition, and it usually produces boring, descriptive results that have no impact.

To answer impactful questions, you must go into your data collection with the questions in mind. To understand what questions to ask, you need a trained, critical, and creative mind. That is something you don't get from pushing bits.

>If you can't program and manipulate data

Programming and manipulating data are easy. Almost every new statistician these days can, and does, do this routinely.

What's hard is the years of intuition about what is meaningful and what is noise.

I know. It's hard to hear, and career programmers most of all hate to hear it, but it's the truth.

Anyone I've ever heard say "programming is easy" is without fail a terrible programmer.

I'm not really sure how to respond to the idea that exploring a dataset isn't a useful way to help develop questions about it. It's only a "fishing expedition" if you have no idea what you're doing.

>Anyone I've ever heard say "programming is easy" is without fail a terrible programmer.

Developing a world-class application is difficult because of the complexity built into a program of large scope.

Knowing enough programming to competently move a data set around is easy. Hell, you could do most of it with just bash.

>I'm not really sure how to respond to the idea that exploring a dataset isn't a useful way to help develop questions about it. It's only a "fishing expedition" if you have no idea what you're doing.

Well, I've seen a lot of it, in both science and business: people who spend a lot of time and money generating a large data set simply because they lack a question to ask. They expect meaningful answers to just tumble out of it like manna from heaven, and end up confused and dismayed when the answers aren't impactful.

Fishing expeditions are looked down upon because they can only describe the data you generated. That is minimally useful, and can be done without grabbing a huge sample.

Good science starts with a question, then puts data to work to create new insight by removing confounding factors through careful design.