Hacker News
by nabla9 3988 days ago
Roughly 80% of the data scientists I know have a PhD in something very math-heavy. The rest have master's degrees. There are programmers who can assist them by doing the grunt work, but that is just basic programming to help analysts crunch data.

If you want to do data science for real:

1. Get a master's or PhD in statistics, computer science, economics, physics or some other math-heavy field, and specialize in data analysis in that field. You must learn a lot of statistics while doing so.

2. Learn programming, statistical machine learning and tools of the trade.

Good data science is not based on collecting large amounts of data passively and then mining it mindlessly. You need to ask the right questions and design the data collection and modeling process based on those questions.

5 comments

I've seen a huge range in the people calling themselves data scientists. Some have very analytically intensive academic degrees, others just finished a data science boot camp, and there are a lot of people that used to be called 'business analysts' who are basically doing the same job with a fancier title. In every group, I've had people tell me that what they're doing is really data science, because data science needs the (academic|integrated|business) perspective that they have, and what the other people are doing isn't really data science.
> Good data science is not based on collecting large amounts of data passively and then mining it mindlessly. You need to ask the right questions and design the data collection and modeling process based on those questions.

This resonates. That is, picking and designing features. Also understanding dependent variables and knowing how to test for them; getting that wrong is the biggest mistake leading to flawed conclusions that I see from the 'general public'.

What do you mean by testing for dependent variables?
Maybe something to do with instrumental variables? https://en.wikipedia.org/wiki/Instrumental_variable
Academic credentials aren't enough; good data-driven decision-making is as much an art as an academic discipline. A p-value of .01 is a Nobel Prize in medicine and unpublishable in physics; domain knowledge is important to have a feel for the difference.
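
To put rough numbers on that gap, here is a minimal sketch. It assumes a one-sided test on a normal distribution and the 5-sigma discovery convention commonly used in particle physics; neither is stated in the comment, and the function names are made up for illustration.

```python
from math import erfc, sqrt
from statistics import NormalDist


def p_to_sigma(p):
    """Convert a one-sided p-value to the equivalent number of standard deviations."""
    return NormalDist().inv_cdf(1.0 - p)


def sigma_to_p(z):
    """Convert a number of standard deviations to a one-sided tail probability."""
    # erfc keeps precision in the far tail, where 1 - cdf(z) would lose digits.
    return 0.5 * erfc(z / sqrt(2.0))


print(f"p = 0.01 corresponds to about {p_to_sigma(0.01):.2f} sigma")
print(f"5 sigma corresponds to p of about {sigma_to_p(5.0):.1e}")
```

Under these assumptions, p = .01 is only a bit over 2 sigma, while a 5-sigma threshold is a tail probability of roughly 3 in 10 million, so the two bars differ by several orders of magnitude.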
Assuming that smart autodidacts can't obtain sound statistics knowledge is selling many people short.
I think you are right in that it sells many people short, but then again having no good academic credentials is selling yourself short.

Data science is not like security. There it is more accepted that good engineers/researchers do not necessarily have the best accreditation. It seems that data science/engineering is coming around to this, though.

It's not that autodidacts cannot build bridges; it is that the people with the data and money do not want their bridges built by autodidacts.

Anyway... back to studying http://statweb.stanford.edu/~tibs/ElemStatLearn/ for me :).

No. A PhD in statistics or economics means almost nothing at this point. Even if it did truly signal mastery of the content, which it doesn't anymore, it would signal to most people who do this kind of work that you're way overqualified while simultaneously being totally ignorant of the day-to-day work of actual data scientists.

If you want to be a useful data scientist, do a lot of work with data. If you have strong programming skills and are flexible and a quick learner then you will do well.

Spending the better part of your young adulthood getting a PhD in statistics, unless you want to go into academia, just makes you look like a fool.

There absolutely are problems that require a more rigorous mathematical training than you get from undergraduate courses or day-to-day experience. Most data scientists and companies may not be tackling these problems, but they certainly exist.

Just having a PhD will open doors for you that would otherwise be shut. But before pursuing that degree, you should be confident that you enjoy working in the field and want to devote your career to it. Also, you have to be prepared to work hard, not just to get the degree, but then to land a job where you'll put that experience to use. Otherwise, you'll be sharing a cubicle with DataWorker and feeling like a fool.

That said, if you don't know whether you need a PhD, that means you probably don't know what kinds of problem you want to work on. And in that case, there's a good chance you'll end up working on a problem that only interests your advisor and nobody else (most PhD advisors have more students than they have good problems to work on). In that case, I wouldn't recommend it.

I've had the complete opposite experience. Do the people who hire for this kind of work often bet on non-PhD candidates? Do they trust themselves to separate the wheat from the chaff?

Don't you want a colleague who is able to mention seminal papers for specific problems? Who is able to read and understand these papers and can distill useful features and optimizations from them?

People with PhDs who go into business usually end up in the better positions. They hire other PhDs for the good positions to keep the signal (mastery of the content) stronger.

As someone who did a lot of work with data I have little problem with my usefulness, but a lot of problems opening doors to the really interesting data companies (lacking a proper academic network). I wish I had gotten that PhD, because right now applying to Google, Microsoft, Facebook, Yahoo or eBay for data science positions makes me look like a fool.

I've met a lot of fools who've quoted all the right works, in both Computer Science and Data Science. Computer Science fools usually get fired. Data Science fools seem to get promoted to Yes-man status. It's a lot harder to lie about your code than it is with statistics; as the old adage goes, it's right behind Lies and Damned Lies.
You've said "No", but you haven't countered the poster's claims. Are most data scientists in fact PhD/master's people? I hear the same information at a mid-sized tech company. I also hear similar things about Intel.
> Spending the better part of your young adulthood getting a PhD in statistics, unless you want to go into academia, just makes you look like a fool.

This is why you are a DataWorker, and not a dataScientist.

Anyone can push bits around. It takes a trained mind to corral them using careful experimentation and observation.