by mswen 4521 days ago
Becoming a data scientist isn't a matter of reading journals and blogs. You can get a sense of the field and what is required by reading those sites, but becoming a data scientist takes years of hard work.

You need to develop serious skills in at least 4 of the following disciplines:

Statistical analysis

RDBMS query development

NoSQL databases

Machine learning

Natural Language Processing

Web crawling and data harvesting techniques

Programming to access data APIs

Web development

Data visualization

Business systems that generate data, including CRM, ERP, and more

Geospatial data systems

Each of these areas would have its own set of resources both formal and informal.

Well that’s just, like, your opinion, man.

I’m not a “data scientist” (or statistician, for that matter), but of the (excellent) data scientists that I know, the only specific skill they really have in common is statistical analysis. I’d say the truth is probably closer to “statistical analysis + ability to do independent research + computational chops using whatever their tools of choice may be.”

As a data scientist, I have to agree with his opinion.

Usually you have a team where each person is "specialized" in a few of those categories.

You can call a data scientist a statistician, but I don't think you can necessarily call a statistician a data scientist.

The truth is, you need only a shallow understanding of machine learning and stats to be a data scientist. But you also need the know-how to collect data, and that ends up being the much bigger issue to tackle in my environment. (For what it's worth, you need a strong understanding of how data points relate to one another, how accurate they might be, and why they might not be accurate, and you also need to be constantly thinking about the long-term vision for your data.)

Agreed, it is just my opinion. And rarely, if ever, will you find all of these skills in the same person. More often it will be a small team of people, each with 1 or 2 specialties plus some other areas they are reasonably competent at.

Most of what I have been reading on the topic seems to define data science as the intersection of the kinds of things I have listed. I guess my larger point was that each of these areas has its own learning curve, and some, like statistics or machine learning, benefit from formal training. A person does not become a data scientist by reading blogs and journals.