by astrophysician 1989 days ago
Just want to say that while the data science profession definitely includes a wide range of people and skillsets, a good data scientist should be practical and able to work with the available data in whatever state it's in.

No good data scientist should ever expect data to be pristine. And a good data scientist, even if they don't have quite the engineering chops necessary to build a production-quality ETL, should know enough about the process to help guide it. If they aren't a part of that process, they're not being a good DS. They can't expect someone not involved with their problem to know what tradeoffs to make, and if they don't know exactly how their data went from raw form to the ETL-ed form, they're probably going to make bad assumptions, and those assumptions may very well make their architected solution a complete pile of garbage. Not to mention, how can a DS offer suggestions for solutions if they aren't deeply familiar with the raw data that's available?

To me, a good data scientist should, at bare minimum, have several skills.

* They should first and foremost (but not solely) be an in-house expert in statistics and machine learning who knows what can and can't be done with data. They should arrive with that knowledge. Engineers, I think, have a tendency to trivialize this, but true expertise in this domain comes only with years of experience.

* They should strive to find modeling solutions that are right for a particular business problem. If they seem to be only applying the hottest research regardless of the tradeoffs for the particular business problem, that's a red flag.

* Their focus should be on integrating themselves with the product/business as much as possible, and with the engineering team as much as possible. If they're expecting to be handed directives, that's a recipe for a ton of wasted time.

DS should never, ever be siloed into their own little DS world. They will be useless without a deeply intimate knowledge of the business goals, the needs of product, and the capabilities of the engineering team.

As they progress, they should become more and more "full-stack"; otherwise they are stagnating.

1 comment

A good data scientist should also be good at science. Otherwise, you can simply hire people with engineering skills - you don't need scientists. If you hire scientists and then are surprised they aren't good at engineering, the hiring process needs a reality check.
Statistics is a science as well. Unfortunately it’s overloaded in business terms and can mean anything from “knows means and regressions” to “has a copy of _Meyn and Tweedie_ on their shelf”.