by jghn 4140 days ago
Exactly. I have a lot of acquaintances who self-describe as data scientists. They really run the range all the way from "they poke around with Excel" (i.e. not only could I do that in my sleep, but you couldn't pay me enough to do something that dull) all the way to "novel algorithm design at the frontier of the field" (i.e. I could take classes, train, be mentored, etc. and never be able to do what they do).

> Exactly. I have a lot of acquaintances who self-describe as data scientists.

My understanding is that "data scientist" is a term meaning "statistics person who lives in San Francisco"

Alternatively, "Does statistics, but on a MacBook"
I know you're joking, but Wizard (http://www.wizardmac.com/) for OS X is a really nice place to start before delving into SPSS, Stata, and R.
Or, another version, "programmer who read an inferential statistics tutorial online".
You're looking at this through a purely technical lens.

The guy who "pokes around with Excel" probably operates in a business context. He interacts with people who have no clue about data science, and is able to use the data to tell a convincing story. This can be dangerous if he doesn't know what he's doing, but 90% of things people want to use "data science" for are pretty trivial technically and probably can be done in Excel.

The guy designing a novel algorithm probably operates in a technical context. People like this tend to be very "in the weeds" and incapable of succinctly explaining their findings to people without the same context they have. This is a universal problem -- people who are extremely technically skilled often have trouble explaining their craft to, say, a marketing exec wanting to know how a certain characteristic is derived. In fact, the marketing exec will probably call in the Excel data scientist to translate.

Does this mean the guy designing the novel algorithm is somehow lesser? Absolutely not! But when you choose a deeply technical career path, you run the risk of losing the external context. This is why many companies have managers in engineering who aren't super technical -- they're technical enough to understand the gist of the concept, but their core skill is communication. If they're doing their job well, the engineers are left alone to do their job without senior business people sticking their noses in everything.

Coincidentally (or maybe not), I think the "soft skills" are sorely missing in this skills matrix. Every engineer will have to give a presentation or work with an external team at some point in their careers, and some are better at it than others. In my opinion, the guys with hardcore engineering skills are great, but someone with solid engineering skills who can communicate well is a rock star. You can replace a badass engineer, but you can't easily replace the cross-team relationships that a good communicator has built that can often short-circuit requirements problems before they get turned into code.

Many things are simply complex and there is no way to 'dumb it down'. QM is probably the best example, where the 'everyman' description has almost nothing to do with the underlying theory.

In CS, encryption is probably the best example of this: the basic algorithm can be identical between a system protected from the NSA and something that is trivial for the average researcher to break.

Yeah, but your code is ultimately achieving some business requirement or it wouldn't be there. Being able to articulate that is an invaluable skill. Requirements are generally poorly written, so if something needs to be done a certain way (e.g. so the numbers in the reports across multiple products are consistent) then the business sometimes needs to know the algorithm.
I'm not implying one is inherently better than the other (well, maybe my inner tech bias is showing, but in objective terms I agree - they all have their purpose), but the point was that "data scientist" as a job title is effectively meaningless. If you tell me that you're a data scientist, it means absolutely nothing to me, because the range of things it gets used for is so broad.