Hacker News
by apohn 3160 days ago
>40% of people surveyed spend >1-2 hours per week searching for another job. That's surprising, given how many companies complain about the difficulty of finding data scientists/machine learners.

I've hired data scientists in the past. One thing I found is that a lot of interviewees want to talk about all the algorithms (e.g. Gradient Boosting) they've used but are unable to describe how they thought through the problem before applying those algorithms. It's easier to find somebody who downloaded some mostly clean data and then copy/pasted some code than a person who actually thinks through how to quantify a problem. There are a lot of buzzword artists out there.

This is important because in a lot of organizations the business problems have not yet been quantified in a way that lends itself to getting meaningful, valid results from an algorithm. The data scientist has to be able to work with others to quantify a problem, or at a minimum recognize that there are issues with the way the problem is currently quantified and think of ways to improve it. It's much easier to teach somebody to run an algorithm than to teach them to actually understand a business problem.

There are issues with people doing the hiring as well. In my last job (not a software company), the VP of the group had pushed to get headcount for a data science team and was fearful of making the wrong hire because he didn't want to say "We hired a data scientist at 2X-3X the cost of a Business Analyst and that was a bad hire." The end result was a massive amount of paralysis, an insanely long and convoluted job description, and complaints about the hiring pipeline.

I agree with your assessment that a lot of times the business problems have not been put into a form that lends itself to exploitation by machine learning. Sometimes a company has a lot of data that's actually useless.

Most of the time, I've found that business people do not understand the value of data. Often I hear, "we have this data set, let's unleash the data scientist on this to tell us something." or "we have this data set but what are the so-whats here?".

I spend a lot of my time explaining that there must first be a business objective, a key question, or hypothesis that can then be understood through data. I cannot take a haystack and find the needle that is interesting to you. And if I do find that needle, many times there are no resulting changes made to our strategy.

I think we're still in a place where the value of a data scientist is not that she knows how to write:

  fit <- lm(target ~ ., data = customers)

The value exists when she can take a problem from the business, understand how to find a solution with data, and then convey that back to the business in a meaningful way that allows them to easily understand how they can make changes to positively impact the bottom line.

>I spend a lot of my time explaining that there must first be a business objective, a key question, or hypothesis that can then be understood through data. I cannot take a haystack and find the needle that is interesting to you. And if I do find that needle, many times there are no resulting changes made to our strategy.

IMO a number of data science positions should be considered partly research positions. You are hiring somebody to think critically about how to generate high value/impact from data. This includes exploring whether there is a different way to think about a business problem than the way it has been formulated in the past. It may also include defining and collecting new data when you discover the existing (or non-existent) data isn't appropriate. As with any research, you'll sometimes realize the path you are on is wrong and a correction is needed.

The "find all the needles in this haystack" approach reflects a totally different worldview and throws a lot of critical thinking out the window. I think this really plays into the idea that an organization can hire a person who will do immediate "magic" with algorithms, with zero effort beyond that. You can slice/dice and p-hack your way into a million thoughtless and useless "insights."
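
To make the p-hacking point concrete, here is a minimal R sketch (R being the language of the `lm` one-liner above) under loudly stated assumptions: the `customers` data, its 100 candidate features, and its 200 rows are all hypothetical, and both the target and every feature are pure random noise. Testing each feature separately against the target still turns up a handful of "significant" results at p < 0.05, roughly 5% of them by chance alone. A Bonferroni correction with base R's `p.adjust` typically leaves none.

```r
# Hypothetical simulation: 'target' and every candidate feature are pure noise,
# so any "significant" association found below is a false positive.
set.seed(42)
n_rows <- 200
n_features <- 100
customers <- as.data.frame(matrix(rnorm(n_rows * n_features), ncol = n_features))
customers$target <- rnorm(n_rows)

# Slice/dice: regress the target on each feature one at a time and keep
# the p-value of that feature's slope (row 2, column 4 of the coefficient table).
p_values <- sapply(setdiff(names(customers), "target"), function(col) {
  fit <- lm(customers$target ~ customers[[col]])
  summary(fit)$coefficients[2, 4]
})

# At alpha = 0.05, about 5% of pure-noise features are expected to "pass".
insights <- names(p_values)[p_values < 0.05]
cat(length(insights), "of", n_features, "noise features look significant\n")

# Bonferroni correction for the 100 tests; with pure noise it typically leaves none.
adjusted <- p.adjust(p_values, method = "bonferroni")
cat(sum(adjusted < 0.05), "survive Bonferroni correction\n")
```

Each uncorrected "hit" could be written up as an insight, which is the failure mode described above; starting from a stated hypothesis limits how many tests get run in the first place.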

The organizations I have seen that do best at this have teams of data scientists collaborating with devs/engineers and business analysts. There need to be a lot of different research activities going on, most of them working off the same data/compute infrastructure, but with some people dissatisfied and pushing the edge, of course. Also, regarding hiring pipelines, I would discourage hiring based on technology keywords, as anyone who is a good fit should be intelligent and curious enough to pick up their new employer's tech stack relatively quickly.