Hacker News
by minimaxir 1622 days ago
The hard questions I've received in DS/ML interviews over the years aren't the theory questions (which I rarely get asked), but the trick SQL questions that often depend on obscure syntax and/or dialect-specific features, or "implement binary search" when I'm not in that mindset, since that isn't what DS/ML looks like in the real world.

I think they're fine as long as you know the format and have an opportunity to prepare or just get in the right mindset for it. And some things (like binary search) should be easy to write anyway.

The SQL questions can also be a symptom of the type of job: Facebook's first data science round focuses a lot on SQL, but that's because it's a very product/analytics/decision-making-focused role without much coding or ML. With data science you have to be more careful about these things when searching for a job; you can't just use the job title as a descriptor.

> And some things (like binary search) should be easy to write anyway.

It's a different story when a) your mind is set on statistics/linear algebra, b) you haven't had to implement binary search by hand since college, and c) even if you implement the algorithm and demonstrate a general understanding of it, it must work perfectly and pass the test cases, otherwise it doesn't count.

FWIW I was rarely asked about algorithmic complexity, which is more relevant in DS/ML, and when I was, it was usually in the context of whiteboarding another algorithm, with the interviewer mocking me for doing it in O(n) instead of O(log n).

Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program. To a significant extent, though, you can cheat by studying binary search itself, which is a surprisingly beautiful thing.

I like this formulation for finding the first index in a half-open range where p is true, assuming p stays true thereafter:

    bsearch p i j :=
     i                   if i == j else
     bsearch p i       m if p m    else
     bsearch p (m + 1) j
     where m := i + (j - i)//2
Or in Python:

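    def bsearch(p, i, j):
        # First index in [i, j) where p is true (j if none),
        # assuming p stays true once it becomes true.
        m = i + (j - i) // 2
        return (i if i == j                    # empty range
                else bsearch(p, i, m) if p(m)  # answer is m or left of m
                else bsearch(p, m+1, j))       # answer is right of m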
The only tricky thing about this formulation is that m < j whenever i < j, hence the asymmetric +1 in only one branch to guarantee progress. If invoked with p(m) = a[m] >= k, it gives the usual binary search on a sorted array, without early termination. The i + (j - i) // 2 formulation is not needed in modern Python, whose integers are arbitrary-precision, but historically an overflowing (i + j) // 2 was a bug in lots of binary search library functions, notably in Java and C.
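To make the overflow concrete, here is a sketch that emulates 32-bit signed int arithmetic in Python. Python's own ints never overflow, so the wrap-around is simulated by hypothetical helpers wrap_int32 and trunc_half. This models Java's int behavior; in C, signed overflow is undefined behavior rather than a guaranteed wrap, so the exact symptom there can differ.

```python
def wrap_int32(x: int) -> int:
    """Wrap x into the 32-bit two's-complement range, as Java's int arithmetic does."""
    return (x + 2**31) % 2**32 - 2**31


def trunc_half(x: int) -> int:
    """Divide by 2 rounding toward zero, like Java's int division."""
    return x // 2 if x >= 0 else -((-x) // 2)


def mid_naive(i: int, j: int) -> int:
    # The historical bug: i + j is computed in 32 bits and can wrap negative.
    return trunc_half(wrap_int32(i + j))


def mid_safe(i: int, j: int) -> int:
    # For 0 <= i <= j <= 2**31 - 1, j - i lies in [0, j] and
    # i + (j - i) // 2 lies in [i, j], so nothing leaves the 32-bit range.
    return wrap_int32(i + trunc_half(wrap_int32(j - i)))


if __name__ == "__main__":
    i, j = 2_000_000_000, 2_100_000_000  # both fit in a 32-bit signed int
    print(mid_naive(i, j))  # -97483648: a negative, out-of-bounds index
    print(mid_safe(i, j))   # 2050000000
```

Both bounds are valid indices on their own; only their sum exceeds the 32-bit range, which is why the bug went unnoticed until arrays got large.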

(Correction: I originally wrote a[m] <= k. This formulation is less tricky than the usual ones, but it's still tricky!)
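The a[m] >= k case can be sketched as follows, reusing the bsearch function from the comment above. The helper names lower_bound and contains are hypothetical, and the comparison against the standard library's bisect.bisect_left (which returns the leftmost insertion point for k in a sorted list) is just a sanity check.

```python
from bisect import bisect_left


def bsearch(p, i, j):
    """First index in the half-open range [i, j) where p is true, or j if none.

    Assumes p is monotone over [i, j): once true, it stays true.
    """
    m = i + (j - i) // 2
    return (i if i == j
            else bsearch(p, i, m) if p(m)
            else bsearch(p, m + 1, j))


def lower_bound(a, k):
    # m -> a[m] >= k is monotone on a sorted list, so bsearch finds
    # the first position whose element is >= k.
    return bsearch(lambda m: a[m] >= k, 0, len(a))


def contains(a, k):
    # No early termination: search first, then compare once at the end.
    idx = lower_bound(a, k)
    return idx < len(a) and a[idx] == k


if __name__ == "__main__":
    a = [1, 3, 3, 5, 8]
    print(lower_bound(a, 3))               # 1 (first of the two 3s)
    print(lower_bound(a, 4))               # 3 (where 4 would be inserted)
    print(contains(a, 5), contains(a, 4))  # True False
    assert all(lower_bound(a, k) == bisect_left(a, k) for k in range(-1, 10))
```

Returning j when p is never true is what lets the same function double as an insertion-point search.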

> Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program.

That's the problem. There are many other ways to do that without risking false negatives and annoying potential candidates (e.g. I would not reapply to places that rejected me out of skepticism about my programming abilities based on tests blatantly irrelevant to day-to-day work, because that's a bad indication of the engineering culture).

Even FizzBuzz is better at accomplishing that task.

FizzBuzz (or equivalent) is actually great IMO. It weeds out the people who lied on their resume, without punishing the people who never learned CS because they were too busy learning things that were actually useful to DS, like statistics or data visualization.
I've actually been given FizzBuzz in a DS interview! Up to that point I thought FizzBuzz was just a meme because it's obviously too easy.
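For reference, FizzBuzz is usually stated as: print the numbers 1 to 100, replacing multiples of 3 with "Fizz", multiples of 5 with "Buzz", and multiples of both with "FizzBuzz". A minimal Python sketch under that common statement (returning a list rather than printing, so the output is easy to check):

```python
from typing import List


def fizzbuzz(n: int) -> List[str]:
    """Return the FizzBuzz sequence for 1..n as strings."""
    out = []
    for x in range(1, n + 1):
        if x % 15 == 0:  # divisible by both 3 and 5
            out.append("FizzBuzz")
        elif x % 3 == 0:
            out.append("Fizz")
        elif x % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(x))
    return out


if __name__ == "__main__":
    print("\n".join(fizzbuzz(100)))
```

Checking the multiple-of-15 case first is the one ordering detail that trips people up.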
There are levels of not knowing how to program that go beyond FizzBuzz. But sure, many programming jobs don't require them.
If that's the case for the DS/ML domain, then a short take-home exam should be a better measure of practical coding ability (the common counterargument that "take-home exams can be gamed" is a strawman; if an exam can be gamed, that's the interviewer's fault for designing a flawed exam).

In my case, I typically got the "implement binary search" questions in a technical interview after I passed a take-home exam, which just makes me extra annoyed.

Facebook Product Data Science has always been a Product Analyst role more than anything else. I did the interviews a while back, and it was a pretty fun experience, but it's not what a lot of people call data science.
> but it's not what a lot of people call data science

I think that's changed a bit over time and the term has expanded to mean more things. In addition to Facebook, another great example is this article from Lyft in 2018 where they say that they're renaming all their data analysts to data scientists and all their data scientists to research scientists - https://medium.com/@chamandy/whats-in-a-name-ce42f419d16c

This is called title inflation.

Like, the hilarious thing about Facebook and data science is that the term was invented there, and they needed to retitle all of their analysts (like Product Data Science) because they couldn't hire any analytical people with an analyst title in SV (or so I have been told).

Like, back in the day data science was defined as a social science PhD who could run experiments and write MapReduce jobs. I'm pretty sure most people would disagree with that definition these days.

Yes. My point is that the person running experiments and MapReduce jobs is still called a data scientist. But so is the product analytics person (btw, product data scientists run experiments too). And there are other data scientist job profiles as well (more focus on research, more focus on engineering, etc.). So it's not really a complete redefinition of the term; it's more an expansion of the types of jobs it covers.
In my experience, it varied greatly from team to team.
I had an "implement binary search" interview once. I came away feeling like I was being interviewed for the wrong role. I don't understand how anyone could think that's an appropriate interview task for a DS position.
I'm an MLE and I get asked much harder questions than that. Implementing a binary search seems ... fine?
Implementing anything even a little tricky under pressure can be tough. Unless you've practiced bit or pointer twiddling regularly, you are mostly validating whether the candidate did interview prep. That probably selects for more serious candidates, so it probably works. But I was tripped up by a simple binary search problem the other day, even after I'd just solved several harder problems quite quickly. It's just the nature of algorithmic problem solving: until you've done a lot of prep, it's dicey whether a novel problem will take you five minutes or five hours to solve.
But it makes sense for MLE! IMO you should ask a stats or probability question in a DS interview.
The distinction between the two roles isn't that clear. Some data science jobs are very focused on engineering.
Agreed. At very ML-heavy companies, MLE tends to mean SWEs who work on ML systems, and sometimes that means working as much on infrastructure as on modeling.