Hacker News new | ask | show | jobs
by stuckagain 3399 days ago
I don't know about the average but at Google I probably write a new call to std::set::find twice an hour, and it's important to know that std::set is a red-black tree, and how std::set::find is implemented and how much that costs and so forth. Fluency with data structures and algorithms is not some kind of brain candy. It's absolutely necessary to get the job done.
5 comments

I've been a developer for 30 years, helped start a half dozen companies. Since college I've never written a single tree traversing algorithm, and would need to Google red-black tree to know what it is.

There is a whole world of development that is OS specific, UI specific, shipping commercial applications. Shipping quickly is always a higher priority than performance, because adequate performance is usually trivial to implement with hash tables/arrays/linked lists.

You don't generally need to write tree-traversal algorithms because other people have already done so and you can use their implementations. But you should still understand the tree traversal algorithm, so you know what the thing you're using is doing, what its complexity is, etc. And if you understand it, then you should be able to reimplement it yourself.
I'm not entirely sure adequate performance is good enough when you work at the scale of Google...
The performance bottleneck is not always where you think it is.
That's why Google measures.
That's right. Adequate performance will keep 1000's of extra cores burning cycles 24 hours a day.
>because adequate performance is usually trivial to implement with hash tables/arrays/linked lists

Are you sure about that? What if your code has to work for billion plus people correctly 99.999% time who are on all different kind of connections?

Correct is orthogonal to performant.

I would also argue that make a very clever algorithm do correct stuff (and be maintained to do that in the future) is more difficult than do it using simpler method/algo.

I understand the perfomance characteristics of red/black trees very well. I couldn't implement one in 30 minutes on a whiteboard.
After preparing and studying the interview questions on data structure, algorithms etc. with the help of guide books like https://books.google.com/books/about/Cracking_the_Coding_Int... You definitely can!
Neither could I, but that was also well beyond the scope of my Google interviews.
Yes but jeesus here we are discussing an algorithm that I first heard about in literally my first CS course on algorithms in my first year at the university, some 20 years ago. And I had to manipulate red black threes on the blackboard. Haven't seen them since in my 20 years of work in the industry.

I you need to write so performance critical code that the choice of tree structures matters then this interview question sure as hell isn't going to find out if you are cut out for it.

I've interviewed many candidates for one of the big 4 & sat on the decision loops too. 'Attribute substitution'[1] one thing that does stands out in these interviews. essentially interviewers substitute answering the larger question of judging the candidate's suitability for the job at hand with a simpler but easier to judge (& defend!) problem of solve this particular coding problem that I have 'normalized' over many candidates. Also, deep domain knowledge is often hard to judge.

1. https://en.wikipedia.org/wiki/Attribute_substitution

one does not need to know how to write an algorithm on a white board to understand it. Seems like it would better to ask what a red-black tree is, what's it's time complexity, and what would be a good situation to use one.

In fact, I would argue that one could know how to write one and have no idea of the practical use of it.

It pretty much has no practical use, that a different data structure couldn't do more efficiently.
What in the world are you talking about, RB trees are incredibly useful.
There's a well known analysis that shows they're asymptotically equivalent to B-trees. You can tune a B-tree to better exploit cache locality. Other kinds of ordered binary trees are far less complicated to code and have similar performance characteristics.
Maybe I am missing something but if you're using set::find so often, why not an unordered_set?