Hacker News new | ask | show | jobs
by deathanatos 3232 days ago
I disagree.

* I routinely find people using the wrong data structure, when there exists a better one, with better O() time/space.

* I find people tend to not understand BTrees, particularly when there are two attributes being indexed. Given an index on (a, b), I find it common misconception that the BTree can efficiently answer `$a_min < a < $a_max AND $b_min < b < $b_max`. (I.e., people do not understand that the tree cannot make use of the second < condition, and must scan potentially many more rows than they intend.)

* Graph theory. git uses it. Any sort of dependency tree uses it.

That said, I acknowledge that software engineering does require a lot of non-theoretical knowledge, which is why I ask both types of questions in an interview.

5 comments

That's where I disagree. For 90% of the code it wouldn't ever matter if you're using a list with O(n) or an array with O(1). And in the 10% it can usually be remedied afterwards. Usually the reviewer will point it out and it's a 10min fix.

On the other hand I've seen so much hand-optimized code that used the correct data structures to solve the wrong problem.

Yes, that is kind of unfair - you want people who grasp the problem and find a good solution. But I prefer the people who solve the problem at hand in a wrong way instead of solving the wrong problem in the best imaginable way.

> Usually the reviewer will point it out and it's a 10min fix.

And in order for the reviewer to recognize the situation at all, and then point it out, they would need knowledge of the underlying data structures, their performance, and as you mention, pragmatism to know when to use it.

(I'm presuming that everyone on your team shares the responsibility of code review, and you don't funnel it through a few people with knowledge of how things work. (I believe that robs those that don't of the opportunity to learn through review.))

You're absolutely right. But there's a difference between "spotting an error" and "being able to construct $datastructure_I_use_once_a_year" in an interview
Yeah, but data structures and binary trees are about as much as you really need to get. There's a big difference between selecting the right data structure and re-implementing dijkstra's work on a whiteboard.
Counterpoint - I routinely see people ignore the actual real life cost of pointer chasing because O(n) (you don't need a linked list if you only have 10 small elements).
I am assuming you are talking about database indexes. The documentation for the database that I have most used actually specified what the various supported types were good for. It made choosing the correct type of index much easier. The documentation may be sufficient for determining which index types will perform well for the expected queries, even if knowing detailed information about data structures is nice to know. Having documentation which describes the data structures well enough is really nice though, and should be encouraged a lot.
Does your job actually require knowledge of BTrees?
Yes. They're the typical data structure backing most relation database indexes. Knowing when they will perform well (and more often, when they won't) follows directly from knowing their structure. (The example of trying to do a range search on two dimensions in the post above is an example that doesn't perform as well as — again, I find — people naively expect it to.)

Have I ever implemented one? Not yet. Do I ask for a BTree implementation or exact, low-level understanding on an interview? No.

Yes! Our database blew up in our face because we misused and had wrong performance assumptions some indexes. Colleague had knowledge of btrees (the underlying data structure of database, iirc) and database internals. Optimised the whole thing. Queries are now 500x faster