Hacker News new | ask | show | jobs
by jhdevos 3546 days ago
After interviewing many more and less experienced programmers (and asking most of them if they can explain a hash table - just for statistics!), I concur that most cannot.

There were quite some people that thought trees were faster than hash tables though - mostly the ones that had some incling of what a hash table does, but didn't know the entire picture. At least in all those cases I had the pleasure of educating them a little bit :-)

1 comments

> After interviewing many more and less experienced programmers (and asking most of them if they can explain a hash table - just for statistics!), I concur that most cannot.

I found this realisation pretty shocking myself. Several experienced Java developers told me they hadn't even heard the phrase "linked list" before when Java even has a class called LinkedList. :(

I see threads on here all the time about how interviews are broken and you shouldn't be expected to be quizzed on data structures if you're an experienced programmer but I don't agree with that. If you claim to be an experienced programmer and can't explain roughly how a hash table or a linked list works you've obviously got big holes in your knowledge in the areas of optimisation and scalability.

I see threads on here all the time about how interviews are broken and you shouldn't be expected to be quizzed on data structures if you're an experienced programmer

Anyone who thinks that an experienced programmer shouldn't know data structures just isn't programmer material: someone who truly appreciates this profession will have had formal education in algorithms and data structures, and if they do not, they will make sure they learn it on their own. Otherwise - this is how we get bloated crapware.

I just cannot up-vote your comment enough, if I could, I would up-vote it with my both hands and feet!

I always think back to The Art of UNIX programming:

Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.

http://catb.org/~esr/writings/taoup/html/ch01s06.html#id2878...

> If you claim to be an experienced programmer and can't explain roughly how a hash table or a linked list works you've obviously got big holes in your knowledge in the areas of optimisation and scalability.

What is your definition of "roughly" and how do you think lack of that knowledge affects knowledge of optimization and scalability? I ask not because I necessarily disagree, but because your statement is very subjective. Also because I used to write large-scale real-time signal processing code that didn't use hash tables at all, so lack of knowledge there wouldn't have this effect on our code.

> how do you think lack of that knowledge affects knowledge of optimization and scalability?

I once wrote a log analysis python script. It took 2.5 hours to execute to completion. I then used Cython to compile it to C. It then took about 2 hours to run to completion.

I then stepped back and thought about the data structures I was using. I replaced some lists that were being searched with dictionaries that could were keyed by what I was searching my lists for. Replaced a few other simple data structures.

Now the script took, IIRC, 20 seconds to run to completion.

Now, I don't think you necessarily need to know all of the internal details of how a data structure is implemented to know its performance characteristics or when to use (or not use) them. However, the point is that by understanding their performance characteristics, code can be optimized or made more scalable.

Also, while something like a hash table might not be faster in actual real life terms (ie worse O() complexity can still be better wall clock; eg searching an array linearly can be faster than accessing a hash table since the array can be in cache and prefetched) when run on a single machine, big-O complexity can start mattering much more when you have a distributed workload or TB worth of data. (Although if you have a distributed workload, you have distributed systems problems to deal with too).

Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this. Its definitely possible, but I would at least hope they would know basic performance characteristics or at least some rules of thumb as to when to use what data structure. Hopefully based on logic and not just hearsay ;-)

> Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this.

Even if you're self-taught, if you've been programming for a while you should be able to read up on what linked lists and hash tables are in 15 minutes and virtually all interview preparation guides mention data structures. If you've not had the curiosity to do that then that's a bad sign.

> If you've not had the curiosity to do that then that's a bad sign.

Or you're just working from the backs of giants using built-in types. It's easy to do, when you haven't had to dig into low level code before.

I won't disagree with your premise, but it took some pretty extraordinary circumstances for me to find value in taking the time to learn about the ins and outs of linked lists and hash tables, because my every day work never required it.

It was reading an article about cuckoo hashes which finally prompted me to go down that road, and I can't regret it. That said, I've still never had to implement my own hash table for work.

> It was reading an article about cuckoo hashes which finally prompted me to go down that road, and I can't regret it. That said, I've still never had to implement my own hash table for work

That's what I'm getting at, people that have the curiosity to find out more. Nobody should be implementing their own hash table but if you don't know about basic data structure and algorithm concepts you're not going to be sharp at coming up with new efficient solutions.

> Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this. Its definitely possible, but I would at least hope they would know basic performance characteristics or at least some rules of thumb as to when to use what data structure. Hopefully based on logic and not just hearsay ;-)

That's why I was trying to establish what "roughly" means. This sounds like a reasonable definition.

>how do you think lack of that knowledge affects knowledge of optimization and scalability?

In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference. This kind of knowledge affects how any program scales with input size and should be one of the first things checked when optimizing.

And I'd argue that just memorizing big-O properties only gets you half the way: heapsort is theoretically superiour to quicksort (better worst case, more space efficient, same average case), yet quicksort is used far more often because it makes better use of cpu caches and has a worst case that barely ever happens.

> Also because I used to write large-scale real-time signal processing code that didn't use hash tables at all, so lack of knowledge there wouldn't have this effect on our code.

Is the fact that it doesn't use hash tables a coincidence or is it because the people designing and writing the system know their data structures and realized that hash tables don't fit the problem?

> In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference.

This is why I asked for clarification. To me "knowing the difference between" is different from knowing how they "work". Maybe it is just semantics in my head.

Edit because I forgot to address the second part:

> Is the fact that it doesn't use hash tables a coincidence or is it because the people designing and writing the system know their data structures and realized that hash tables don't fit the problem?

I suppose it is technically the latter, but it was glaringly obvious that hash tables weren't appropriate anywhere because none of our algorithms required associative lookups. Every algorithm had to touch every point of data it stored on every iteration anyway, so it was pointless to do anything but loop over it. Even if there had been a case where hash tables would have made sense, I'm not sure implementing a fixed-size hash table in C would have been worth the effort.

> In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference. This kind of knowledge affects how any program scales with input size and should be one of the first things checked when optimizing.

Exactly. If a developer is picking data structures without any consideration of their growth complexity they're going to cause serious issues.

Putting on my "get off my lawn" hat here:

> they're going to cause serious issues

But not until it's gone to production, and everyone's gotten an "attaboy" for completing their project and moved on to green pastures. Those sudden problems with performance in production are now the operations team's problem, to out provision or get a maintenance team on it.

Yeah, I know it's not that way in a company with good management and technical leadership... but I've been burnt enough to have a "get off my lawn" hat for this occasion.

> What is your definition of "roughly" and how do you think lack of that knowledge affects knowledge of optimization and scalability?

Just that you hash a value you want to store and transform that hash into an array index for fast lookup. If someone didn't know that I find it hard to believe they'd have much knowledge or interest about how to pick appropriate data structures for large collections as it's such a fundamental concept used in lots of places (e.g. indices, caching). I'm not even expecting knowledge of what happens when two keys hash to the same bucket.

> Also because I used to write large-scale real-time signal processing code that didn't use hash tables at all, so lack of knowledge there wouldn't have this effect on our code.

Is there a similar data structure question you would ask a candidate who wants to work in this domain? You don't use hashes at all?

> Just that you hash a value you want to store and transform that hash into an array index for fast lookup. If someone didn't know that I find it hard to believe they'd have much knowledge or interest about how to pick appropriate data structures for large collections as it's such a fundamental concept used in lots of places (e.g. indices, caching). I'm not even expecting knowledge of what happens when two keys hash to the same bucket.

Makes sense to me. I just wouldn't call that knowing how a hash table "works", which is why I asked for clarification.

> Is there a similar data structure question you would ask a candidate who wants to work in this domain?

I would focus on fixed-size data structures, since those were critical to our code. I would find out if they could go lower than Big-O (which was, quite frankly, useless to us) when thinking about runtime complexity and whether they could relate what the various operations were doing to what the processor was doing.

> You don't use hashes at all?

That code had no hashes whatsoever. We had no need to look things up in that manner.

To be fair, LinkedList is of very limited use. I can't think of a scenario I'd use it in beyond an LRU cache.
> To be fair, LinkedList is of very limited use. I can't think of a scenario I'd use it in beyond an LRU cache.

It's more the point that not knowing they exist is a very strong indicator you haven't read even introductory text on algorithms and data structures. I barely use them either but when I'm thinking of solutions to problems they're a useful concept to know.

Yeah, agreed. They are extremely simple.
A smartly designed doubly-linked list can be used for extremely fast deletion or insertion, as well as handling dynamic memory allocation for arbitrarily large data sets.

Linked lists are the foundation of operating systems, especially UNIX and AmigaOS.

Yeah, OK, that is another good example. But nevertheless, there are a lot of programs you could be asked to write where using a linked list doesn't make any sense.
If you're not familiar with intrusively linked lists, it's worth having a gander at. Learning and implementing a few of those gave me a lot more usecases for linked lists.
I have used a "light" linked list (a linked list that contains just three pointers) to efficiently sort arrays with very large records in C.