Hacker News new | ask | show | jobs
by dkersten 3554 days ago
> how do you think lack of that knowledge affects knowledge of optimization and scalability?

I once wrote a log analysis python script. It took 2.5 hours to execute to completion. I then used Cython to compile it to C. It then took about 2 hours to run to completion.

I then stepped back and thought about the data structures I was using. I replaced some lists that were being searched with dictionaries that could were keyed by what I was searching my lists for. Replaced a few other simple data structures.

Now the script took, IIRC, 20 seconds to run to completion.

Now, I don't think you necessarily need to know all of the internal details of how a data structure is implemented to know its performance characteristics or when to use (or not use) them. However, the point is that by understanding their performance characteristics, code can be optimized or made more scalable.

Also, while something like a hash table might not be faster in actual real life terms (ie worse O() complexity can still be better wall clock; eg searching an array linearly can be faster than accessing a hash table since the array can be in cache and prefetched) when run on a single machine, big-O complexity can start mattering much more when you have a distributed workload or TB worth of data. (Although if you have a distributed workload, you have distributed systems problems to deal with too).

Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this. Its definitely possible, but I would at least hope they would know basic performance characteristics or at least some rules of thumb as to when to use what data structure. Hopefully based on logic and not just hearsay ;-)

2 comments

> Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this.

Even if you're self-taught, if you've been programming for a while you should be able to read up on what linked lists and hash tables are in 15 minutes and virtually all interview preparation guides mention data structures. If you've not had the curiosity to do that then that's a bad sign.

> If you've not had the curiosity to do that then that's a bad sign.

Or you're just working from the backs of giants using built-in types. It's easy to do, when you haven't had to dig into low level code before.

I won't disagree with your premise, but it took some pretty extraordinary circumstances for me to find value in taking the time to learn about the ins and outs of linked lists and hash tables, because my every day work never required it.

It was reading an article about cuckoo hashes which finally prompted me to go down that road, and I can't regret it. That said, I've still never had to implement my own hash table for work.

> It was reading an article about cuckoo hashes which finally prompted me to go down that road, and I can't regret it. That said, I've still never had to implement my own hash table for work

That's what I'm getting at, people that have the curiosity to find out more. Nobody should be implementing their own hash table but if you don't know about basic data structure and algorithm concepts you're not going to be sharp at coming up with new efficient solutions.

> Having said all of this, without going into specific implementation details, hash tables and linked lists are incredibly simple and I see no reason why anyone calling themselves a programmer would have trouble with these. I mean, I wouldn't expect you to be able to write a good hash function or anything, but explaining the basic concept shouldn't be too hard for most programmers. If someone is self-taught, then ok maybe they haven't ever needed to learn this. Its definitely possible, but I would at least hope they would know basic performance characteristics or at least some rules of thumb as to when to use what data structure. Hopefully based on logic and not just hearsay ;-)

That's why I was trying to establish what "roughly" means. This sounds like a reasonable definition.