Hacker News new | ask | show | jobs
by wongarsu 3555 days ago
>how do you think lack of that knowledge affects knowledge of optimization and scalability?

In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference. This kind of knowledge affects how any program scales with input size and should be one of the first things checked when optimizing.

And I'd argue that just memorizing big-O properties only gets you half the way: heapsort is theoretically superiour to quicksort (better worst case, more space efficient, same average case), yet quicksort is used far more often because it makes better use of cpu caches and has a worst case that barely ever happens.

> Also because I used to write large-scale real-time signal processing code that didn't use hash tables at all, so lack of knowledge there wouldn't have this effect on our code.

Is the fact that it doesn't use hash tables a coincidence or is it because the people designing and writing the system know their data structures and realized that hash tables don't fit the problem?

2 comments

> In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference.

This is why I asked for clarification. To me "knowing the difference between" is different from knowing how they "work". Maybe it is just semantics in my head.

Edit because I forgot to address the second part:

> Is the fact that it doesn't use hash tables a coincidence or is it because the people designing and writing the system know their data structures and realized that hash tables don't fit the problem?

I suppose it is technically the latter, but it was glaringly obvious that hash tables weren't appropriate anywhere because none of our algorithms required associative lookups. Every algorithm had to touch every point of data it stored on every iteration anyway, so it was pointless to do anything but loop over it. Even if there had been a case where hash tables would have made sense, I'm not sure implementing a fixed-size hash table in C would have been worth the effort.

> In any program that has to handle just tenthousands of items ocasionally, knowing the difference between arrays, liked lists and pointer lists, or between various forms of hash tables, various forms of trees and sorted lists makes a big difference. This kind of knowledge affects how any program scales with input size and should be one of the first things checked when optimizing.

Exactly. If a developer is picking data structures without any consideration of their growth complexity they're going to cause serious issues.

Putting on my "get off my lawn" hat here:

> they're going to cause serious issues

But not until it's gone to production, and everyone's gotten an "attaboy" for completing their project and moved on to green pastures. Those sudden problems with performance in production are now the operations team's problem, to out provision or get a maintenance team on it.

Yeah, I know it's not that way in a company with good management and technical leadership... but I've been burnt enough to have a "get off my lawn" hat for this occasion.