Hacker News new | ask | show | jobs
by johnthuss 1318 days ago
> I understands the usage of Hash / Map instead of searching arrays and many other small things that actually enhance the code performance

I would consider this an assumed skill for any developer with a college degree. It’s basically the point of the entire Data Structures class, which is a degree requirement.

4 comments

FWIW, actually, one thing I learned in practice that’s wasn’t highlighted in my Algorithms course: overhead (constant C) matters. You can feel good about yourself for choosing an algorithm that scales in O(lg n) time, but if your you ignore the cost of each operation (C) you might be slow.

For example:

1. When n is small, an array is almost always better. Arrays have very little overhead compared to even a hash map.

2. Algorithms with the same O() may still have significant differences at runtime and might be balanced differently between insert and search times. AVL trees take longer than Red Black trees to insert, but might be 1 level better in height. That means one less access. Useful for a routing table, for example.

So, in summary, if your looking at other people’s code and see lots of arrays don’t get too smug…n is usually small.

Also cache matters a lot. If n is small an array is always faster by a big margin.
a good example of this is Fibonacci heaps. on paper they're great, but they result in egregious pointer chasing, while radix heaps are less flexible but can be backed by a contiguous array.

weirdly, in all my recent algorithm work, only the big-O has mattered (or it's all just NP-hard or even EXPTIME.)

I had an extraordinarily painful conversation with someone who had done pretty well in our DS class but didn’t have a ton of practical experience.

Me: “why don’t you just use a hash table here? That array you’re iterating through each time has like 200,000 entries”

Him: “I can’t. I need to be able to get both the key and the value and hash tables don’t store the key”

Me: “…sigh, school has failed us again”

I had an extraordinarily painful conversation with experience programmers when I pointed out that the values that the hash key relyed upon must not change, for the life of the hash map.
> ...when I pointed out that the values that the hash key relyed upon must not change, for the life of the hash map.

Now I'm wondering about something like:

  MutableHashMap<MutableKey<T1>, Value<T2>>
You could probably have they key be a container object and allow changing its internal value, which would then communicate to the hash map that it belongs to, that it should do some internal changes:

  Entry<MutableKey<String>, Value<SomeEntity>> entry = mutableHashMap.getEntryByKey("myKey");
  entry.setKey("myNewKey");
That logic could take an internal hash map, remove the element by the old key and add it with the new key. So, to an outside user it would seem like you can change the keys.

The only problem would be that depending on the language you're using, you wouldn't change the objects directly, but would use those containers and it would essentially just be some boilerplate around a regular hash map. For example, Java has MutableInt and similar classes, which here would be hooked up to the MutableHashMap that they belong to.

Then again, I can't come up with many cases where something like that would be nice to have, since if you want to "change" a key, you can just do the following manually:

  var ourValue = myHashMap.get("myKey");
  myHashMap.remove("myKey");
  myHashMap.put("myNewKey", ourValue);
the nitpicker in me wants to say that the hash key can change (and it works if you rehash items whose key changed), but lookup by key will fail because the item will landed in the bucket of its original hash... but yes keys can't change if you want your values to be retrievable by key.
Is it nitpicking if we end up with fewer bugs as a result of the conversation? :)
that's the spirit!
> someone who had done pretty well in our DS

> Him: “I can’t. I need to be able to get both the key and the value and hash tables don’t store the key”

How could he do well in the data structures class? This is the definition of a hash map or hashed dictionary, so this is basic knowledge of data structures that is taught in this class and central to know to even have a chance of passing the exam.

No, technically he's right. On paper, a hash map creates an index in an array-like structure from the key, but does not necessarily store the key in a retrievable way. The "Hash" in hashmap comes from the fact that the key is somehow hashed (an often irreversible procedure) to determine the memory location to store the value.

In practice it's not the case, but very technically from a purely theoretical standpoint, I think he's right.

EDIT: untrue on any finite sized array, due to collisions. See below, and sorry for the brain fart!

There's no theoretical way for a hash map to work without storing the key - the reason for that is hash collisions in presence of which you do need to run equality comparison with stored keys otherwise you would overwrite data just because of hash clashes.
>> In practice it's not the case, but very technically from a purely theoretical standpoint, I think he's right.

> There's no theoretical way for a hash map to work without storing the key - the reason for that is hash collisions in presence of which you do need to run equality comparison with stored keys otherwise you would overwrite data just because of hash clashes.

There's a ton of material in a Data Structures course. I'm sure the point about storing the key was mentioned at some point, but it's not the focus when talking about different hashing strategies etc. For people with a bunch of experience (e.g. I'd been writing Perl for years before this, so being able to iterate across a hash and get (k,v) pairs was obvious) it's a detail that is already there in your brain, but if the DS course is your first exposure to a hash map, it's a detail that can easily get lost in the huge forest of other brand new things to learn.

> but if the DS course is your first exposure to a hash map, it's a detail that can easily get lost in the huge forest of other brand new things to learn.

My experience differs: It is the common situation that the theory is taught in the lecture, and then each week you have to do quite some exercises to implement all the newly learned data structures in code (lots of work to do each week :-) ). So, if you forgot such a "detail", the code to write for next week simply won't work - 0 points.

This way, it is rather unlikely that you will forget such a "detail" in the forest of new things to learn. This statement in particular holds for students who had done pretty well in DS as the one you are talking about.

Collision-free hashes exist. They are just not used on practice to build hash maps, but if you use a 128 bits hash with homogeneous distribution, handling collisions is irrelevant.

Besides, just because you are storing the keys, it doesn't mean it is viable to retrieve them. On the collision aware maps you see on CS, iterating through the keys is extremely inefficient.

The OP's description is clearly somebody that has learned the theory, but don't have any practical knowledge.

I never understood why a 'dictionary' (so called conveniently in python) is called a 'map' in c++ and yet, a 'hash' in ruby. On a side note, a map in ruby is an operation that actively maps one thing to another. So confusing. Finally reading what you just wrote makes the pieces of this puzzle to fall in place! Never had a data structures class.
According to https://stackoverflow.com/a/2884200 map and dictionary are just terms for the same data structure concept. On the other hand: there exist other backings for this data structure than hashing the key, such as binary trees (for example red-black trees or AVL trees).
The followup question is "Can you explain a strategy for resolving hash collisions?"
Depends on the use case, sometimes a list is more appropriate. On an old machine, Java 17 can iterate a list of 30 million strings looking for a match in around a half second. Just checked it in jshell. If the use case was that the map's keys could potentially change over the lifetime of the map or if the value's were not idempotent and if iterating 30 million items occurs fairly infrequently, then an ArrayList (in Java) might be the best choice.
Optimistically, you should definitely be able to assume this. In practice, though, I wouldn't. I've been in school with and interviewed many a college grad that had virtually no concept of how to effectively apply any of the data structures they "learned". It's definitely not a given
And yet there are plenty of developers who happily loop through an array a couple of times looking for an object with a specific key. In Javascript.

Knowing the theory doesn't mean people apply it in practice.