Hacker News new | ask | show | jobs
by light_hue_1 1157 days ago
I don't think so.

You take the first n bytes of your input string. You hash them. You look them up in the dictionary. The lookup is O(1) but hashing your input is O(n).

But there's a simple way around this.

We have incremental hash functions. When you add a byte to the string and you use an incremental hash, you do a fixed amount of extra work to update the hash. Like that you only need to O(1) extra work to compute the new hash.

One trivial incremental hash is to break your input into fixed size chunks, hash each chunk, and xor the chunks together.

1 comments

That's the same as the trie solution.
It's not at all like the trie solution! Not in performance and certainly not in terms of complexity.

A trie is a pretty nasty datastructure memory wise. It involves chasing pointers all over the place. The constants on a trie are not going to be very good.

For an incremental hash, you hash your string, and then you fix your hash up as you go. Then you look up your hash in a hash table. Code-wise it's far simpler to implement. And performance wise it will be far faster!

The two solutions have nothing to do with one another. Except that they're both O(n).

What you say is true, but the rolling hash solution is susceptible to potential hash collisions while the trie is not.
Ok sure. So what?

Hash collisions don't matter unless they're pathological. Just choose an out of the box hash that works well.

The article is just wrong.

Yup, pre-processing with a rolling hash allows you to get away with truly O(1) comparison (after the initial O(w) fixed cost). I don't know if the stdlib of any programming language does this internally if you request a hash of a substring, but it's trivial to implement yourself anyhow.

If you're writing a smug blog post about your "great" interview question, you should at least not make such "obvious" mistakes, especially considering rolling hashes are a "common" interview question trick for string problems. And in practice this hash'd solution probably might end up performing better than the Trie (of course needs benchmarking, etc., but it's not unreasonable to think it might). In an alternate universe you could easily imagine author himself getting failed for overlooking such an "obvious" solution.