Hacker News new | ask | show | jobs
by _pastel 2798 days ago
I have a pet implementation of the frequency map that I'm overly fond of, for ascii strings:

  (1) keep an array of length 127 that you re-use and set to 0 between calls
  (2) for each character in the first string, increment the array at the character's index
  (3) for each character in the second string, decrement the array at the character's index
If you end up with all 0s, it's an anagram.
3 comments

Better to use a hash map instead of an array. When iterating the second string, as soon as you see a character without an existing value it's not an anagram. If you finish the second string, check that all values are zero.

This way you only need to check the character values you actually use, and not all 127. It also generalizes trivially to larger character sets.

An array is simply an optimized specialization of a hash map.
His approach is the same as yours. As soon as the algorithm sees a character from second string where the value for that char in the array is zero, the second string is not an anagram of first string and can return false immediately.
Doesn't that mean in the end you have to check 127 values for if they are 0?

Or 64, if you store numbers as 32-bit integers and compare them as 64-bit using a union type.

You could also keep a counter of the number of non-zero entries and update on zero/non-zero transitions.
Given the constraints of the problem, it's the sanest thing to do.