by layer8 1140 days ago
The point is that you don't need a hash table implementation. The normalization used as the sort key also lacks the qualities that make a good hash (constant size, good distribution), so it doesn't especially deserve to be called a hash.

How do you extract the last (largest) entry for each normalized key from the sorted list? What is the command line function?
It can be a simple ~20-line C program that checks whether the previous line has the same normalized key as the current line. It doesn't require hashing. I didn't say you could do it all with standard Unix programs.
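For illustration, here is a rough sketch of that previous-line check, written in Python rather than C to keep it short. It assumes each input line is "key value" separated by whitespace and that the input is already sorted; the function name last_per_key is made up for this example.

```python
def last_per_key(sorted_lines):
    """Yield the last line of each run of lines that share the same first field.

    Assumes the input is already sorted, so equal keys are adjacent.
    Only the previous line is remembered; no hash table is involved.
    """
    prev_key = None
    prev_line = None
    for line in sorted_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key = line.split(maxsplit=1)[0]
        # A key change means the previous line was the last of its run.
        if prev_line is not None and key != prev_key:
            yield prev_line
        prev_key, prev_line = key, line
    # The final run has no following key change, so flush it here.
    if prev_line is not None:
        yield prev_line
```

Fed from a pipe, it could be driven with something like `for line in last_per_key(sys.stdin): print(line)`.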
You pipe the sorted list into awk (for example) and append the second field to a list as long as the value of the first field remains the same. Whenever the value of the first field changes, and in the END block, you output the list (which contains the matching anagrams) and reset it to empty.

No hash table needed, just splitting the line into the two fields, equality comparison, and appending values to a list.
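The same logic as the awk approach, sketched in Python so the control flow is easy to follow. The two-field "key word" line format, the name group_runs, and the sorted-letters normalization in the usage example are assumptions made for illustration, not something specified in the thread.

```python
def group_runs(sorted_lines):
    """Yield lists of second fields for each run of equal first fields.

    Assumes the lines are already sorted by the first field. The only
    state kept is the previous key and the list for the current run.
    """
    group = []
    prev_key = None
    for line in sorted_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, word = fields[0], fields[1]
        # When the key changes, output the finished list and reset it.
        if group and key != prev_key:
            yield group
            group = []
        group.append(word)
        prev_key = key
    # Equivalent of the END block: output the last list.
    if group:
        yield group


if __name__ == "__main__":
    words = ["tea", "dog", "eat", "god", "ate"]
    # Assumed normalization: the word's letters in sorted order.
    lines = sorted(f"{''.join(sorted(w))} {w}" for w in words)
    for anagrams in group_runs(lines):
        print(" ".join(anagrams))
```

With the sample words above this prints "ate eat tea" and then "dog god". The sort step does all the work of bringing anagrams together; the loop only compares each key with the one before it.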