Hacker News new | ask | show | jobs
by fendrak 4376 days ago
This is a great solution!

What would be the reason to do it this way if you don't already have the FSA?

If you just need to hash a small-ish set of known items and keep memory usage low, I'd think starting with the table would make more sense. Wouldn't the FSA approach require more memory than the table, since it has to maintain all the intermediate nodes?

1 comments

Wouldn't the FSA approach require more memory than the table, since it has to maintain all the intermediate nodes?

What do you mean by 'a table'? If you want a set without false positives/negatives or a perfect hash function, you have to represent the possible elements in some way. FSAs are compact when you store a set of sequences where there is a certain amount of redundancy in the form of subsequences that occur in multiple strings. In such cases, FSAs are far more compact that a flat table of strings.

Of course, in an actual compact implementation, you would not implement states and transitions using classes. E.g. Dictomaton stores states as an array of transitions (which are unboxed), which is far more compact (see the numbers on the Github page for some comparisons with comparable data structures).