Hacker News new | ask | show | jobs
by StefanKarpinski 2428 days ago
A suffix array is of similar utility to a suffix tree, more compact to store, and pretty simple to compute: all you need to do is start with an array of each index into the data from start to end, then sort the indices by memcmp'ing the suffix string starting at each index. There are more optimized algorithms, but they're within 2-3x for moderately sized data (megabytes). You need the fancier sorting algorithms if you're constructing a prefix array of something huge like a genome.
1 comments

Just as a side note, that algorithm for suffix array construction has really bad worst cases (consider a string of all "a"s).

It won't matter for experimenting or most likely for using it on smallish realistic texts, but anything where you're taking user data for instance you'd want to step up to a slightly more clever algorithm (there's an O(n lg n) worst case one that's still very simple, called rank doubling I believe).