|
|
|
|
|
by StefanKarpinski
2428 days ago
|
|
A suffix array is of similar utility to a suffix tree, more compact to store, and pretty simple to compute: all you need to do is start with an array of each index into the data from start to end, then sort the indices by memcmp'ing the suffix string starting at each index. There are more optimized algorithms, but they're within 2-3x for moderately sized data (megabytes). You need the fancier sorting algorithms if you're constructing a prefix array of something huge like a genome. |
|
It won't matter for experimenting or most likely for using it on smallish realistic texts, but anything where you're taking user data for instance you'd want to step up to a slightly more clever algorithm (there's an O(n lg n) worst case one that's still very simple, called rank doubling I believe).