|
|
|
|
|
by ridiculous_fish
2421 days ago
|
|
In practice one should typically prefer a suffix array to a suffix tree. A suffix array is conceptually so easy! 1. Given a set of strings, put all the suffixes into a set. For example "derp" -> "derp", "erp", "rp", "p". 2. Sort it It's cheap! The number of suffixes equals the string length, and suffixes can be mere references into the string input. So in practice you use maybe ~2x memory compared to the input - and in big cache friendly chunks (not tries). Now you're flying. If you want to perform a substring search: 1. Binary search on the first character. You get a range of suffixes which begin with that character. 2. Within that range, perform a binary search on the second character. Now you have a range of suffixes which begin with the first two characters. 3. etc. Powerful and easy. By the way this technique of going character by character can and should be extended to the sort itself; look up "three-way radix quicksort" for what you never learned in school. |
|