Hacker News new | ask | show | jobs
by xen0 98 days ago
I've never grokkeed suffix trees, but isn't possible for them to be O(n) in space (n total length of all strings)? Is there just an unacceptable constant factor overhead? I can imagine the pointer overhead being painful.
1 comments

Like the other poster said, the rabbit hole continues with suffix arrays (https://en.wikipedia.org/wiki/Suffix_array#Space_efficiency), then compressed suffix arrays (https://en.wikipedia.org/wiki/Compressed_suffix_array).

Also explained by the creator of this: https://www.abahgat.com/project/suffix-tree/

> the human genome can be encoded as a 3GB string constructed out of an alphabet of four characters

> As of 2019, a suffix tree indexing the human genome using state of the art algorithms can easily occupy tens of gigabytes.