|
|
|
|
|
by matt4711
3275 days ago
|
|
There are much faster SA construction algorithms than skew (check out divsufsort). The O(n) algorithms using induced sorting are also likely much faster than this work. The constants of recent O(n) algorithms are very low. here a link for some state-of-the-art benchmarks of non-parallel SA construction algorithms: https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchm... there also now exist linear speedup parallel SA construction algorithms. |
|
Python is not the fastest programming language in the world either. As I said, I wanted something simple (less than 20 lines and quite readable) that performs well against random data. It is NOT intended to be used in production. You would have to add some checks for border cases and things like that. Plus it uses a lot of memory (although linear) because of the set and the dict, which is a drawback that the original implementation (suffix_matrix_stanford) does not have.
I clearly took simplicity over efficiency, and the advantage of this particular algorithm is that you compute the whole suffix matrix that you can reuse for the LCP.
Thank you for divsufsort, it seems to be really fast! During my search, I found this algorithm that does really well against the others, including divufsort: https://arxiv.org/abs/1307.1417
Without thinking too much, I think it is quite easy to make the prefix doubling algorithm parallel if you have a good parallel sorting algorithm, but it would of course not be the best solution…