| HN Mirror

OP and author here :)

Python is not the fastest programming language in the world either. As I said, I wanted something simple (less than 20 lines and quite readable) that performs well against random data. It is NOT intended to be used in production. You would have to add some checks for border cases and things like that. Plus it uses a lot of memory (although linear) because of the set and the dict, which is a drawback that the original implementation (suffix_matrix_stanford) does not have.

I clearly took simplicity over efficiency, and the advantage of this particular algorithm is that you compute the whole suffix matrix that you can reuse for the LCP.

Thank you for divsufsort, it seems to be really fast! During my search, I found this algorithm that does really well against the others, including divufsort: https://arxiv.org/abs/1307.1417

Without thinking too much, I think it is quite easy to make the prefix doubling algorithm parallel if you have a good parallel sorting algorithm, but it would of course not be the best solution…