Hacker News new | ask | show | jobs
by jgbyrne 806 days ago
This is one of the most useful documents on the web regarding bzip2 implementation. I wrote a bzip2 encoder in Rust [1] a couple of years ago and it would have been an uphill struggle without Joe Tsai's work.

Like Tsai did for his Go implementation, I used the 'SA-IS' algorithm for computing the Burrows-Wheeler Transform. Unlike the algorithm used in the reference implementation of bzip2, it is linear time, which in practice means it has much better upper-bound performance (though is typically somewhat slower on average).

The problem of suffix array construction, which SA-IS solves (with computation of the BWT being a natural corollary), is a very interesting one in which theoretical and practical advances are still being made. There is a notable implementation of SA-IS by Ilya Grebnov which is spectacularly fast due to usage of techniques like cache prefetching [2]. It's worth a look for anyone interested in really high performance software compression.

[1] https://github.com/jgbyrne/banzai/

[2] https://github.com/IlyaGrebnov/libsais