|
|
|
|
|
by jgbyrne
806 days ago
|
|
This is one of the most useful documents on the web regarding bzip2 implementation. I wrote a bzip2 encoder in Rust [1] a couple of years ago and it would have been an uphill struggle without Joe Tsai's work. Like Tsai did for his Go implementation, I used the 'SA-IS' algorithm for computing the Burrows-Wheeler Transform. Unlike the algorithm used in the reference implementation of bzip2, it is linear time, which in practice means it has much better upper-bound performance (though is typically somewhat slower on average). The problem of suffix array construction, which SA-IS solves (with computation of the BWT being a natural corollary), is a very interesting one in which theoretical and practical advances are still being made. There is a notable implementation of SA-IS by Ilya Grebnov which is spectacularly fast due to usage of techniques like cache prefetching [2]. It's worth a look for anyone interested in really high performance software compression. [1] https://github.com/jgbyrne/banzai/ [2] https://github.com/IlyaGrebnov/libsais |
|