| I have played with a C++ implementation of RFC6330, and while I have to say I never went for performance, I find this benchmark a bit... pointless. RaptorQ essentially comes down to Gaussian elimination on a matrix, so it all depends on the block size (=> matrix size).
The algorithm has basically cubic complexity in the number of symbols in a block.
RFC6330 is made to work on files, which are divided into blocks with a certain number of symbols, and the bytes are interleaved. This implementation does not do the (complex and almost pointless) interleaving, which is fine; even OpenRQ does not. The bench seems to have been done on a... 10 KB file?
It all fits in the L2 cache. We are not given the symbol size (which determines the block size!), and I assume all of this fits in a 10x10 matrix. So you are benchmarking operations on what is (more or less) a 10x10 byte matrix. The biggest part of this benchmark might almost be the generation of repair symbols (was it even done?), since that requires XORing the above-mentioned symbols multiple times. This is much closer to micro-benchmarking than to an actual benchmark, imho. It would have been more interesting to see what happens with files at least larger than the L3 cache. You can also cache intermediate results (which this implementation does not do), which is especially useful for encoding, but only when working on matrices >= 100x100. Below that, searching the cache, fetching from memory (my implementation optionally did LZ4 compression/decompression) and doing a matrix multiplication is slower than just computing the matrix again. Still, it's nice to see implementations of the RFC, which is a real pain to read... |
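To put rough numbers on the "it all depends on the block size" point, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are made up, the symbol sizes are hypothetical (the benchmark does not state one), the file is treated as a single block of 10,000 bytes, the matrix is simplified to K x K one-byte entries (the real RFC6330 matrix is somewhat larger because of the extra constraint rows), and the cost model simply takes the cubic scaling mentioned above at face value.

```python
def symbols_per_block(file_bytes: int, symbol_bytes: int) -> int:
    """Source symbols K when the whole file is treated as one block (assumption)."""
    # Ceiling division: a partially filled last symbol still counts as one symbol.
    return -(-file_bytes // symbol_bytes)


def dense_matrix_bytes(k: int) -> int:
    """Size of a simplified K x K matrix with one byte per entry."""
    return k * k


def relative_elimination_cost(k: int, baseline_k: int = 10) -> float:
    """Cost relative to a baseline block, assuming purely cubic scaling in K."""
    return (k / baseline_k) ** 3


if __name__ == "__main__":
    file_bytes = 10_000  # the "10 KB file" guessed at above
    # Hypothetical symbol sizes; the benchmark does not say which one was used.
    for symbol_bytes in (1_000, 100, 10):
        k = symbols_per_block(file_bytes, symbol_bytes)
        print(
            f"symbol={symbol_bytes:>5} B  K={k:>5}  "
            f"matrix~{dense_matrix_bytes(k):>9} B  "
            f"cost x{relative_elimination_cost(k):.0f}"
        )
```

Under these assumptions, the same file can be anything from a trivially small problem to a thousand or a million times more work depending only on the symbol size, which is why a benchmark that leaves it out is hard to interpret.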