by gbletr42 838 days ago
Hello Hacker News! I've been developing this piece of software for about a week now, to serve as a fast and easy-to-use replacement for par2cmdline and zfec. Now that it is in a good and presentable state, I'm releasing it to the world to get users, feedback, testing on architectures that aren't x86[-64], etc. If you have any feedback or questions, or find any bugs/problems, do let me know.
2 comments

You should at least be benchmarking against par2cmdline-turbo instead (stock par2cmdline isn't exactly performance-oriented). Also, you need to list the parameters used as they significantly impact performance of PAR2.

Your benchmark also doesn't list the redundancy %, as well as how resilient it is against corruption.

One thing I note is that both ISA-L and zfec use GF8, whilst PAR2 uses GF16. The latter is around twice as slow to compute, but allows for significantly more blocks/shards.

Got it, I'll add par2cmdline-turbo (I didn't know it existed) to the list along with those key details. Likewise, I'll also describe the benefits and downsides of each tool in more detail. I'll get around to it soon, when I release the next version, which fixes some of the problems described in this thread.
Thanks for doing that.

> par2cmdline[-turbo] encode: par2 c -r25 test

That command is rather unfair to PAR2 - you should add `-b48 -n2 -u` to make the comparison fairer.

PAR2 ain't exactly fast, particularly compared to GF8 focused formats, but the numbers you initially gave seemed wildly off, so I suspected the comparison wasn't really fair.

Ideally you should also be including the version of each tool used.

I've updated the benchmarks to include those flags. I've also specified the versions of all software involved.

It seems par2 is significantly faster with those options set than without, as in by an order of magnitude. It struggles greatly with the large number of blocks that it sets by default. Thank you for telling me.

Nice work - thanks for your efforts!

Yeah, the compute complexity for Reed Solomon is generally O(size_of_input * number_of_recovery_blocks)

If you don't specify the number of blocks, par2cmdline defaults to 2000, so at 25%, it's generating 500 parity blocks, which is obviously much slower than what you're generating with the other tools.

Having said that, PAR2 is generally aimed at use cases with hundreds/thousands of parity blocks, so it's going to be at a disadvantage if you're generating fewer than 10 - which your benchmark shows.
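
To put rough numbers on the block-count point above, here is a minimal Python sketch. The function names are made up for illustration, and rounding up is an assumption; the thread doesn't say how par2cmdline rounds. The cost function only encodes the O(size_of_input * number_of_recovery_blocks) estimate, not real timings.

```python
import math

def recovery_blocks(source_blocks: int, redundancy_percent: float) -> int:
    """Parity blocks for a given redundancy level (rounding up is an assumption)."""
    return math.ceil(source_blocks * redundancy_percent / 100)

def relative_cost(input_bytes: int, parity_blocks: int) -> int:
    """Work estimate proportional to size_of_input * number_of_recovery_blocks."""
    return input_bytes * parity_blocks

default_parity = recovery_blocks(2000, 25)  # default block count at -r25
tuned_parity = recovery_blocks(48, 25)      # block count from -b48 at -r25

one_gib = 1 << 30  # hypothetical input size, only used to form a ratio
ratio = relative_cost(one_gib, default_parity) / relative_cost(one_gib, tuned_parity)
print(default_parity, tuned_parity, round(ratio, 1))
```

Under this estimate the input size cancels out of the ratio, so the gap comes entirely from the number of parity blocks.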

I'd recommend explaining even a tiny bit what erasure coding is. I had to look it up as I didn't know the term. It's really cool, explain it yourself, why you're excited about it!
Sure, erasure coding is a form of error-correcting code that can be applied to data such that you can lose up to some number n of symbols and still successfully reconstruct the input data. For example, take k input symbols and put them into an erasure code algorithm to get k+n symbols, where any n of the output symbols can be lost and the data can still be reconstructed; lose more than n and reconstruction fails. Symbols in this case can be some number of bits/bytes.

This is a really important property in situations where there can be big giant bursts of errors, because you can still reconstruct the data regardless. IIRC, CDs/DVDs/BDs all use two concatenated Reed-Solomon (a type of erasure coding) codes whose symbols are then interleaved with each other, which protects the disc against things like accidental scratches.
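
To make the k and n description above concrete, here is a toy Reed-Solomon-style erasure code in Python. It is only a sketch under assumptions of mine: it works over the prime field GF(257) with Lagrange interpolation to keep the code short, whereas the tools discussed in this thread use GF(2^8) or GF(2^16), and the names `encode` and `decode` are made up for illustration. It needs Python 3.8+ for the modular inverse via `pow`, and k + n must not exceed 257.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Return k + n symbols: the k data symbols followed by n parity symbols."""
    k = len(data)
    points = list(enumerate(data))  # data symbol i is the polynomial's value at x = i
    parity = [_interpolate_at(points, x) for x in range(k, k + n)]
    return list(data) + parity

def decode(received, k):
    """Rebuild the k data symbols from a {position: symbol} dict of survivors."""
    if len(received) < k:
        raise ValueError("more than n symbols were lost")
    points = list(received.items())[:k]  # any k survivors determine the polynomial
    return [_interpolate_at(points, x) for x in range(k)]

data = list(b"hello")                 # k = 5 input symbols
coded = encode(data, 3)               # k + n = 8 output symbols
survivors = {i: s for i, s in enumerate(coded) if i not in (0, 3, 6)}  # lose 3 of them
assert bytes(decode(survivors, 5)) == b"hello"
```

One wrinkle of the prime field: a parity symbol can take the value 256, which does not fit in a byte. That is one reason practical implementations prefer binary fields.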

Nice! Add that to the README ;)
Done! Thank you, for it never occurred to me someone might stumble upon my software without already knowing (a dumb lapse of mind, I know).