Hacker News new | ask | show | jobs
by crazygringo 1834 days ago
Are there any uses cases for Reed-Solomon at the application level?

My impression was always that error checking was implemented at the hardware level, or else at the OS/driver level.

But just curious if there are some applications I'm missing.

9 comments

> uses cases for Reed-Solomon at the application level

Hadoop has an RS implementation inside the filesystem (called "erasure coding"), instead of storing 3 copies of the same data, it can actually instead store ~1.5 copies as (6+3) or (10+4).

Previously, I've run into this tech in satellite internet gateways, but distributed filesystems is where I've gone through the math & probabilities of failure properly.

I work on perf & the extra network hops (with 3 replicas, you read 100% of data local, when you stripe it that doesn't work) and math for the error correction are hot spots when you are trying to keep all cores busy.

Linux RAID-6: RS(10,8) Google File System II (Colossus): RS(9,6) Quantcast File System: RS(9,6) Intel & Cloudera’ HDFS-EC: RS(9,6) Yahoo Cloud Object Store: RS(11,8) Backblaze’s online backup: RS(20,17) Facebook’s f4 BLOB storage system: RS(14,10) Baidu’s Atlas Cloud Storage: RS(12, 8) .....
Are the numbers the size in bits? If so, those are very small blocks. It seems like you would get more effective redundancy with larger blocks. Does Reed-Solomon scale so poorly that only tiny blocks are computationally feasible?
Generally blocks. So backblaze takes 17 blocks of input, and generates 20 blocks out output. Then recoveries work if 17 or more blocks are recovered.
I worked on a research project in undergrad that used Reed-Solomon -- we showed a series of images to users, who would later pick those images out of a set of un-trained images in order to authenticate or access some privileged data, sans encryption. the error-correcting allowed for some degree of misremembering or forgetfulness, as is normal in human image recognition tasks.

The professor later spun up a startup that has since been acquired by Dropbox, but I'm unaware what kind of product they're currently working on.

Parchive 2 was, I believe, an implementation of Reed-Solomon at the application level; mostly used (when it was introduced) to counter loss from missing chunks on binary newsgroups. I think RAR might have picked up a similar feature around that time too?

I believe parchive version 1 was simple XOR parity.

No, parchive v1 also used Reed-Solomon. The main difference between v1 and v2 was that v1 worked on the file level, but v2 divided the set of files into blocks.

Also, the parity matrix for v1 would be sometimes singular (non-invertible), which v2 tried to fix but it didn't quite work.

You may want to be able to run highly resilient databases even on non-error-detecting hardware. (Most consumer disks, and even a lot of enterprise disks, don't reliably detect/report disk corruption)

You might also be building a distributed system, where blocks are spread across multiple disks. In that case, an "error" you're trying to correct might be the loss of a disk or server in the distributed system; there is no one single disk or OS responsible for all the relevant blocks in that case.

But, both of these are assuming you're building some kind of data-store, which is not a typical user application. Writing robust data-stores is hard for many reasons, error correcting is just one of them.

i used a reed solomon library recently in a project that encodes and decodes binary data stored in a video channel. i added it to compensate for compression/noise introduced in the re-encoding process on third party sites like youtube. here's an example of what it looks like: [0]

[0] https://cicada.jollo.org/xg/sp.mp4

QR codes use Reed Solomon codes
So do pdf-417 barcodes. I remember writing the code for pdf417_decode and pdf417_encode (freeware) back in the day. The Blahut book on error correcting codes was a great help, but all the examples where using binary galois fields. The pdf-417 used a field of powers of 3 mode 929, so my brain had to figure out how that worked and put it into code.
If you need to transmit data over an unreliable link where retransmission isn't practical you will want some sort of FEC. If hardware isn't available to do the job then it gets done in software.