Hacker News
by axiak 3950 days ago
Nowadays they use Reed–Solomon coding to distribute their data durably without keeping three full copies of it.
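To make the storage difference concrete, here is a rough back-of-the-envelope sketch. The RS(6, 3) parameters (6 data fragments plus 3 parity fragments per stripe) are an illustrative assumption, not a figure from this thread or from Google.

```python
def replication_profile(copies: int) -> tuple[float, int]:
    """Raw-storage overhead and number of lost pieces tolerated for n-way replication."""
    # Every byte is stored `copies` times; data survives while at least one copy remains.
    return float(copies), copies - 1


def rs_profile(k: int, m: int) -> tuple[float, int]:
    """Raw-storage overhead and lost fragments tolerated for a Reed-Solomon (k data, m parity) stripe."""
    # k data fragments plus m parity fragments; any k of the k + m fragments can rebuild the data.
    return (k + m) / k, m


if __name__ == "__main__":
    # RS(6, 3) is a hypothetical example configuration.
    for name, (overhead, tolerated) in [
        ("3x replication", replication_profile(3)),
        ("RS(6, 3)", rs_profile(6, 3)),
    ]:
        print(f"{name}: {overhead:.2f}x raw storage, survives {tolerated} lost pieces")
```

Under these assumptions the RS layout stores half as many raw bytes as triple replication while tolerating one more lost piece per stripe.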

Do you have a source for that? Why would they start using it now?
Disclaimer: Googler, but not working on storage.

link here: http://static.googleusercontent.com/external_content/untrust...

Actually, in Colossus you can tune the RS coding parameters per file, trading off performance against durability.
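One rough way to see the durability side of per-file tuning: if each fragment is lost independently with probability p, a (k, m) stripe is lost only when more than m of its k + m fragments fail. The sketch below assumes independent failures and a made-up p = 0.01. Real failures are correlated (racks, power, whole sites), so treat the numbers as relative, not absolute. Setting k = 1, m = 2 models plain 3x replication, since the data is then lost only if all three copies fail.

```python
from math import comb


def stripe_loss_probability(k: int, m: int, p: float) -> float:
    """Probability that a (k data, m parity) stripe becomes unrecoverable.

    Assumes each of the k + m fragments fails independently with probability p
    (a simplifying assumption; real-world failures are correlated).
    """
    n = k + m
    # Data is lost only if more than m fragments fail, i.e. fewer than k survive.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-fragment failure probability
    # Hypothetical per-file settings; (1, 2) is equivalent to 3x replication.
    for k, m in [(1, 2), (6, 3), (10, 4), (12, 2)]:
        print(f"k={k:2d} m={m}: loss probability {stripe_loss_probability(k, m, p):.2e}")
```

Raising m buys durability at the cost of extra storage. Raising k lowers the storage overhead but means more fragments must be read, and more arithmetic done, to rebuild lost data. That is where the performance side of the tradeoff shows up.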

RS coding uses less raw storage than full copies for the same level of safety; the tradeoff is the computation time needed for recovery.
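To show where that recovery computation comes in, here is a toy systematic Reed–Solomon-style erasure code. For simplicity it works over the prime field GF(257), puts one symbol in each fragment, and decodes by Lagrange interpolation. Production systems typically use GF(2^8) and much larger fragments, and nothing here reflects how Colossus actually implements it. The function names are hypothetical.

```python
P = 257  # prime modulus chosen for simplicity; real systems more likely use GF(2^8)


def _interpolate_at(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for i, (xi, _) in enumerate(points):
            if i != j:
                num = num * (x - xi) % P
                den = den * (xj - xi) % P
        # Lagrange basis term: yj * prod (x - xi) / (xj - xi), division via modular inverse.
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], m: int) -> list[tuple[int, int]]:
    """Return k data fragments plus m parity fragments as (x, symbol) pairs."""
    k = len(data)
    if k + m > P or any(not 0 <= d < P for d in data):
        raise ValueError("parameters or symbols out of range for GF(257)")
    # Systematic layout: data symbol i is the polynomial's value at x = i.
    points = list(enumerate(data))
    # Parity fragments are the same polynomial evaluated at x = k .. k + m - 1.
    parity = [(x, _interpolate_at(points, x)) for x in range(k, k + m)]
    return points + parity


def decode(fragments: list[tuple[int, int]], k: int) -> list[int]:
    """Rebuild the k data symbols from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("need at least k fragments to recover the data")
    basis = fragments[:k]
    # This interpolation is the extra recovery work that plain replication avoids.
    return [_interpolate_at(basis, x) for x in range(k)]


if __name__ == "__main__":
    data = list(b"colossus")  # 8 data symbols (hypothetical contents)
    stripe = encode(data, m=3)  # 11 fragments in total
    survivors = [f for i, f in enumerate(stripe) if i not in {0, 4, 9}]  # lose any 3
    print(bytes(decode(survivors, k=len(data))))
```

With replication, recovery is just a copy. Here, every rebuilt symbol requires an interpolation over k surviving fragments (O(k^2) field operations in this naive version), which is the recovery-time cost described above.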

Also, how would that help with geographic redundancy? Local recovery from errors does you no good if your datacenter gets wiped out in a flood.
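On the geographic question, erasure coding can only survive a site outage if a stripe's fragments are spread across sites so that no single site holds more than m of them. The sketch below checks that condition for a hypothetical placement. The datacenter names and layouts are made up, and the thread doesn't say how Google actually places fragments.

```python
from collections import Counter


def survives_site_loss(placement: list[str], m: int, sites_lost: int = 1) -> bool:
    """True if the stripe is still recoverable after losing any `sites_lost` sites.

    `placement[i]` names the (hypothetical) site holding fragment i; the stripe
    tolerates the loss of at most m fragments.
    """
    per_site = sorted(Counter(placement).values(), reverse=True)
    # Worst case: the sites holding the most fragments are the ones that go down.
    return sum(per_site[:sites_lost]) <= m


if __name__ == "__main__":
    # Hypothetical RS(6, 3) stripe: 9 fragments, tolerates losing 3.
    spread = ["dc-a", "dc-b", "dc-c"] * 3  # 3 fragments per site
    lopsided = ["dc-a"] * 5 + ["dc-b"] * 4  # only two sites
    print(survives_site_loss(spread, m=3))
    print(survives_site_loss(lopsided, m=3))
```

Under this assumption the code itself doesn't add geographic redundancy; the placement policy does.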
I can't find the presentation anywhere, but at HBaseCon 2014 one of the lead developers of Bigtable stated that they had switched to RS, even for their older databases.

EDIT: It's in this video, around the 23-minute mark: https://vimeo.com/100153741