by dekhn
1589 days ago
I had to chuckle at this article because it reminded me of some of the things I've had to do to clean up data. One time I had to write a special mapreduce that used a multi-step map to convert my (deeply nested) directory tree into roughly equal-sized partitions (a serial directory listing would have taken too long, and the tree was too unbalanced to partition in one step), then ran a second mapreduce to map-delete all the files and reduce the errors down to a report file for later cleanup. This meant we could delete a few hundred terabytes across millions of files in 24 hours, which was a victory.
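
For anyone curious what that two-phase shape might look like, here is a rough single-machine sketch in Python. It is not the original job: the function names (`expand`, `partition_tree`, `delete_partition`, `delete_tree_files`) are made up for illustration, the "map" rounds run serially here where a real mapreduce would fan them out across workers, and partitions are balanced by file count only, which is an assumption.

```python
import os

def expand(dirs):
    """One 'map' round: list each directory once, one level deep.

    Returns the files found and the subdirectories still to be listed.
    In a real job each directory would be listed by a separate worker.
    """
    files, subdirs = [], []
    for d in dirs:
        with os.scandir(d) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    files.append(entry.path)
    return files, subdirs

def partition_tree(root, num_partitions):
    """Phase 1: repeat map rounds until no unlisted directories remain,
    then deal the files out round-robin so partitions differ by at most one."""
    files, frontier = [], [root]
    while frontier:
        found, frontier = expand(frontier)
        files.extend(found)
    files.sort()
    return [files[i::num_partitions] for i in range(num_partitions)]

def delete_partition(paths):
    """Phase 2 map step: delete each file, emitting (path, message) on failure."""
    errors = []
    for path in paths:
        try:
            os.remove(path)
        except OSError as exc:
            errors.append((path, exc.strerror))
    return errors

def delete_tree_files(root, num_partitions=4):
    """Phase 2 reduce step: merge per-partition errors into one report list."""
    report = []
    for part in partition_tree(root, num_partitions):
        report.extend(delete_partition(part))
    return report
```

The point of the multi-round expansion is that no single worker ever has to walk a whole lopsided subtree; each round only lists one level, and the balancing happens after the full file list is known. The report returned by `delete_tree_files` plays the role of the error file kept for later cleanup.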