We copied the raw image file of the corrupted metadata filesystem (MDT in Lustre lingo) to the ramdisk.
Then we mounted it via loopback and copied the files to tarballs. The bit that was really slow on the spinning disk was reading the millions of files from the metadata FS.
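Here is a minimal Python sketch of the two stages. The paths and the 64 MiB chunk size are hypothetical. The loopback mount needs root, so it is left out, and `archive_tree` simply takes the directory where the image is already mounted. Python's `tarfile` does not record extended attributes, which a Lustre MDT backup needs to keep. So this only illustrates the access pattern: one long sequential read of the image, then a walk over many small files.

```python
import os
import tarfile
import tempfile

# Hypothetical chunk size: large sequential reads let a spinning disk stream
# data instead of seeking between small requests.
CHUNK = 64 * 1024 * 1024


def copy_image_sequentially(src_image: str, dst_image: str) -> int:
    """Copy a raw filesystem image front to back in large chunks; return bytes copied."""
    copied = 0
    with open(src_image, "rb") as src, open(dst_image, "wb") as dst:
        while True:
            buf = src.read(CHUNK)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
    return copied


def archive_tree(mount_point: str, tar_path: str) -> int:
    """Pack an already-mounted directory tree into a tarball; return the member count.

    Walking the tree touches every file individually. That is the random-I/O
    part, cheap when the image sits in RAM.
    """
    with tarfile.open(tar_path, "w") as tar:
        tar.add(mount_point, arcname=".")
    with tarfile.open(tar_path, "r") as tar:
        return len(tar.getmembers())
```

In use, `copy_image_sequentially` would point at the raw MDT image and a file on the ramdisk (for example under a hypothetical `/dev/shm`). After mounting that copy, `archive_tree` would run on the mount point.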
The first copy to RAM was a sequential image copy, thus not bottlenecked on seeks despite spinning platters.
The second copy from RAM was a file copy with a lot of random I/O, but not bottlenecked on seeks because it was reading from RAM.
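A back-of-envelope Python sketch shows why staging through RAM helps. Every number here is hypothetical and not taken from this recovery: 10 million files, 2 random seeks per file, 10 ms per seek, and a 500 GB image streamed at 150 MB/s.

```python
def seek_bound_hours(num_files: int, seeks_per_file: float, seek_ms: float) -> float:
    """Hours to visit every file when each access pays a random seek."""
    return num_files * seeks_per_file * seek_ms / 1000 / 3600


def sequential_hours(image_bytes: int, mb_per_s: float) -> float:
    """Hours to stream a whole image front to back at a sustained rate."""
    return image_bytes / (mb_per_s * 1e6) / 3600


# Hypothetical inputs, for illustration only.
print(f"{seek_bound_hours(10_000_000, 2, 10):.1f} h")   # prints "55.6 h"
print(f"{sequential_hours(500 * 10**9, 150):.1f} h")    # prints "0.9 h"
```

Once the image is on a ramdisk, the per-access cost drops by several orders of magnitude. The same file walk then stops being seek-bound.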
Writing the tarballs is bulk output, which tends to be efficient anyway. They might have made temporary configuration changes to speed up the write side, or skipped them if they lacked the appetite for the extra risk.
The basic process of the file-level backup is documented here: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfu...