HN Mirror



	by ople 3775 days ago
	Author here: We considered that, but since the access pattern was likely to be essentially random, the performance would have been terrible. Because of the break, we had nearly 1000 clustered servers sitting idle, so it was reasonably quick to do the ramdisk trick.


icefo 3775 days ago

I'm sorry, but I don't understand something. What did you put on that big ramdisk? The metadata?


ople 3774 days ago

We copied the raw image file of the corrupted metadata filesystem (MDT in Lustre lingo) to the ramdisk.

Then we mounted it via loopback and copied the files to tarballs. The bit that was really slow on the spinning disk was reading the millions of files from the metadata FS.

The basic process of the file-level backup is documented here: https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfu...
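
For readers who want the three steps spelled out, here is a minimal sketch in Python that only builds the commands and prints them; it does not run anything. The device path, ramdisk directory, mount point, tarball path, the image file name `mdt.img`, and the `dd` block size are all hypothetical placeholders, not values from the actual recovery. A real Lustre MDT backup also has to preserve extended attributes and would likely need a Lustre-specific filesystem type on the mount, which the linked manual covers and this sketch leaves out.

```python
import shlex
from pathlib import Path


def build_rescue_plan(device, ramdisk_dir, mount_point, tarball):
    """Return the commands for the three steps as argument lists (nothing is executed)."""
    image = Path(ramdisk_dir) / "mdt.img"  # hypothetical image file name
    return [
        # 1. Sequential raw image copy from the spinning disk onto the ramdisk.
        ["dd", f"if={device}", f"of={image}", "bs=4M"],
        # 2. Read-only loopback mount of the image copy (filesystem type omitted; assumption).
        ["mount", "-o", "loop,ro", str(image), str(mount_point)],
        # 3. File-level copy into a tarball; the random reads now hit RAM, not platters.
        ["tar", "-cf", str(tarball), "-C", str(mount_point), "."],
    ]


if __name__ == "__main__":
    # All paths below are placeholders for illustration only.
    plan = build_rescue_plan("/dev/sdX", "/mnt/ramdisk", "/mnt/mdt", "/backup/mdt.tar")
    for cmd in plan:
        print(shlex.join(cmd))
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; the commands themselves would need root and a ramdisk large enough to hold the whole image.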


garthk 3774 days ago

For those still not quite getting it:

The first copy to RAM was a sequential image copy, thus not bottlenecked on seeks despite spinning platters.

The second copy from RAM was a file copy with a lot of random I/O, but not bottlenecked on seeks because it was reading from RAM.

Bulk writes tend to be more efficient. They might have made temporary configuration changes to make that end faster, or not if they lacked the appetite for the extra risk.
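
To put rough numbers on the difference between the two copies, here is a back-of-the-envelope sketch in Python. Every figure in it (image size, disk throughput, seek time, file count, one seek per file) is an illustrative assumption, not a measurement from the thread; the only thing taken from the thread is that there were millions of files.

```python
def sequential_seconds(image_bytes, throughput_bytes_per_s):
    """Time for one streaming copy of the raw image, limited by throughput."""
    return image_bytes / throughput_bytes_per_s


def random_read_seconds(file_count, seek_seconds, seeks_per_file=1):
    """Time for a file-by-file read on spinning disk, limited by seeks."""
    return file_count * seeks_per_file * seek_seconds


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the trade-off.
    image_bytes = 500 * 10**9      # assumed 500 GB metadata image
    throughput = 150 * 10**6       # assumed 150 MB/s sequential read
    file_count = 10_000_000        # assumed 10 million small files
    seek_seconds = 0.010           # assumed 10 ms per random seek

    seq = sequential_seconds(image_bytes, throughput)
    rnd = random_read_seconds(file_count, seek_seconds)
    print(f"sequential image copy: {seq / 3600:.1f} h")
    print(f"seek-bound file copy:  {rnd / 3600:.1f} h")
```

Under these assumptions the seek-bound copy comes out well over an order of magnitude slower than the streaming copy, which is the gap the ramdisk removes: once the image is in RAM, the per-file seek cost is negligible.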
