
	by davidmr 2757 days ago
	They’d have to pay me a lot of money to do it, that’s for sure. I’d love to see the disaster recovery plans. Every major Lustre site I’m aware of has had a data loss “incident” at some point in their history. It’s possible AWS has it all figured out with background backups and block device replication and whatnot, but I’m skeptical.

pinewurst 2757 days ago

I have doubts based on my experience with Lustre and a certain understanding of AWS operations. I'm guessing they're going for the Iranian minefield-clearing technique: get a mob of kids, hand them plastic keys to heaven (or RSUs), and march them through the field.


mbreese 2757 days ago

Given that they call the non-S3 linked version 'ephemeral', I'm not sure there is a plan. I think S3 is the plan.


pinewurst 2757 days ago

'Ephemeral' was (and is) the original Lustre design model. It was intended as high-performance swap/scratch space at Livermore with a short data lifespan: your higher-priority bomb sim forces mine to roll out to disk and back in later, and that's it. Even today, Lustre isn't stable over the long term. The longer you leave data on it, the greater the probability of corruption, including silent corruption.


mbreese 2757 days ago

I've seen Lustre backed with ZFS listed a few places. Is the idea here to help mitigate the possibility of corruption?


agapon 2754 days ago

LLNL is the core force behind ZoL (ZFS on Linux), and it's primarily LLNL that uses ZFS-backed Lustre.


pinewurst 2754 days ago

I think ZoL is LLNL’s attempt to make up for inflicting Lustre on us.
