| HN Mirror


	by t0 4587 days ago
	Were they not using RAID or performing multiple database writes? Mechanical hard drive failures are pretty common and can be mitigated fairly easily.

3 comments

dvanduzer 4587 days ago

RAID arrays fail all the time; the system has famously been one server, and the only visible recent scaling work has been front end caching.

edit: the code has been public for a long time, and there is not a database to replicate. the site ran as a single server for years, and it is unlikely the front end caching has changed anything about the "database" components.

Since RAID failures actually are somewhat common, they are probably looking at a higher level replicated storage system now, a la DRBD, or some kind of distributed file system, a la Gluster.

tdubhro1 4586 days ago

Doesn't RAID usually at least give some warning if you watch the syslogs? (Genuine question, I am not a sysadmin. We have Linux servers with Hetzner on software RAID 1, and a couple have had single-disk issues which we spotted straight away in Zenoss and had Hetzner replace the disk. Am I incorrect in thinking this is normal?)
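
For readers without a monitoring stack, the kind of check described above can be roughed out by hand. This is only a sketch, assuming Linux software RAID (md) and the usual /proc/mdstat status format, where "[U_]" marks a missing member; the function name and the sample text are made up for illustration and are not taken from Zenoss or any other tool.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line looks like "... [2/2] [UU]"; "_" marks a missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Hypothetical sample in the style of /proc/mdstat (not real output).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048576 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      2096064 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real machine you would pass in the contents of /proc/mdstat instead of SAMPLE and alert on a non-empty result.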

dsr_ 4586 days ago

RAID is a method for surviving hardware failure. If you have a software failure in, say, the VFS layer, RAID will happily accept the order to write garbage all over your inode trees and will carefully store and make sure that all the appropriate disks can return the same garbage every time. And yes, it should warn you when you need to replace a disk which is no longer returning the right garbage.

Similarly, if you rm -rf a vital directory tree, RAID can ensure that it goes away reliably.
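
The point can be made concrete with a toy model. This is a deliberately simplified sketch, not how any real RAID implementation works: the Mirror class and its dict-backed "disks" are invented for illustration. It survives losing a disk, and it just as faithfully mirrors garbage writes and deletions.

```python
class Mirror:
    """Toy RAID-1: every operation is applied to all surviving 'disks' (dicts)."""

    def __init__(self):
        self.disks = [{}, {}]

    def _alive(self):
        return [d for d in self.disks if d is not None]

    def write(self, block, data):
        # The mirror has no idea whether data is valid; it copies it everywhere.
        for disk in self._alive():
            disk[block] = data

    def delete(self, block):
        # A delete is just another operation to replicate reliably.
        for disk in self._alive():
            disk.pop(block, None)

    def fail_disk(self, index):
        self.disks[index] = None

    def read(self, block):
        alive = self._alive()
        if not alive:
            raise IOError("all disks failed")
        return alive[0].get(block)

if __name__ == "__main__":
    m = Mirror()
    m.write("inode", "good")
    m.fail_disk(0)
    print(m.read("inode"))   # hardware failure: data survives on the other disk

    m = Mirror()
    m.write("inode", "garbage")
    print(m.disks)           # software failure: both copies hold the same garbage
    m.delete("inode")
    print(m.read("inode"))   # deleted on both copies
```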

andrewcooke 4586 days ago

yes you're right. so replies will now switch to how they don't stop you from deleting data, because... well, i have no idea why. it seems to just be a law of nature.

imbriaco 4587 days ago

DRBD and Gluster are not any more resilient to filesystem corruption than a RAID device is. In this kind of case you hope for either real-time replicated storage on a completely separate physical host or very recent backups.

dvanduzer 4587 days ago

What are DRBD and Gluster if not real-time replicated storage on completely separate physical hosts?

Filesystem corruption without hardware failure is far rarer in my experience. Have you seen an instance that wasn't a proverbial user error?

nilsbunger 4587 days ago

You never ran reiserfs I see...

Back in ~2004 I watched IT spend a whole day recovering our 60-person startup's main Linux NFS server, due to a software bug in the storage driver. Had to rebuild the whole system from backups.

imbriaco 4587 days ago

Yes, I have in fact, in a DRBD configuration. The bug was esoteric, but it happened and was not the result of user error. DRBD and Gluster both allow faults in the VFS layer to propagate to all replicas.

avifreedman 4587 days ago

Gluster should, by design I think, avoid replicating filesystem metadata corruption (though it would replicate internal metadata issues in files on top of the filesystem), but DRBD won't. At high volumes I still regularly break Gluster, but it'd probably be OK for lower bandwidth/ops use. Not sure what the HN disk usage pattern is, though.

username223 4587 days ago

IIRC GlusterFS was the thing that gave me multiple identically-named files in the same directory. Useless.

sitkack 4586 days ago

http://basho.com/riak/

cbsmith 4586 days ago

Or, I dunno... writing to S3? ;-)

venus 4587 days ago

Databases? What databases?

HN is persisted to flat files.

Sami_Lehtinen 4587 days ago

I guess he meant having two separate logs: one for production, and a secondary one holding just a journal. In that case you could restore the original data from backup, and then replay the rest from the external log. That's the solution I'm using with really important data where I cannot afford any data loss, even if downtime is acceptable. Each commit is written to two separate systems, but the secondary system is only a journal which can be replayed.
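
A minimal sketch of that scheme, under stated assumptions: the primary store is just an in-memory dict, the journal is a JSON-lines file, and the names JournaledStore, restore and demo are hypothetical. In a real setup the journal would live on a separate system, which a temp file obviously does not model.

```python
import json
import os
import tempfile

class JournaledStore:
    """Primary data in a dict, plus an append-only journal of every commit."""

    def __init__(self, journal_path):
        self.data = {}
        self.journal_path = journal_path
        self.entries_written = 0

    def commit(self, key, value):
        # Append to the journal and force it to disk before touching the primary.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps({"key": key, "value": value}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())
        self.entries_written += 1
        self.data[key] = value

def restore(backup, journal_path, start=0):
    """Rebuild state from a backup plus the journal entries from index start on."""
    data = dict(backup)
    with open(journal_path, encoding="utf-8") as journal:
        for index, line in enumerate(journal):
            if index >= start:
                entry = json.loads(line)
                data[entry["key"]] = entry["value"]
    return data

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = JournaledStore(os.path.join(tmp, "journal.log"))
        store.commit("a", 1)
        backup = dict(store.data)            # backup taken here
        backup_position = store.entries_written
        store.commit("b", 2)
        store.commit("a", 3)
        store.data = None                    # primary is lost after the backup
        return restore(backup, store.journal_path, start=backup_position)

if __name__ == "__main__":
    print(demo())
```

Downtime during the replay is the price; nothing committed after the last backup is lost as long as the journal itself survives.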
