| HN Mirror



	by toast0 1833 days ago
	This article is about large-scale, multi-node distributed systems: hundreds of rack-sized systems. In those systems, you often don't need explicit disk redundancy, because you have data redundancy across nodes with independent disks. This is a good insight, but you need to be sure the disks really are independent.
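To put rough numbers on the independence point, here is a minimal sketch. It assumes a deliberately simple model that is not from the article: each replica's disk fails with probability `p_disk` over some window, and a shared cause (same firmware bug, same power feed, same batch) takes out every replica at once with probability `p_shared`. The function name and all figures are hypothetical.

```python
def loss_probability(p_disk, replicas, p_shared=0.0):
    """Probability of losing every replica of a piece of data.

    Assumed model: either a shared cause destroys all replicas at once
    (probability p_shared), or the disks fail independently of each other.
    """
    independent_loss = p_disk ** replicas
    return p_shared + (1.0 - p_shared) * independent_loss


# Three replicas on truly independent disks, 1% failure chance each:
# roughly one in a million.
print(loss_probability(0.01, 3))

# Same disks, but with a 0.1% chance of a shared failure: the shared
# term dominates and the extra replicas barely help.
print(loss_probability(0.01, 3, p_shared=0.001))
```

Under this model, even a small shared-failure probability swamps the benefit of additional replicas, which is why the independence assumption matters more than the replica count.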


merb 1833 days ago

Well, most often HBAs and RAID controllers are another thing that increases latency and makes maintenance costs go up quite a bit (more stuff to update), and they're also another part that can break.

That's why they're not recommended when running Ceph.


Aea 1833 days ago

I'm pretty sure discrete HBAs / Hardware RAID Controllers have effectively gone the way of the dodo. Software RAID (or ZFS) is the common, faster, cheaper, more reliable way of doing things.


amarshall 1833 days ago

Don’t lump HBAs and RAID controllers together. The former is just PCIe to SATA or SCSI or whatever (otherwise it is not just an HBA, but indeed a RAID controller). Such a thing is still useful, and perhaps necessary for software RAID if there are insufficient ports on the motherboard.


toast0 1833 days ago

Hardware RAID doesn't seem to be going away quickly. Since the controllers are almost all made by the same company, and they can usually be flashed to be dumb HBAs, it's not too bad. But it was pretty painful when using managed hosting: the menu options with lots of disks all have RAID controllers that are a pain to set up, and I'm not going to reflash their hardware (although I did end up doing some SSD firmware updates myself, because firmware bugs were causing issues and their firmware upgrade scripts weren't working well and were tremendously slow).


seized 1833 days ago

ZFS needs HBAs. Those get your disks connected but otherwise get out of the way of ZFS.

But yes, hardware RAID controllers and ZFS don't go together.


karmakaze 1833 days ago

Hardware caching RAID controllers do have the advantage that, if power is lost, the cache can still be written out without the CPU or software doing it. This lets you safely run without write-through cache fsync. This was a common spec for provisioned bare-metal MySQL servers I'd worked with.


Godel_unicode 1833 days ago

The entire comment thread of this article is low-scale on-prem admins and high-scale cloud admins talking past each other.

You can build in redundancy at the component level, at the physical computer level, at the rack level, at the datacenter level, at the region level. Having all of them is almost certainly redundant and unnecessary at best.


amarshall 1833 days ago

Sometimes. Other times they may make things worse by lying to the filesystem (and thereby also the application) about writes being completed, which may confound higher-level consistency models.


wtallis 1833 days ago

It does seem to me that it's much easier to reason about the overall system's resiliency when the capacitor-protected caches are in the drives themselves (standard for server SSDs) and nothing between that and the OS lies about data consistency. And for solid state storage, you probably don't need those extra layers of caching to get good performance.


karmakaze 1832 days ago

Since my experience was from a number of years back, I tried searching for more recent reports: "mysql ssd fsync performance". The top recent one I found was for Digital Ocean[0] in 2020. It says "average of about 20ms which matches your 50/sec" and mentions battery-backed controllers, which weren't even in my search terms.

[0] https://www.digitalocean.com/community/questions/poor-fsync-...
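The quoted figures are consistent with each other: if each fsync takes about 20 ms and they are issued one after another, you get 1 / 0.020 = 50 per second. For anyone who wants to check their own storage, here is a minimal sketch of that measurement. The function names, write size, and iteration count are my own assumptions, and a tiny loop like this is no substitute for a real benchmark tool.

```python
import os
import tempfile
import time


def fsyncs_per_second(latency_seconds):
    """Serial fsync rate implied by a given average latency."""
    return 1.0 / latency_seconds


def measure_fsync(n_writes=50, payload=b"x" * 512, directory=None):
    """Return (mean latency in seconds, fsyncs per second) for small writes.

    Pass `directory` on the device you care about; the default temp
    directory may be a RAM-backed filesystem, which would flatter the result.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)
            os.fsync(fd)  # block until the OS reports the data as durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    mean_latency = elapsed / n_writes
    return mean_latency, fsyncs_per_second(mean_latency)


if __name__ == "__main__":
    latency, rate = measure_fsync()
    print(f"mean fsync latency: {latency * 1000:.2f} ms, about {rate:.0f} fsyncs/sec")
```

Keep in mind that a controller or drive with a protected write cache can acknowledge the fsync before the data reaches the media, so a very low number here does not by itself tell you which layer is absorbing the write.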


Nextgrid 1833 days ago

I would be worried about my data being held hostage by a black-box proprietary RAID controller from a hostile manufacturer (unless you're paying them millions to design & build you a custom product, at which point you may have access to internal specs & a contact within their engineering team to help you).

I'd rather have ZFS or something equivalent in software. Software can be inspected, is (hopefully) battle-tested for years by many different companies with different workloads & requirements, and worst-case scenario, because it's software, you can freeze the situation in time by taking byte-level snapshots of the underlying drives as well as a copy of the software for later examination/reverse-engineering, something you can't do with a hardware black box where you're bound to the physical hardware and often have a single shot at a recovery attempt (as it may change the state of the black box).

Have you heard of the SSD failures about a decade ago where the SSD controller's firmware had a bug that bricked the drive past a certain lifetime? The data is technically still there, and would be recoverable if you could bypass the controller or fix its firmware, but unless you had a very good relationship with the manufacturer of the SSD to gain access to the internal tools and/or source code to allow you to tinker with the controller you were SOL.


karmakaze 1832 days ago

It was RAID-1, so there's no data manipulation going on, just a simple mirror copy with double the read bandwidth.
