Hacker News
by deelowe 1924 days ago
I'd like to see a graph of density versus IOPS over time. It definitely feels like the gap has been widening for quite some time, judging by how long my ZFS arrays take to do a scrub.

To answer the OP's question, it seems to me that after around 12TB or so, it makes more sense to move away from implementations that require rebuilds such as RAID 1, no RAID, or JBOD solutions.

Random IOPS is and always will be stuck at 240 IOPS for a 7200 RPM drive.

7200 RPM / 60 == 120 rotations per second. On average it takes a "half-rotation" to bring the requested data under the head (half the data is within the first half-rotation, the other half of the data is within the 2nd half-rotation), so the drive can complete at most 120 * 2 == 240 random accesses per second.
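The arithmetic above can be sketched in a few lines of Python. This is only a rough model: it counts rotational latency alone and ignores seek and transfer time, so real drives land below these ceilings. The function name `rotational_iops` is made up for illustration.

```python
def rotational_iops(rpm):
    """Return (average rotational latency in ms, IOPS ceiling) for a spindle speed.

    Assumption: only rotational latency is modeled; seek and transfer
    time are ignored, so the result is an upper bound, not a prediction.
    """
    rotations_per_second = rpm / 60
    # On average the target sector is half a rotation away from the head.
    avg_latency_s = 0.5 / rotations_per_second
    return avg_latency_s * 1000, 1 / avg_latency_s


for rpm in (7200, 10_000, 15_000):
    latency_ms, iops = rotational_iops(rpm)
    print(f"{rpm:>6} RPM: {latency_ms:.2f} ms avg rotational latency, "
          f"~{iops:.0f} IOPS ceiling")
```

For 7200 RPM this gives about 4.17 ms and the 240 IOPS figure quoted above; 15,000 RPM works out to 2 ms and roughly 500.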

If you want to reach the data faster, you need to physically rotate the disk faster: such as a 10,000 RPM drive, 15k, or 20k drive. To allow for faster rotations, you shrink the drive to 2.5" or even 1.8". Alas, SSDs have taken over this niche entirely, so we only really have 3.5" and 7200 RPM drives anymore.

And if you have a sufficient number of fairly sequential operations in parallel, they start to look very random[1][2] to the storage system.

[1]: https://www.youtube.com/watch?v=yHgSU6iqrlE (presentation)

[2]: https://www.snia.org/sites/default/files/SDC/2019/presentati... (slides, page 41)

Having dual actuators (Seagate's Mach.2 branding) can increase IOPS by having 2 heads process the queue in parallel. That should bring a noticeable improvement, but it's true that it doesn't help random requests that are issued one at a time (just as NCQ's queue reordering didn't: you need a queue).

Not sure if there will be consumer drives with this eventually or if the cost is too prohibitive.

Except we can already achieve that kind of IOPS increase: by simply using two hard drives in parallel (be it RAID0, or even RAID1 if your driver is willing to split the reads between hard drives).

A multi-actuator drive isn't really "one hard drive" anymore, it's really just two hard drives ganged together. While more physically convenient, it doesn't seem to really offer the true 2x increase we're looking for.

Actuator#1 cannot deliver more IOPS for the data it is assigned to. You only get more IOPS if you can split the work between the two actuators. It's the same problem as RAID0 or RAID1 with split reads (you gotta figure out a way to "split the work" for RAID0 to truly deliver 2x the IOPS).

RAID0 can't give you a true 2x increase, because reads and writes are constrained to a particular device, and big reads tend to require both drives working together.

RAID1 can give you a 2x increase in reads, but suffers even more than RAID0 when it comes to writes.

Dual actuators, implemented in a straightforward way, can both access the entire drive surface which means they can give you a true 2x increase. Sometimes even better than 2x, because each arm can focus on one side of the disk. For read/write workloads it completely outclasses RAID.

> because reads and writes are constrained to a particular device

That constraint means nothing here. You can issue two parallel reads to two drives in RAID-0 just as easily as in RAID-1. The only case where this doesn't work is where you're reading more than 2x the interleave size and you're issuing separate requests for each interleaved chunk. With command queuing, a smart storage system should even recognize the pattern and buffer to reduce the damage, but you'll still pay a cost in extra interrupts and request handling, so it's better to learn about scatter/gather lists.

> they can give you a true 2x increase

I already explained why this isn't actually the case, and have observed it not to be the case with multiple generations of dual-actuator drives. Stop presenting theories based on misconceptions of how disks and storage stacks work as though they were fact.

> You can issue two parallel reads to two drives in RAID-0 just as easily in RAID-1.

Under RAID 0, the odds are 50% that two independent reads are on the same drive. It's impossible to get a speed advantage in that case.
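To put numbers on that, here is a minimal sketch, assuming every read fits inside a single stripe chunk and that reads are independent and spread uniformly across the drives. The helper names are hypothetical, and the model says nothing about how a real controller schedules requests.

```python
from fractions import Fraction


def same_drive_probability(n_drives):
    """Probability that two independent, uniformly placed reads hit the same drive."""
    return Fraction(1, n_drives)


def expected_busy_drives(n_drives, queue_depth):
    """Expected number of distinct drives touched by `queue_depth` outstanding reads.

    Assumption: each read lands on one drive, chosen uniformly at random.
    """
    # Probability that one particular drive receives none of the reads.
    p_idle = Fraction(n_drives - 1, n_drives) ** queue_depth
    return n_drives * (1 - p_idle)


print(same_drive_probability(2))           # 1/2
print(float(expected_busy_drives(2, 2)))   # 1.5
print(float(expected_busy_drives(2, 8)))   # 1.9921875
```

Under these assumptions, two outstanding reads keep 1.5 drives busy on average, and the figure approaches 2 only as the queue gets deeper.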

> I already explained why this isn't actually the case

You said they "improve parallelism, not media transfer rate or latency", and I'm arguing about parallelism. Plus large transfers can be rearranged into parallelism (fact, not theory).

And you said that they can face internal contention "elsewhere" but implied that could be fixed.

So it doesn't sound like what you said contradicts what I said.

SSDs achieve their speed in part by combining multiple independent NAND channels under a single controller - each channel is more or less equivalent to an actuator. Their speed varies greatly based on workload parallelism, yet it's still very much one drive.