by kccqzy 519 days ago
By the way, LMDB's main developer Howard Chu responded to the paper. He said,

> They report on a single "vulnerability" in LMDB, in which LMDB depends on the atomicity of a single sector 106-byte write for its transaction commit semantics. Their claim is that not all storage devices may guarantee the atomicity of such a write. While I myself filed an ITS on this very topic a year ago, http://www.openldap.org/its/index.cgi/Incoming?id=7668 the reality is that all storage devices made in the past 20+ years actually do guarantee atomicity of single-sector writes. You would have to rewind back to 30 years at least, to find a HDD where this is not true.

So this is a case where the programmers of LMDB thought about the "incorrect" use and decided that it was a calculated risk to take because the incorrectness does not manifest on any recent hardware.

This is analogous to the case where someone complains some C code has undefined behavior, and the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Furthermore, both the LMDB issue and the Postgres issue are noted in the paper as previously known. The paper's author states that Postgres documents this issue. The paper mentions pg_control, so I'm guessing it's referring to the known issue described here: https://wiki.postgresql.org/wiki/Full_page_writes

> We rely on 512 byte blocks (historical sector size of spinning disks) to be power-loss atomic, when we overwrite the "control file" at checkpoints.


This assumption was wrong for Intel Optane memory. Power loss could cut the data stream anywhere in the middle. (Note: the DIMM nonvolatile memory version)
Consumer Optane drives were not "power loss protected"; that is very different from not honoring a requested synchronous write.

The crash-consistency problem is very different from the problem of whether real synchronous writes are durable. There are some storage devices which will lie about sync writes, sometimes hoping that a backup battery will allow them to complete those writes.

System crashes are inevitable; use things like write-ahead logs, depending on need. No storage API will get rid of all system crashes, and yes, even Apple games the system by disabling real sync writes, so that will always be a battle.
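
For reference, here is roughly what asking for a real synchronous write looks like from the application side. This is a minimal sketch, not anyone's production code: `durable_write` is a hypothetical helper name, and it assumes a POSIX-style filesystem where rename is atomic. Whether the device actually honors the flush is exactly the part the application cannot see.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, asking the OS to push it to stable storage.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # drain Python's userspace buffer
            os.fsync(f.fileno())  # ask the kernel to flush to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is durable (where supported).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    # Note: on macOS, fsync alone does not ask the drive to flush its
    # cache; that requires fcntl with F_FULLFSYNC, which is not done here.
```

None of this helps if the drive acknowledges the flush and then drops the data, which is the separate problem described above.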

You're missing the point. GP was mentioning the common assumption that all systems in the last 30 years are sector-atomic under power loss: either the sector is fully written or not written at all. Optane was a rare counterexample, where a sector can become partially written and is thus not sector-atomic.
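
To make "not sector-atomic" concrete, here is a minimal sketch of how a format could detect a partially written sector instead of assuming it never happens. It is not LMDB's or Postgres's actual on-disk layout; the record layout, `MAGIC`, and all function names are assumptions made up for illustration. Two 512-byte slots are written alternately, each carrying a sequence number and a CRC, and recovery picks the newest slot that still validates.

```python
import struct
import zlib

SECTOR = 512
MAGIC = b"META"                     # hypothetical marker for a written slot
HEADER = struct.Struct("<4sQI")     # magic, sequence number, payload length


def encode_slot(seq: int, payload: bytes) -> bytes:
    """Build one sector: CRC32, then header and payload, zero-padded."""
    body = HEADER.pack(MAGIC, seq, len(payload)) + payload
    record = struct.pack("<I", zlib.crc32(body)) + body
    if len(record) > SECTOR:
        raise ValueError("payload too large for one sector")
    return record.ljust(SECTOR, b"\x00")


def decode_slot(sector: bytes):
    """Return (seq, payload), or None if the slot is blank, torn, or corrupt."""
    if len(sector) != SECTOR:
        return None
    (crc,) = struct.unpack_from("<I", sector, 0)
    magic, seq, length = HEADER.unpack_from(sector, 4)
    end = 4 + HEADER.size + length
    if magic != MAGIC or end > SECTOR:
        return None
    if zlib.crc32(sector[4:end]) != crc:
        return None
    return seq, sector[4 + HEADER.size:end]


def commit(slots: list, payload: bytes) -> None:
    """Overwrite the older (or invalid) slot with the next sequence number."""
    decoded = [decode_slot(s) for s in slots]
    seqs = [d[0] if d is not None else -1 for d in decoded]
    target = seqs.index(min(seqs))
    slots[target] = encode_slot(max(seqs) + 1, payload)


def recover(slots: list):
    """Return the newest valid (seq, payload), or None if nothing validates."""
    valid = [d for d in map(decode_slot, slots) if d is not None]
    return max(valid, key=lambda d: d[0]) if valid else None
```

If a commit is cut off partway through, the CRC of the half-written slot no longer matches, so `recover` falls back to the other slot and the previous commit survives. The cost is that the newest transaction is lost rather than the whole file being unreadable.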
It is not rare for flash storage devices to lose data on power loss, even data that is FLUSH'd. See https://news.ycombinator.com/item?id=38371307

There are known cases where power loss during a write can corrupt previously written data (data at rest). This is not some rare occurrence. This is why enterprise flash storage devices have power loss protection.

See also: https://serverfault.com/questions/923971/is-there-a-way-to-p...

I wish someone would sell an SSD that was at most a firmware update away from switching between a regular NVMe drive and a ZNS NVMe drive. The latter just doesn't leave much room for the firmware to be clever and just swallow data.

Maybe also add a pSLC formatting mode for a namespace so one can be explicit about that capability...

It just has to be a drive that's useable as a generic gaming SSD so people can just buy it and have casual fun with it, like they did with Nvidia GTX GPUs and CUDA.

Unfortunately, manufacturers almost always prefer price gouging on features that "CuStOmErS aRe NoT GoInG tO nEeD". Is there even a ZNS device available for someone who isn't a hyperscale datacenter operator nowadays?
Really? A 512-byte sector could get partially written? Did anyone actually observe this, or was it just a case of Intel CYA saying they didn't guarantee anything?
Yes, really. "Crash-consistent data structures were proposed by enforcing cacheline-level failure-atomicity" see references in: https://doi.org/10.1145/3492321.3519556
That reference appears to link to a DOI that doesn't actually exist.
This is called “Atomic Write Unit Power Failure” (AWUPF).
> the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Yeah, that sounds about right for quite a lot of C programmers, except for the "they commit to checking this in the future" part. I've seen responses like "well, don't upgrade your compiler; I'm gonna put 'Clang >= 9.0 is unsupported' in the README as a fix".