by kccqzy 519 days ago

By the way, LMDB's main developer Howard Chu responded to the paper. He said,

> They report on a single "vulnerability" in LMDB, in which LMDB depends on the atomicity of a single sector 106-byte write for its transaction commit semantics. Their claim is that not all storage devices may guarantee the atomicity of such a write. While I myself filed an ITS on this very topic a year ago, http://www.openldap.org/its/index.cgi/Incoming?id=7668 the reality is that all storage devices made in the past 20+ years actually do guarantee atomicity of single-sector writes. You would have to rewind back to 30 years at least, to find a HDD where this is not true.

So this is a case where the programmers of LMDB thought about the "incorrect" use and decided that it was a calculated risk to take because the incorrectness does not manifest on any recent hardware.

This is analogous to the case where someone complains some C code has undefined behavior, and the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Furthermore, both the LMDB issue and the Postgres issue are noted in the paper to be previously known. The paper author states that Postgres documents this issue. The paper mentions pg_control, so I'm guessing it refers to this known issue: https://wiki.postgresql.org/wiki/Full_page_writes

> We rely on 512 byte blocks (historical sector size of spinning disks) to be power-loss atomic, when we overwrite the "control file" at checkpoints.
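
To make the single-sector atomicity assumption concrete, here is a minimal simulation of a torn write. It is only a sketch: the record layout, `make_record`, `is_intact`, and `torn_write` are hypothetical names invented for illustration, and this is not the on-disk format of LMDB or of Postgres's pg_control. The CRC is there only to show that a torn record is detectable in principle; it does not claim that either project detects it this way.

```python
import struct
import zlib

SECTOR = 512  # assumed sector size, matching the 512-byte figure quoted above


def make_record(txn_id: int, root_page: int) -> bytes:
    """Build a hypothetical commit record: two 64-bit integers plus a CRC32."""
    payload = struct.pack("<QQ", txn_id, root_page)
    return payload + struct.pack("<I", zlib.crc32(payload))


def is_intact(record: bytes) -> bool:
    """Return True if the stored CRC matches the 16-byte payload."""
    payload = record[:16]
    stored = struct.unpack("<I", record[16:20])[0]
    return zlib.crc32(payload) == stored


def torn_write(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate power loss after `cut` bytes of `new` have overwritten `old`."""
    return new[:cut] + old[cut:]


old = make_record(txn_id=41, root_page=7)
new = make_record(txn_id=42, root_page=9)
assert len(new) <= SECTOR  # the whole record fits in one sector

# A sector-atomic device leaves either `old` or `new` on disk, never a mix.
# A device that is not sector-atomic could leave something like this:
torn = torn_write(old, new, cut=4)
torn_txn, torn_root = struct.unpack("<QQ", torn[:16])
```

In this run the torn record carries the new transaction id with the old root page, which is exactly the kind of mixed state that the atomicity assumption rules out.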

3 comments

yuboyt 519 days ago

This assumption was wrong for Intel Optane memory. Power loss could cut the data stream anywhere in the middle. (Note: the DIMM nonvolatile memory version)

nyrikki 519 days ago

Consumer Optane devices were not "power loss protected"; that is very different from not honoring a requested synchronous write.

The crash-consistency problem is very different from the problem of whether real synchronous writes are durable. Some storage devices will lie about sync writes, sometimes hoping that a backup battery will allow them to complete those writes.

System crashes are inevitable; use things like write-ahead logs, depending on need. No storage API will get rid of all system crashes, and yes, even Apple games the system by disabling real sync writes, so that will always be a battle.
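
For anyone unfamiliar with the write-ahead log idea mentioned above, here is a minimal sketch. The framing (length plus CRC32), the `append` and `replay` helpers, and the file name are all assumptions made up for this example, not any particular database's format. Note that `os.fsync` is only as trustworthy as the device underneath it, which is the durability problem described above.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # hypothetical framing: payload length, CRC32 of payload


def append(path: str, payload: bytes) -> None:
    """Append one framed record and ask the OS to push it to stable storage."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())  # durable only if the device honors the flush


def replay(path: str) -> list:
    """Return every intact record, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt tail: ignore it and everything after
        records.append(payload)
        pos = start + length
    return records


# Write two records, then simulate a crash that tears the third one.
log_path = os.path.join(tempfile.mkdtemp(), "demo.wal")
append(log_path, b"set x=1")
append(log_path, b"set y=2")
third = b"set z=3"
with open(log_path, "ab") as f:
    f.write(HEADER.pack(len(third), zlib.crc32(third)) + third[:3])

recovered = replay(log_path)
```

After the simulated crash, `replay` returns the two complete records and drops the torn one, so recovery never depends on a partially written record.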

yuboyt 519 days ago

You're missing the point. GP was mentioning the common assumption that all systems from the last 30 years are sector-atomic under power loss: either the sector is fully written or it is not written at all. Optane was a rare counterexample, where a sector can become partially written and is thus not sector-atomic.

x1f604 519 days ago

It is not rare for flash storage devices to lose data on power loss, even data that is FLUSH'd. See https://news.ycombinator.com/item?id=38371307

There are known cases where power loss during a write can corrupt previously written data (data at rest). This is not some rare occurrence. This is why enterprise flash storage devices have power loss protection.

namibj 519 days ago

I wish someone would sell an SSD that was at most a firmware update away from switching between a regular NVMe drive and a ZNS NVMe drive. The latter just doesn't leave much room for the firmware to be clever and just swallow data.

Maybe also add a pSLC formatting mode for a namespace so one can be explicit about that capability...

It just has to be a drive that's useable as a generic gaming SSD so people can just buy it and have casual fun with it, like they did with Nvidia GTX GPUs and CUDA.

tliltocatl 518 days ago

Unfortunately, manufacturers almost always prefer price gouging on features that "CuStOmErS aRe NoT GoInG tO nEeD". Is there even a ZNS device available nowadays to someone who isn't a hyperscale datacenter operator?

lmm 518 days ago

Really? A 512-byte sector could get partially written? Did anyone actually observe this, or was it just a case of Intel CYA saying they didn't guarantee anything?

yuboyt 518 days ago

Yes, really. "Crash-consistent data structures were proposed by enforcing cacheline-level failure-atomicity" see references in: https://doi.org/10.1145/3492321.3519556

lmm 518 days ago

That reference appears to link to a DOI that doesn't actually exist.

senderista 518 days ago

This is called “Atomic Write Unit Power Failure” (AWUPF).
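
As a small worked example, and assuming my reading of the NVMe specification is right that AWUPF is reported as a 0's based count of logical blocks, the guaranteed atomic size in bytes works out as below. The helper name `atomic_write_bytes` and the sample values are made up for illustration.

```python
def atomic_write_bytes(awupf: int, lba_size: int) -> int:
    """Convert a 0's based AWUPF value (in logical blocks) to bytes.

    Assumption: AWUPF counts logical blocks and is 0's based, so a raw
    value of 0 means one logical block.
    """
    return (awupf + 1) * lba_size


single_block = atomic_write_bytes(awupf=0, lba_size=512)   # one 512-byte block
eight_blocks = atomic_write_bytes(awupf=7, lba_size=4096)  # eight 4 KiB blocks
```

So a drive reporting a raw value of 0 with 512-byte logical blocks is promising exactly the single-sector atomicity discussed upthread, and nothing more.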

Joker_vD 518 days ago

> the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Yeah, that sounds about right for quite a lot of C programmers, except for the "they commit to checking this in the future" part. I've seen responses like "well, don't upgrade your compiler; I'm gonna put 'Clang >= 9.0 is unsupported' in the README as a fix".
