Working with Files Is Hard (2019) | HN Mirror


	Working with Files Is Hard (2019) (danluu.com)
	203 points by nathan_phoenix 519 days ago

12 comments

continuational 519 days ago

> Pillai et al., OSDI’14 looked at a bunch of software that writes to files, including things we'd hope write to files safely, like databases and version control systems: Leveldb, LMDB, GDBM, HSQLDB, Sqlite, PostgreSQL, Git, Mercurial, HDFS, Zookeeper. They then wrote a static analysis tool that can find incorrect usage of the file API, things like incorrectly assuming that operations that aren't atomic are actually atomic, incorrectly assuming that operations that can be re-ordered will execute in program order, etc.

> When they did this, they found that every single piece of software they tested except for SQLite in one particular mode had at least one bug. This isn't a knock on the developers of this software or the software -- the programmers who work on things like Leveldb, LMDB, etc., know more about filesystems than the vast majority of programmers and the software has more rigorous tests than most software. But they still can't use files safely every time! A natural follow-up to this is the question: why is the file API so hard to use that even experts make mistakes?

Retr0id 519 days ago

> why is the file API so hard to use that even experts make mistakes?

I think the short answer is that the APIs are bad. The POSIX fs APIs and associated semantics are so deeply entrenched in the software ecosystem (both at the OS level, and at the application level) that it's hard to move away from them.

huntaub 519 days ago

I take a different view on this. IMO the tricks that existing file systems play to get more performance (specifically around ordering and atomicity) make it extra hard for developers to reason about. Obviously, you can't do anything about fsync dropping error codes, but some of these failure modes just aren't possible over file systems like NFS due to protocol semantics.

IgorPartola 519 days ago

Not only that, but the POSIX file API also assumes that NFS is a thing but NFS breaks half the important guarantees of a file system. I don’t know if it’s a baby and bath water situation, but NFS just seems like a whole bunch of problems. It’s like having eval in a programming language.

AutistiCoder 518 days ago

The whole software ecosystem is built on bubblegum, tape, and prayers.

huntaub 519 days ago

What aspects of NFS do you think break half of the important guarantees of a file system?

jcalvinowens 518 days ago

Well, at least O_APPEND, O_EXCL, O_SYNC, and flock() aren't guaranteed to work (although they can with recent versions as I understand it).

UID mapping causing read() to return -EACCES after open() succeeds breaks a lot of userland code.

rwmj 518 days ago

Lack of inotify support is one that has annoyed me in the past. It not only breaks some desktop software, but it also should be possible for NFS to support (after all, the server sees the changes and could notify clients).

huntaub 518 days ago

Thanks for this, it's helpful. Totally heard about O_APPEND and read() returning -EACCES. The other ones, I agree, should be fixed in later versions of the Linux kernel/NFS client.

DiggyJohnson 518 days ago

Just ran into this one recently trying to replace Docker w/ Podman for a CI/CD runner. Before anyone protests, we have very strong, abnormal requirements on my project preventing most saner architectures. It wasn’t the root cause, but the failure behavior was weird due to the behavior you just described.

__loam 519 days ago

POSIX is also so old and essential that it's hard to imagine an alternative.

jcranmer 519 days ago

Not really, there have been lots of APIs that have improved on the POSIX model.

The kind of model I prefer is something based on atomicity. Most applications can get by with file-level atomicity--make whole file read/writes atomic with a copy-on-write model, and you can eliminate whole classes of filesystem bugs pretty quickly. (Note that something like writeFileAtomic is already a common primitive in many high-level filesystem APIs, and it's something that's already easily buildable with regular POSIX APIs). For cases like logging, you can extend the model slightly with atomic appends, where the only kind of write allowed is to atomically append a chunk of data to the file (so readers can only possibly either see no new data or the entire chunk of data at once).

I'm less knowledgeable about the way DBs interact with the filesystem, but there the solution is probably ditching the concept of the file stream entirely and just treating files as a sparse map of offsets to blocks, which can be atomically updated. (My understanding is that DBs basically do this already, except that "atomically updated" is difficult with the current APIs).
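
For reference, the writeFileAtomic primitive mentioned above is usually built from a temp file, fsync, and rename. What follows is a minimal sketch rather than any particular library's implementation: the function name `write_file_atomic` is made up for illustration, it assumes a POSIX system where a directory can be opened and fsynced, and it ignores details such as preserving the original file's permissions.

```python
import os
import tempfile


def write_file_atomic(path, data):
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the
    # target; a rename across filesystems is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to stable storage
        os.replace(tmp_path, path)  # rename(2): swaps the directory entry atomically
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `write_file_atomic("settings.json", b"{}")` twice leaves only the final contents and no stray temp files. Note that `mkstemp` creates the file with mode 0600, so a real implementation would probably copy the mode of the file being replaced.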

Joker_vD 519 days ago

> Most applications can get by with file-level atomicity--make whole file read/writes atomic with a copy-on-write model, and you can eliminate whole classes of filesystem bugs pretty quickly.

    // O_SYNC_ON_CLOSE is a hypothetical flag; it does not exist in POSIX or Linux.
    int fd = open(".config", O_RDWR | O_CREAT | O_SYNC_ON_CLOSE, 0666);

    // effects of calls to write(2)/etc. are invisible through any other file description
    // until the close(2) is called on all descriptors to this file description.

    close(fd);

So now you can watch for e.g. either IN_MODIFY or IN_CLOSE_WRITE (and you don't need to balance it with IN_OPEN), it doesn't matter, you'll never see partial updates... would be nice!

BobbyTables2 519 days ago

Surely this can’t always be true?

What happens when a lot of data is written and exceeds the dirty threshold?

kragen 519 days ago

It's not hard to design a less bug-prone API that would enable you to do everything the POSIX file API permits and admits equally-high-performance implementations. But making that new API a replacement for the POSIX API would require rewriting essentially all of the software that somebody cares about to use your new, better API instead of the POSIX API. This is probably only feasible in practice for small embedded systems with a fairly small universe of software.

josephg 519 days ago

You could do a phased transition, where both the legacy posix api and the new api are available. This has already happened with a lot of the old C standard library. Old, unsafe functions like strcpy were gradually replaced by safer alternatives like strncpy.

Database developers don’t want the complexity or poor performance of posix. It’s wild to me that we still don’t have any alternative to fsync in Linux that can act as a barrier without also flushing caches at the same time.

ryao 519 days ago

Writes in the POSIX API can be atomic depending on the underlying filesystem. For example, small writes on ZFS through the POSIX API are atomic since they either happen in their entirety or they do not (during power failure), although if the writes are big enough (spanning many records), they are split into separate transactions and partial writes are then possible:

https://github.com/openzfs/zfs/blob/34205715e1544d343f9a6414...

Writes on ZFS cease to be atomic around approximately 32MB in size if I read the code correctly.

timewizard 519 days ago

> make whole file read/writes atomic with a copy-on-write model,

I have many files that are several GB. Are you sure this is a good idea? What if my application only requires best effort?

> eliminate whole classes of filesystem bugs pretty quickly.

Block level deduplication is notoriously difficult.

> where the only kind of write allowed is to atomically append a chunk of data to the file

Which sounds good until you think about the complications involved in block-oriented storage media. You're stuck with RMW whether you think you're strictly appending or not.

josephg 519 days ago

It doesn’t have to be one or the other. Developers could decide by passing flags to open.

But even then, doing atomic writes of multi gigabyte files doesn’t sound that hard to implement efficiently. Just write to disk first and update the metadata atomically at the end. Or whenever you choose to as a programmer.

The downside is that, when overwriting, you’ll need enough free space to store both the old and new versions of your data. But I think that’s usually a good trade off.

It would allow all sorts of useful programs to be written easily - like an atomic mode for apt, where packages either get installed or not installed. But they can’t be half installed.

emmelaich 519 days ago

Some of the problems transcend POSIX. Someone I know maintains a non-relational db on IBM mainframes. When diving into a data issue, he was gob-smacked to find out that sync'd writes did not necessarily make it to the disk. They were cached in the drive memory and (I think) the disk controller memory. If all failed, data was lost.

mangamadaiyan 518 days ago

This is precisely why well-designed enterprise-grade storage systems disable the drive cache and rely upon some variant of striping to achieve good I/O performance.

hackit2 519 days ago

Just wait till he has to deal with raid controllers.

MisterTea 519 days ago

I use Plan 9 regularly and while its Unix heritage is there, it most certainly isn't Unix and completely does away with POSIX.

timewizard 519 days ago

> POSIX fs APIs and associated semantics

Well I think that's the actual problem. POSIX gives you an abstract interface but it essentially does not enforce any particular semantics on those interfaces.

dkarl 519 days ago

> why is the file API so hard to use that even experts make mistakes?

Sounds like Worse Is Better™: operating systems that tried to present safer abstractions were at a disadvantage compared to operating systems that shipped whatever was easiest to implement.

(I'm not an expert in the history, just observing the surface similarity and hoping someone with more knowledge can substantiate it.)

ncruces 519 days ago

POSIX file locking is clearly modeled around whatever was simplest to implement, although it makes no sense at all.

tjalfi 519 days ago

Jeremy Allison tracked down why POSIX standardized this behavior[0].

The reason is historical and reflects a flaw in the POSIX standards process, in my opinion, one that hopefully won't be repeated in the future. I finally tracked down why this insane behavior was standardized by the POSIX committee by talking to long-time BSD hacker and POSIX standards committee member Kirk McKusick (he of the BSD daemon artwork). As he recalls, AT&T brought the current behavior to the standards committee as a proposal for byte-range locking, as this was how their current code implementation worked. The committee asked other ISVs if this was how locking should be done. The ISVs who cared about byte range locking were the large database vendors such as Oracle, Sybase and Informix (at the time). All of these companies did their own byte range locking within their own applications, none of them depended on or needed the underlying operating system to provide locking services for them. So their unanimous answer was "we don't care". In the absence of any strong negative feedback on a proposal, the committee added it "as-is", and took as the desired behavior the specifics of the first implementation, the brain-dead one from AT&T.

[0] https://www.samba.org/samba/news/articles/low_point/tale_two...

ncruces 519 days ago

The most egregious part of it for me is that if I open and close a file I might be canceling some other library's lock that I'm completely unaware of.

I resisted using them in my SQLite VFS, until I partially relented for WAL locks.

I wish more platforms embraced OFD locks. macOS has them, but hidden. illumos fakes them with BSD locks (which is worse, actually). The BSDs don't add them. So it's just Linux, and Windows with sane locking. In some ways Windows is actually better (supports timeouts).
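
The lock-cancelling behavior described above can be reproduced in a few lines. This is a sketch assuming a POSIX system with `fork`; the helper names `child_can_lock` and `demo` are invented for the example. Python's `fcntl.lockf` takes classic POSIX record locks (not OFD locks), which is exactly the kind that gets dropped when any descriptor for the file is closed.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Fork a child that tries a non-blocking exclusive lock on `path`."""
    pid = os.fork()
    if pid == 0:  # child: record locks are not inherited across fork
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock acquired, so nobody else was holding it
        except OSError:
            status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def demo():
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not child_can_lock(path)
        # An unrelated open/close of the same file, e.g. inside some library.
        other = os.open(path, os.O_RDONLY)
        os.close(other)
        held_after = not child_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

With POSIX record-lock semantics, `demo()` reports that the lock was held before the unrelated `close()` and gone afterwards, even though the original descriptor was never touched. OFD locks (`F_OFD_SETLK` on Linux) tie the lock to the open file description instead, which avoids this.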

trinix912 518 days ago

> Sounds like Worse Is Better™: operating systems that tried to present safer abstractions were at a disadvantage compared to operating systems that shipped whatever was easiest to implement.

What about the Windows API? Windows is a pretty successful OS with a less leaky FS abstraction. I know it's a totally different deal than POSIX (files can't be devices etc), the FS function calls require a seemingly absurd number of arguments, but it does seem safer and clearer what's going to happen.

thfuran 519 days ago

Why does that seem more likely than file system API simply not having been a major factor in the success or failure of OSes?

kccqzy 519 days ago

By the way, LMDB's main developer Howard Chu responded to the paper. He said,

> They report on a single "vulnerability" in LMDB, in which LMDB depends on the atomicity of a single sector 106-byte write for its transaction commit semantics. Their claim is that not all storage devices may guarantee the atomicity of such a write. While I myself filed an ITS on this very topic a year ago, http://www.openldap.org/its/index.cgi/Incoming?id=7668 the reality is that all storage devices made in the past 20+ years actually do guarantee atomicity of single-sector writes. You would have to rewind back to 30 years at least, to find a HDD where this is not true.

So this is a case where the programmers of LMDB thought about the "incorrect" use and decided that it was a calculated risk to take because the incorrectness does not manifest on any recent hardware.

This is analogous to the case where someone complains some C code has undefined behavior, and the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Furthermore both the LMDB issue and the Postgres issue are noted in the paper to be previously known. The paper author states that Postgres documents this issue. The paper mentions pg_control so I'm guessing it's referring to this known issue here: https://wiki.postgresql.org/wiki/Full_page_writes

> We rely on 512 byte blocks (historical sector size of spinning disks) to be power-loss atomic, when we overwrite the "control file" at checkpoints.

yuboyt 519 days ago

This assumption was wrong for Intel Optane memory. Power loss could cut the data stream anywhere in the middle. (Note: the DIMM nonvolatile memory version)

nyrikki 519 days ago

Consumer Optane devices were not "power loss protected"; that is very different from not honoring a requested synchronous write.

The crash-consistency problem is very different from the problem of the durability of real synchronous writes. There are some storage devices which will lie about sync writes, sometimes hoping that a backup battery will allow them to complete those writes.

System crashes are inevitable; use things like write-ahead logs depending on need, etc. No storage API will get rid of all system crashes, and yes, even Apple games the system by disabling real sync writes, so that will always be a battle.

yuboyt 519 days ago

You're missing the point. GP was mentioning the common assumption that all systems in the last 30 years are sector-atomic under power loss conditions. Either the sector is fully written or fully not written. Optane was a rare counterexample, where a sector can become partially written, thus not sector-atomic.

x1f604 519 days ago

It is not rare for flash storage devices to lose data on power loss, even data that is FLUSH'd. See https://news.ycombinator.com/item?id=38371307

There are known cases where power loss during a write can corrupt previously written data (data at rest). This is not some rare occurrence. This is why enterprise flash storage devices have power loss protection.

See also: https://serverfault.com/questions/923971/is-there-a-way-to-p...

lmm 518 days ago

Really? A 512-byte sector could get partially written? Did anyone actually observe this, or was it just a case of Intel CYA saying they didn't guarantee anything?

yuboyt 518 days ago

Yes, really. "Crash-consistent data structures were proposed by enforcing cacheline-level failure-atomicity" see references in: https://doi.org/10.1145/3492321.3519556

lmm 518 days ago

That reference appears to link to a DOI that doesn't actually exist.

senderista 518 days ago

This is called “Atomic Write Unit Power Failure” (AWUPF).

Joker_vD 518 days ago

> the developer responds by saying they have manually checked the generated assembler to make sure the assembler is correct at the ISA level even though the C code is wrong at the abstract C machine level, and they commit to checking this in the future.

Yeah, that sounds about right for quite a lot of C programmers, except for the "they commit to checking this in the future" part. I've seen responses like "well, don't upgrade your compiler; I'm gonna put 'Clang >= 9.0 is unsupported' in the README as a fix".

eviks 518 days ago

> why is the file API so hard to use that even experts make mistakes?

Because it was poorly designed, and there is a high resistance to change, so those design mistakes from decades ago continue to bite.

liontwist 519 days ago

Something this misses is that all programs make assumptions, for example: “my process is the only one writing this file because it created it”.

Evaluating correctness without that consideration is too high of a bar.

Safety and correctness cannot be “impossible to misuse”

nickelpro 518 days ago

And yet all of these systems basically work for day-to-day operations, and fail only under obscure error conditions.

It is totally acceptable for applications to say "I do not support X conditions". Swap out the file halfway through a read? Sorry, don't support that. Remove power to the storage device in the middle of a sync operation? Sorry, don't support that.

For vital applications, for example databases, this is a known problem and risks of the API are accounted for. Other applications don't have nearly that level of risk associated with them. My music tagging app doesn't need to be resistant to the SSD being struck by lightning.

It is perfectly acceptable to design APIs for 95% of use cases and leave extremely difficult leaks to be solved by the small number of practitioners that really need to solve those leaks.

belter 518 days ago

"PostgreSQL vs. fsync - How is it possible that PostgreSQL used fsync incorrectly for 20 years" - https://youtu.be/1VWIGBQLtxo

praptak 519 days ago

Ext4 actually special-handles the rename trick so that it works even if it should not:

"If auto_da_alloc is enabled, ext4 will detect the replace-via-rename and replace-via-truncate patterns and [basically save your ass]"[0]

[0]https://docs.kernel.org/admin-guide/ext4.html

Retr0id 519 days ago

> they found that every single piece of software they tested except for SQLite in one particular mode had at least one bug.

This is why whenever I need to persist any kind of state to disk, SQLite is the first tool I reach for. Filesystem APIs are scary, but SQLite is well-behaved.

Of course, it doesn't always make sense to do that, like the dropbox use case.

nodamage 519 days ago

Before becoming too overconfident in SQLite note that Rebello et al. (https://ramalagappan.github.io/pdfs/papers/cuttlefs.pdf) tested SQLite (along with Redis, LMDB, LevelDB, and PostgreSQL) using a proxy file system to simulate fsync errors and found that none of them handled all failure conditions safely.

In practice I believe I've seen SQLite databases corrupted due to what I suspect are two main causes:

1. The device powering off during the middle of a write, and

2. The device running out of space during the middle of a write.

justin66 519 days ago

I remembered Howard Chu commenting on that paper...

https://lists.openldap.org/hyperkitty/list/openldap-devel@op...

I'm pretty sure that's not where I originally saw his comments. I remember his criticisms being a little more pointed. Although I guess "This is a bunch of academic speculation, with a total absence of real world modeling to validate the failure scenarios they presented" is pretty pointed.

ablob 519 days ago

I believe it is impossible to prevent dataloss if the device powers off during a write. The point about corruption still stands and appears to be used correctly from what I skimmed in the paper. Nice reference.

lmm 518 days ago

> I believe it is impossible to prevent dataloss if the device powers off during a write.

Most devices write sectors atomically, and so you can build a system on top of that that does not lose committed data. (Of course if the device powers off during a write then you can lose the uncommitted data you were trying to write, but the point is you don't ever have corruption, you get either the data that was there before the write attempt or the data that is there after).
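
One common way to get that "old data or new data, never garbage" property on top of plain appends is a log of checksummed records where recovery ignores a torn tail. The sketch below is only an illustration of the idea and not how SQLite, LMDB, or any other system mentioned in this thread actually lays out its files; the record format (little-endian length plus CRC32) and the function names are assumptions made for the example.

```python
import os
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(path, payload):
    """Append one length-prefixed, checksummed record and fsync it."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)  # small records only; a real log would loop on short writes
        os.fsync(fd)          # the record counts as committed once this returns
    finally:
        os.close(fd)


def read_committed(path):
    """Return every intact record, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial write from a crash: treat as never written
        records.append(payload)
        offset = start + length
    return records
```

If power is lost halfway through `append_record`, `read_committed` returns only the records before it, so the interrupted write is lost but nothing already committed is corrupted. This still relies on the device and filesystem honoring `fsync`, which is the part several comments here are skeptical about.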

SoftTalker 519 days ago

Only way I know of is if you have e.g. a RAID controller with a battery-backed write cache. Even that may not be 100% reliable but it's the closest I know of. Of course that's not a software solution at all.

count 519 days ago

That's uh, not running out of power in the middle of the write. That's having extra special backup power to finish the write. If your battery dies mid cache-write-out, you're still screwed.

Dylan16807 518 days ago

I file that under hardware failure, not mundane power loss.

wmf 519 days ago

If the file system uses strict COW it should survive that situation.

ziddoap 519 days ago

>SQLite is the first tool I reach for.

Hopefully in whichever particular mode is referenced!

Retr0id 519 days ago

WAL mode, yes!

eatonphil 518 days ago

Do you turn on SQLite checksumming, or how do you feel comfortable that data on disk keeps its integrity?

edgarvaldes 519 days ago

As per HN headlines, files are hard, git is hard, regex is hard, time zones are hard, money as data type is hard, hiring is hard, people is hard.

I wonder what is easy.

paulddraper 518 days ago

Complaining :)

D-Coder 518 days ago

Selection error. The stuff that always works doesn't get posted here.

ssivark 518 days ago

To reuse another HN headline, all this is probably because no one really cares X-)

gavinhoward 519 days ago

I wonder if, in the Pillai paper, they tested the SQLite Rollback option with the default synchronous [1] (`NORMAL`, I believe) or with `EXTRA`. I'm thinking that it was probably the default.

I kinda think, and I could be wrong, that SQLite rollback would not have any vulnerabilities with `synchronous=EXTRA` (and `fullfsync=F_FULLFSYNC` on macOS [2]).

[1]: https://www.sqlite.org/pragma.html#pragma_synchronous

[2]: https://www.sqlite.org/pragma.html#pragma_fullfsync
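
For anyone who wants to try the rollback-journal configuration discussed above, here is a small sketch using Python's built-in `sqlite3` module. The helper name `open_rollback_extra` is invented for the example, and it assumes a SQLite build recent enough to know the `EXTRA` level. It says nothing about which settings the paper actually tested.

```python
import sqlite3


def open_rollback_extra(path):
    """Open a database in rollback-journal mode with synchronous=EXTRA."""
    conn = sqlite3.connect(path)
    # journal_mode=DELETE is the classic rollback journal (as opposed to WAL).
    journal_mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    conn.execute("PRAGMA synchronous=EXTRA")
    # Reading the pragma back returns an integer: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
    synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, journal_mode, synchronous
```

Reading the values back is worthwhile because the `synchronous` setting is per connection and has to be reapplied every time the database is opened.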

wruza 519 days ago

No mention of the NTFS or Windows keywords in the article, for those interested.

pjdesno 519 days ago

Although the conference this was presented at is platform-agnostic, the author is an expert on Linux, and the motivation for the talk is Linux-specific. (Dropbox dropping support for non-ext4 file systems)

The post supports its points with extensive references to prior research - research which hasn't been done in the Microsoft environment. For various reasons (NDAs, etc.) it's likely that no such research will ever be published, either. Basically it's impossible to write a post this detailed about safety issues in Microsoft file systems unless you work there. If you did, it would still take you a year or two of full-time work to do the background stuff, and when you finished, marketing and/or legal wouldn't let you actually tell anyone about it.

wmf 519 days ago

Universities can get Windows source code under NDA and do research on it but nobody really cares about such work.

pjdesno 518 days ago

"Getting windows source code under NDA" doesn't necessarily mean "can do research on it".

If you can't publish it, it's not research. If the source code is under NDA, then Microsoft gets the final say about whether you can publish or not, and if the result is embarrassing to Microsoft, I'm guessing it's "or not".

yahayahya 519 days ago

Is that because the windows APIs are better? Or because businesses build their embedded systems/servers with Windows?

p_ing 519 days ago

Certainly depends on which APIs you ultimately use as a developer, right? If it is .NET, they're super simple, and you can get IOCP for "free" and non-blocking async I/O is quite easy to implement.

I can't say the Win32 File API is "pretty", but it's also an abstraction, like the .NET File Class is. And if you touch the NT API, you're naughty.

On Linux and macOS you use the same API, just the backends are different if you want async (epoll [blocking async] on Linux, kqueue on macOS).

pjc50 518 days ago

The windows APIs are certainly slower. Apart from IOCP I don't think they're that much different? Oh, and mandatory locking on executable images which are loaded, which has .. advantages and disadvantages (it's why Windows keeps demanding restarts)

wruza 519 days ago

I doubt that, was just curious how it might compare in the article.

ryao 519 days ago

> On Linux ZFS, it appears that there's a code path designed to do the right thing, but CPU usage spikes and the system may hang or become unusable.

ZFS fsync will not fail, although it could end up waiting forever when a pool faults due to hardware failures:

https://papers.freebsd.org/2024/asiabsdcon/norris_openzfs-fs...

ein0p 519 days ago

ZFS on Linux unfortunately has a long standing bug which makes it unusable under load: https://github.com/openzfs/zfs/issues/9130. 5.5 years old, nobody knows the root cause. Symptoms: under load (such as what one or two large concurrent rsyncs may generate over a fast network - that's how I encountered it) the pool begins to crap out and shows integrity errors and in some cases loses data (for some users - it never lost data for me). So if you do any high rate copies you _must_ hash-compare source and destination. This needs to be done after all the writes are completed to the zpool, because concurrent high rate reads seem to exacerbate the issue. Once the data is at rest, things seem to be fine. Low levels of load are also fine.
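
The hash-compare step described above could look roughly like this. It is a generic sketch, not tied to ZFS in any way; the function names are made up and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, destination):
    """True if both files have identical contents according to SHA-256."""
    return sha256_of(source) == sha256_of(destination)
```

One caveat: a read immediately after the copy may be served from the page cache rather than from disk, so this check is most meaningful once the writes have settled, as the comment says.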

ryao 517 days ago

There are actually several distinct issues being reported there. I replied responding to everyone who posted backtraces and a few who did not:

https://github.com/openzfs/zfs/issues/9130#issuecomment-2614...

That said, there are many others who stress ZFS on a regular basis and ZFS handles the stress fine. I do not doubt that there are bugs in the code, but I feel like there are other things at play in that report. Messages saying that the txg_sync thread has hung for 120 seconds typically indicate that disk IO is running slowly due to reasons external to ZFS (and sometimes, reasons internal to ZFS, such as data deduplication).

I will try to help everyone in that issue. Thanks for bringing that to my attention. I have been less active over the past few years, so I was not aware of that mega issue.

ein0p 517 days ago

Regarding your comment - seems unlikely that it "affects Ubuntu less". I don't see why that would be the case - it's not like Ubuntu runs a heavily customized kernel or anything. And thanks for taking a look - ZFS is just the way things should be in filesystems and logical volume management, I do wish I could stop doing hash compares after large, high throughput copies and just trust it to do what it was designed to do.

ryao 517 days ago

Ubuntu kernels might have a different default IO elevator than proxmox kernels. If the issue is in the IO elevator (e.g. it is reordering in such a way that some IOs are delayed indefinitely before being sent to the underlying device) and the two use different IO elevators by default, then it would make sense why Ubuntu is not affected and proxmox is. There is some evidence for this in the comments as people suggest that the issue is lessened by switching to mq-deadline. That is why one of my questions asks what Linux IO elevator people’s disks are using.

The correct IO elevator to use for disks given to ZFS is none/noop as ZFS has its own IO elevator. ZFS will set the Linux IO elevator to that automatically on disks where it controls the partitioning. However, when the partitioning was done externally from ZFS, the default Linux elevator is used underneath ZFS, and that is never none/noop in practice since other Linux filesystems benefit from other elevators. If proxmox is doing partitioning itself, then it is almost certainly using the wrong IO elevator with ZFS, unless it sets the elevator to noop when ZFS is using the device. That ordinarily should not cause such severe problems, but it is within the realm of possibility that the Linux IO elevator being set by proxmox has a bug.

I suspect there are multiple disparate issues causing the txg_sync thread to hang for people, rather than just one issue. Historically, things that cause the txg_sync thread to hang are external to ZFS (with the notable exception of data deduplication), so it is quite likely that the issues are external here too. I will watch the thread and see what feedback I get from people who are having the txg_sync thread hang.
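
To answer the "which IO elevator are your disks using" question, Linux exposes the setting in sysfs at `/sys/block/<device>/queue/scheduler`, with the active scheduler shown in square brackets. A small sketch for reading it follows; the function names are invented here, and the device name passed in (e.g. `sda`) is whatever your system uses.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler shown in [brackets], e.g. '[mq-deadline] kyber none'."""
    tokens = sysfs_text.split()
    for token in tokens:
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some devices list a single scheduler without brackets.
    if len(tokens) == 1:
        return tokens[0]
    return None


def read_scheduler(device):
    """Read the active IO scheduler for a block device such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

For a disk whose sysfs file contains `[mq-deadline] kyber bfq none`, `active_scheduler` returns `mq-deadline`.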

ein0p 517 days ago

Thanks a lot for elaborating. I'm traveling at the moment, but I'm going to try reproducing this issue once I'm back in town. IIRC I did do partitioning myself, using GPT partition table and default partition settings in fdisk.

Upd: the IO elevator for all drives seems to be `none` for me. The OS is Ubuntu 22.04.

einpoklum 519 days ago

The article wraps up with this salient point:

> In conclusion, computers don't work (but I guess you already know this...

paulddraper 518 days ago

They work.

Just not all the time.

1vuio0pswjnm7 519 days ago

No Javascript or SNI:

https://archive.wikiwix.com/cache/index2.php?rev_t=&url=http...

AutistiCoder 518 days ago

it's a good thing I'm a Web developer.

closest I come to working with files is localStorage, but that's thread safe.

jheriko 518 days ago

this whole thing is a story about using outdated stuff in a shitty ecosystem.

its not a real problem for most modern developers.

pwrite? wtf?

not one mention of fopen.

granted some of the fine detail discussion is interesting, but it doesn't make practical sense since about 1990.

rep_lodsb 518 days ago

The article is about the hardware and kernel level APIs used for interacting with storage. Everything else is by necessity built on top of that interface.

"fopen"? That is outdated stuff from a shitty ecosystem, and how do you think it's implemented?

userbinator 519 days ago

I don't get it. The only times I've had problems with filesystem corruption in the past few decades was with a hardware problem, and said hardware was quickly replaced. FAT family has been perfectly fine while I've encountered corruption on every other FS including NTFS, exFAT, and the ext* family.

Meanwhile you can read plenty of stories of others having the exact opposite experience.

If you keep losing data to power losses or crashes, perhaps fix the cause of that? It doesn't make sense to try to work around it.

mystified5016 519 days ago

> If you keep losing data to power losses or crashes, perhaps fix the cause of that? It doesn't make sense to try to work around it.

Ponder this notion for a moment: there are problems within one's control and problems outside of one's control.

For example, we can't control the weather. If it snows three feet overnight you simply have to deal with the fact that you're not getting to work today.

Since we can't simply stop hardware from failing, we have to deal with the fact that hardware fails. Your seventeen redundant UPSes might experience a one in a trillion cascade failure. It might take the utility ten minutes longer to restore your power than you have onsite generation.

This is not a class of problem we can control or prevent. We fix these problems by building systems which withstand failures. You can't just will electrons out of the wall socket, but you can build a better disk or FS that corrupts less data when the electrons stop.

PaulHoule 519 days ago

There was that time (2009 or so?) I wrote 2 million files to a single directory on NTFS and that filesystem was never the same again. It didn't seem to be a hardware problem. I used to be really careful to not put a crazy number of files in a directory on Linux and Windows storing them in subdirs like

  b7/b74a/b74a56

where the digits are derived from a hash of the file name but lately I've had some NTFS volumes with a 1M file directory that seem to be OK.
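
A sketch of that kind of hash-based directory layout is below. The comment does not say which hash was used or whether the last component is a directory, so SHA-256, the function name `sharded_path`, and placing the file inside the deepest directory are all assumptions made for illustration.

```python
import hashlib
import os


def sharded_path(root, file_name, levels=(2, 4, 6)):
    """Build root/b7/b74a/b74a56/file_name from a hash of the file name."""
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    # Each level is a longer prefix of the same digest, as in b7/b74a/b74a56.
    parts = [digest[:n] for n in levels]
    return os.path.join(root, *parts, file_name)
```

With two hex digits per level, each directory has at most 256 subdirectories, so millions of files spread out to a few entries per leaf directory.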

Hardware problems also manifest in mysterious ways. On both Windows and macOS I had computers that seemed to be OK until I did an OS update which caused enough IO that a failing HDD was pushed over the edge and the update failed; in one case I was able to roll back the update but not apply the update, in another case the machine was trashed. Careful investigation (like taking the disk out and inspecting it on another computer) revealed a hard drive error, although there was no clear indication of this in the UI and the average person would blame the software update.

nodamage 519 days ago

> If you keep losing data to power losses or crashes, perhaps fix the cause of that?

I keep telling my users to make sure to plug their phones in before the battery dies, but for some reason they keep forgetting...

dooglius 518 days ago

Phones shut down when close, but before they hit zero battery

userbinator 519 days ago

Then that's entirely their fault. They deserve all the corruption they get.

userbinator 519 days ago

Seems like I hit a nerve. Apparently teaching users responsibility is a bad thing?

No wonder things are "hard". Because otherwise many in this godforsaken industry wouldn't need to be employed.