Hacker News new | ask | show | jobs
by gary_0 157 days ago
Not a new idea, but it's intriguing to think about an architecture that's just: CPU <-> caches <-> nonvolatile storage

What if you could take it for granted that mmap()ing a file has the exact same performance characteristics as malloc(), aside from the data not going away when you free the address space? What if arbitrary program memory could be given a filename and casually handed off to the OS to make persistent? A lot of basic software design assumptions are still based on the constraints of the spinning rust era...

2 comments

> A lot of basic software design assumptions are still based on the constraints of the spinning rust era...

fsync() is still slow, and you need that for real persistence. It's not just about spinning rust, there's very good reasons for wanting a different treatment of clearly ephemeral/scratchpad storage.

I don’t have much evidence but feel like fsync has overshot its significance. It’s really important in databases and that’s fine, but a lot of software can handle (very occasional) kernel / hardware crashes by e.g. restoring data from an external backup, and can benefit from the mmap guarantee that process level crashes won’t lose any data.
Isn't the problem with fsync() that you can't wait for specific IO operations? You're waiting for every single operation on the file descriptor. If anything happens between your write and the call to fsync(), you're going to get delayed by the additional operation that managed to get through.
Modern storage hardware has multiple IO queues with operations in flight at any given time, so I'm not sure to what extent this applies. It may be that only operations that are directly relevant to any given range of data need to be waited for.
Don't modern NVMe controllers lie to the OS when invoking fsync()?
You can get something like this from Linux today. (And mmap is actually how you request memory from the kernel in almost all cases.)

It's just that mmap is slower than using read/write, because the kernel knows less about your data access patterns and thus has to guess for how to populate caches etc.

Yes, I know mmap already sort of allows this (and has for well over a decade). To elaborate: when I want to, say, parse a megabytes-sized file, I don't muck about with mmap(), I just read() into a buffer; it's simple and it's fast enough even though I'm just wasting microseconds waiting for bytes on one fast chip to get copied into another slightly faster chip (and then copied into CPU cache). If I'm dealing with a larger amount of data, I'd be tempted to use database middleware to figure out all the platform-specific shuffling between RAM and disk (designed on the assumption that the disk is spinning rust, cough), but that pulls in yet another chunk of complexity.

Instead, imagine if I could just state in one line of system-agnostic code "give me a pointer to /home/user/abc" and it does the right thing--assuming there was some way around mmap's current set of caveats. Imagine if I could turn a memory buffer into a file in one line of code and it Just Worked. Imagine if the OS treated my M.2 SSD as just another chip on the bus instead of still having a good amount of code on the hot path that assumes I'm manually sending bytes to a mechanical drive.

This has me thinking, it could be a fun project to prototype a convenient file interface based on pointers as a C library. I imagine it's possible to get something close to what you want in terms of interface (not sure about performance).

I suspect in some cases it will be more convenient and in other cases it will be less convenient to use. The write interface isn't so bad for some use cases like appending to logs. It's also not bad for its ability to treat different types of objects (files, pipes, network) generically.

But if you want to do all manipulation through pointers it should be doable. To support appending to files you'd probably need some way to explicitly grow the file size and hand back a new pointer. Some relevant syscalls would probably be open, ftruncate, and mmap.

You can do something like that, but you still need to be able to tell your OS when you want to sync what writes to disk.
Yes the situation sounds similar to regular files to me. Iiuc mmap can implicitly sync to files, but it is only guaranteed after unmapped (akin to closing a file) or an explicit call to msync (akin to fflush).