by briandw 57 days ago
The article is fine, but I wanted to call this out.

"Every database you have ever used reads and writes to the filesystem, exactly like your code does when it calls open()."

Technically not true. Applications like SQLite use mmap to map the file into the process's address space. This lets you skip the read()/write() syscalls when reading and writing. The kernel can map that data in dynamically much faster than a userland process can.
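For anyone who hasn't used it, here is a minimal sketch of file access through a mapping. It assumes Python's `mmap` module as a stand-in for the C `mmap(2)`/`msync(2)` calls (SQLite itself does this in C), and `mmap_roundtrip` is a hypothetical helper name. Once the file is mapped, reads and writes are plain memory operations on the mapping instead of one `read()`/`write()` call per access. It says nothing about speed.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(path: str) -> bytes:
    """Patch and read back a file through a shared memory mapping.

    Accesses are ordinary memory operations on the mapping; the kernel
    pages data in on first touch rather than servicing a read() per access.
    """
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (mapping an empty file raises ValueError).
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[0:5] = b"HELLO"  # store through the mapping, no write() call
            mm.flush()          # msync(): ask the kernel to write dirty pages back
            return mm[:]        # slicing copies the mapped bytes into a bytes object


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, world")
        os.close(fd)
        print(mmap_roundtrip(path))  # b'HELLO, world'
    finally:
        os.remove(path)
```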

Later in the article they go over the process of reading the entire file into memory; again, mmap is much better at this. It would have been nice to see that approach used.
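A rough sketch of the two approaches, assuming Python's `mmap` module stands in for the C API and using a hypothetical newline-counting task: the first version copies the whole file into a new buffer before scanning it, the second scans the mapped file in place and lets the kernel fault pages in as they are touched.

```python
import mmap
import os


def count_lines_with_read(path: str) -> int:
    # read(): the entire file is copied into a new bytes object first.
    with open(path, "rb") as f:
        return f.read().count(b"\n")


def count_lines_with_mmap(path: str) -> int:
    # mmap: scan the file in place; pages are faulted in as find() touches them.
    if os.path.getsize(path) == 0:
        return 0  # mapping an empty file raises ValueError
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = 0
        pos = mm.find(b"\n")
        while pos != -1:
            count += 1
            pos = mm.find(b"\n", pos + 1)
        return count
```

Both return the same count; the difference is whether a full private copy of the file is made first.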


> The kernel can map that data in dynamically much faster than a userland process can.

Not necessarily. The kernel's mmap implementation has quite a strong bias towards certain kinds of access patterns; deviate from them and it can become slower than read(2).

We tried using mmap(2) for audio file I/O in Ardour, and it got notably less bandwidth than just using read(2).
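A hypothetical harness for the kind of comparison described above: stream a file sequentially once with read(2) in fixed-size chunks and once by walking a read-only mapping in the same chunk size, checksumming both so the data can be verified identical. Python's `mmap` and unbuffered file objects are assumed as stand-ins for the native calls, and the 64 KiB chunk size is an arbitrary choice; this is not Ardour's test program.

```python
import mmap
import os
import time
import zlib
from typing import Tuple

CHUNK = 1 << 16  # 64 KiB per request; an arbitrary choice for this sketch


def scan_with_read(path: str) -> Tuple[int, float]:
    """Read the file sequentially with read(2) in CHUNK-sized requests."""
    crc = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: each f.read() is one read(2)
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            crc = zlib.crc32(buf, crc)
    return crc, time.perf_counter() - start


def scan_with_mmap(path: str) -> Tuple[int, float]:
    """Walk a read-only mapping of the file in CHUNK-sized slices."""
    crc = 0
    start = time.perf_counter()
    if os.path.getsize(path) > 0:  # mapping an empty file raises ValueError
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for off in range(0, len(mm), CHUNK):
                crc = zlib.crc32(mm[off:off + CHUNK], crc)
    return crc, time.perf_counter() - start
```

Timings from a sketch like this are dominated by interpreter overhead and by whether the file is already in the page cache, so they say little about what native code would see; they only show the shape of such a comparison.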

I'm curious if you tried different madvise strategies and if any of them worked better than others?
I don't recall - it was several years ago. But glancing through what is left of the test program, it seems likely that we did not.
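A sketch of how one might try different madvise strategies, assuming Python 3.8+ where `mmap.madvise()` and the `MADV_*` constants exist on platforms that support them (the `hasattr` checks skip them elsewhere). The helper names are hypothetical; each run checksums the file through a read-only mapping after applying one hint to the whole mapping.

```python
import mmap
import os
import time
import zlib
from typing import Dict, Optional

# Advice flags to compare; which ones exist depends on the platform.
ADVICE = {
    name: getattr(mmap, name)
    for name in ("MADV_NORMAL", "MADV_SEQUENTIAL", "MADV_RANDOM", "MADV_WILLNEED")
    if hasattr(mmap, name)
}


def scan_with_advice(path: str, advice: Optional[int], chunk: int = 1 << 16) -> int:
    """Checksum a file through a read-only mapping after an optional madvise() hint."""
    if os.path.getsize(path) == 0:
        return zlib.crc32(b"")  # mapping an empty file raises ValueError
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if advice is not None and hasattr(mm, "madvise"):
            mm.madvise(advice)  # hint applies to the whole mapping
        crc = 0
        for off in range(0, len(mm), chunk):
            crc = zlib.crc32(mm[off:off + chunk], crc)
        return crc


def time_each_advice(path: str) -> Dict[str, float]:
    """Wall-clock seconds per advice flag ("none" means no hint was given)."""
    results = {}
    for name, advice in [("none", None), *ADVICE.items()]:
        start = time.perf_counter()
        scan_with_advice(path, advice)
        results[name] = time.perf_counter() - start
    return results
```

After the first pass the file sits in the page cache, so later runs mostly measure memory speed rather than readahead behavior; a fair comparison on Linux needs a cold cache before each run (for example by writing to /proc/sys/vm/drop_caches as root).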
The article definitely oversimplifies the IO happening in a database.

That said, depending on who you are talking to, they may not agree with you that "mmap is much better at this". Some people will say you should do what you need in the application logic instead of depending on APIs from the OS (though not necessarily for the specific example here).

https://db.cs.cmu.edu/mmap-cidr2022/
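The linked CIDR 2022 paper argues for a DBMS managing its own buffer pool instead of relying on mmap. Below is a toy, hypothetical version of that idea: a read-only LRU page cache that fetches pages with explicit `os.pread` calls (POSIX only), so the application, not the kernel, decides what stays cached and when to evict. `PAGE_SIZE`, `BufferPool` and `get_page` are illustrative names, not anything from the paper.

```python
import os
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size for this sketch


class BufferPool:
    """Tiny read-only page cache managed by the application (hypothetical).

    Eviction policy and I/O timing are under application control,
    instead of being left to the kernel's paging of an mmap'd file.
    """

    def __init__(self, fd: int, capacity: int) -> None:
        self.fd = fd
        self.capacity = capacity  # maximum number of cached pages
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()
        self.misses = 0

    def get_page(self, page_no: int) -> bytes:
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        # Explicit I/O: failures raise OSError here instead of arriving as a signal.
        data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data
```

A real buffer pool also has to handle dirty pages, pinning and concurrency; this only shows where the caching decisions move.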

SQLite in mmap mode (not the default) will use mmap for reading. It still writes using `pwrite`, so it can detect and recover from write failures.

See https://www.sqlite.org/mmap.html :

> The default mechanism by which SQLite accesses and updates database disk files is the xRead() and xWrite() methods of the sqlite3_io_methods VFS object. These methods are typically implemented as "read()" and "write()" system calls which cause the operating system to copy disk content between the kernel buffer cache and user space.

> Beginning with version 3.7.17 (2013-05-20), SQLite has the option of accessing disk content directly using memory-mapped I/O and the new xFetch() and xUnfetch() methods on sqlite3_io_methods.
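To make the write-failure point concrete: with pwrite(2) a failed write comes back as an error the caller can act on, whereas SQLite's mmap documentation notes that an I/O error on a memory-mapped file arrives as a signal that SQLite cannot catch. The sketch below is a hypothetical helper in Python (`os.pwrite` is POSIX only), not SQLite's VFS code: it retries short writes, fsyncs, and turns any `OSError` (for example ENOSPC or EIO) into a return value a caller could use to roll back.

```python
import os


def write_page(fd: int, page: bytes, offset: int) -> None:
    """Write `page` at `offset` with pwrite(2), reporting failure to the caller.

    os.pwrite raises OSError on failure; a short write is retried
    until the whole buffer has been written.
    """
    view = memoryview(page)
    while view:
        written = os.pwrite(fd, view, offset)
        if written == 0:
            raise OSError("pwrite() wrote 0 bytes")
        view = view[written:]
        offset += written
    os.fsync(fd)  # errors from flushing to storage can also surface here


def try_write(fd: int, page: bytes, offset: int) -> bool:
    """Return False instead of crashing, so the caller can roll back."""
    try:
        write_page(fd, page, offset)
        return True
    except OSError:
        return False
```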

The principal advantage of mmap, in my mind, is that the cache is shared among your SQLite connections.
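A minimal sketch of turning on SQLite's mmap mode from Python's `sqlite3` module via `PRAGMA mmap_size`, with two connections to the same file. The 256 MiB limit and helper names are arbitrary assumptions. SQLite may clamp the value (SQLITE_MAX_MMAP_SIZE) or report 0 on builds without mmap support. With both connections mapping the same file, their reads come from the one OS page cache rather than from a private copy per connection.

```python
import sqlite3

MMAP_BYTES = 256 * 1024 * 1024  # 256 MiB; an arbitrary limit for this sketch


def connect_with_mmap(path: str) -> sqlite3.Connection:
    """Open a connection and ask SQLite to use memory-mapped reads.

    The pragma is set per connection; SQLite may clamp it or ignore it.
    """
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA mmap_size={MMAP_BYTES}")
    return conn


def mmap_size(conn: sqlite3.Connection) -> int:
    """Report the mmap limit SQLite actually applied to this connection."""
    row = conn.execute("PRAGMA mmap_size").fetchone()
    return row[0] if row else 0
```

For example, one connection can create and fill a table while a second connection opened the same way reads it back; writes still go through SQLite's normal write path.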

The backing store used by mmap() is still a file in a filesystem, so I would say their overall claim is technically true. It's the "exactly like your code does when it calls open()" part that oversimplifies a little (though, again, it remains technically true: it's just giving an example of a thing you can do with a file, not exhaustively listing all the things you can do with a file).