Hacker News new | ask | show | jobs
by nextaccountic 164 days ago
> Consult the classic article on why sophisticated systems like DBMSs do not use mmap: https://db.cs.cmu.edu/mmap-cidr2022/

Sqlite does (or can optionally use mmap). How come?

Is sqlite with mmap less reliable or anything?

2 comments

If an I/O error happens with read()/write(), you get back an error code, which SQLite can deal with and pass back up to the application, perhaps accompanied by a reasonable error message. But if you get an I/O error with mmap, you get a signal. SQLite itself ought not be setting signal handlers, as that is the domain of the application and SQLite is just a lowly library. And even if SQLite could set signal handlers, it would be difficult to associate a signal with a particular I/O operation. So there isn't a good way to deal with I/O errors when using mmap(). With mmap(), you just have to assume that the filesystem/mass-storage works flawlessly and never runs out of space.

SQLite can use mmap(). That is a tested and supported capability. But we don't advocate it because of the inability to precisely identify I/O errors and report them back up into the application.

Thanks for the response. I am more worried about losing already committed data due to an error

https://www.sqlite.org/mmap.html

> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.

What are those OSes with buggy unified buffer caches? More importantly, is there a list of platforms where the use of mmap in sqlite can lead to data loss?

I know that the spirit of HN will strike me down for this, but sqlite is not a "sophisticated system". It assumes the hardware is lawful neutral. Real hardware is chaotic. Sqlite has a good reputation because it is very easy to use. In fact this is the same reason programmers like mmap: it is a hell of a shortcut.
I think the main thing is whether mmap will make sqlite lose data or otherwise corrupt already committed data

... it will if two programs open the same sqlite, one with mmap, and another without https://www.sqlite.org/mmap.html - at least "in some operating systems" (no mention of which ones)

https://www.sqlite.org/mmap.html

> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.

Sqlite is otherwise rock solid and won't lose data as easily