Hacker News new | ask | show | jobs
by DaveMcMartin 536 days ago
SQLite is getting better and better. I am using it in production for a bunch of websites and never got a problem.
3 comments

It should be fine for read-only data. If you want to write, be aware that only one process can write at a time, and if you forget to set busy_timeout at the start of the connection, it defaults to zero milliseconds and you'll get an error if another process has locked the database for writing while you try to read or write it. Client-server databases tend to handle concurrent writers better.
I think people overstate this.

Yes, the sqlite concurrency model is a bad choice if you have a high degree of concurrent writes. However for many applications that simply isn't true. When it comes to websites i think people significantly overestimate the amount of concurrent writes.

This is a baseline benchmark I published of various p9x latencies for multiple readers/writers with a single SQLite database with different configurations set: https://github.com/mqudsi/sqlite-readers-writers

Even if you don’t use message passing to use one thread to perform all updates/writes, it still performs very solidly for many use cases.

Depending on the amount of write throughput you need and whether you care about latency, "concurrent writes" aren't necessarily a problem either. You can just shove them into a queue and then have a single thread pull stuff out of the queue and push it into SQLite. That still scales, up to a point.
FYI this is basically what Redis does (which does not support concurrent writes or reads afaik), and that still scales quite a lot.
No, because your readers are still blocked by writers and still randomly fail if you forgot to set busy_timeout, which is something at least one person didn't they had to do until they read this comment.
| readers are still blocked by writers

We're all using WAL mode these days, right?

No, it's indeed a very real problem. I ran into with a very small service.
You want to configure it so it has a timeout. Take turns. That’s how locks work.

The only difference between SQLite and Postgres write locking is the granularity.

It's not a trivial difference like you suggest.
I agree It’s not a trivial difference in implementation and the concurrent write performance is probably a lot worse.

But it’s talked about as if it’s a categorical limitation, the app will fail if there is concurrency. But it’s just a question of how much time will be spent locking.

A website with a 16 process pool for handling requests will be fine.

sqlite also polls for lock availability
SQLite really isn't meant to be used exactly like a hosted solution. I don't know who is advocating for this.

If you are sharing your database between processes or machines, you are probably doing "the fancy new SQLite thing" wrong.

If you need to share information contained in a SQLite database with other processes or machines, you can write application-level code to handle this concern.

Sharing between processes? Absolutely. Machines? No.

For example, LAMP stack applications could swap the M for SQlite (and I think it would have been better historically if they did).

Sharing between processes isn't impossible but this where you get all of your performance caveats.

I think it is a bit unfair to argue against SQLite on performance grounds but then not explore ways to utilize instances of it within a single process scope where it is most effective.

Sharing a single SQLite connection across all threads within a process is where you get the best performance outcomes. Everything is serialized within the SQLite provider - assuming you are using a mainstream build. So, if your storage subsystem is fast (NVMe & friends), you will achieve very good outcomes. Any utilization model that requires a file open/close operation every time you want to touch the data is a complete non-starter by comparison. The "unit of work" paradigm that hosted providers recommend is catastrophic for performance with SQLite.

One option for the "I absolutely must share this SQLite instance with 30+ services" scenario is to simply wrap it with a REST API or similar.

I don't see why performance would be significantly different between multiple threads using same sqlite db vs multiple processes on same machine. Can you explain more what you mean?
See section 5:

https://www.sqlite.org/lockingv3.html

In the single process access model, you can connect exactly once and remain in a reserved lock state the entire time.

> but this where you get all of your performance caveats.

You mean the one where it locks on write? It’s totally fine, if you wrote any cross process code yourself it’s probably going to do similar locking.

Yeah, but you're locking the whole file and if you try to open it while it's locked, sleep-polling for it to be unlocked. It's safe, but it's a minimum viable product - as they say, sqlite is a replacement for fopen, not for a fully featured database. Client/server systems have better locking.
I thought WAL mode solves this. Am I misunderstanding the docs, or this SQLite running without a write ahead log?

There are advantages and disadvantages to using WAL instead of a rollback journal. Advantages include:

    WAL is significantly faster in most scenarios.
    WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
    Disk I/O operations tends to be more sequential using WAL.
    WAL uses many fewer fsync() operations and is thus less vulnerable to problems on systems where the fsync() system call is broken.*
Then you have to remember to enable WAL mode. And it still has its caveats, but different ones: it's possible that the WAL file can grow without bound under certain conditions.
You know it never occurred to me that there's probably a whole new generation (experience wise at least) of programmers who 1) know SQLIte is commonly used for web backends now but 2) don't know about WAL mode.

To me the concept of SQLite in these scenarios, without the WAL, is just nuts.

So configure busy_timeout, that’s what it’s for.
I've been really impressed with sqlite+litefs on fly.io. it's pretty easy to get distributed reads with single master writes.
If you never got a problem then how is it getting better?
What if we didn't know we had a problem? And now it's "ooooh it's better now".
Packs more features and improvements over the time. One can read it to see that it gets better
Perhaps it's getting faster?