Hacker News new | ask | show | jobs
by ketralnis 1290 days ago
This is a pretty strong claim without any numbers, to be honest. You were probably already running your single database instance so being limited to a single sqlite instance isn't terribly different. This does limit you to a single app server, which unless you're CPU bound is also fine. Even if you are, you can get a lot of cores in a single instance these days.

Most things will never need to be scaled up

1 comments

AFAIK, sqlite3 doesn't support multi-core, so you can only do vertical scaling. In cloud, people rather do horizontal scaling. That being said, I ran a django based webapp for internal users and we saw core being pegged by sqlite3 when the queries were complex. However, under normal usage, it was fine for about 100 concurrent users including machine API calls.
> doesn't support multi-core

Can you expand on that? I thought it supported multiple readers, but only a single writer. They do all have to be on the same host, though.

Since you mentioned Django - I think there are some complexities where the Python driver can't really do concurrent access if using threads instead of processes, but this is due to Python/GIL limitations, not sqlite limitations.

> Python driver can't really do concurrent access if using threads instead of processes, but this is due to Python/GIL limitations

Confusingly you can't run 2 lines of Python code at the same time, but you can run 2 SQL queries (sqlite, remote server, etc) at the same time. Python threads are true pthreads, they just need to hold the GIL while running Python code. They release it when running C code or doing I/O. So you can have 2 queries running at the same time no problem, but you'll only get a single core of compute processing their results.

It is even a little better than that these days. Beginning a transaction normally locks the whole database, but with Wal mode they have added a Begin Concurrent command so there can be multiple transactions going at once. However when you get to the commit it still requires the exclusive lock. I bet after a few years even that restriction will be gone.

https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi...

https://sqlite.org/forum/info/2ad28cb1e0356816e7b0bd2ab458f7...

I don't think BEGIN CONCURRENT is out yet. AFAIK it's only available if you build the experimental wal2 branch yourself.

I guess I miss read the branch list, I thought the begin-concurrent-3.39 branch said it had merged to trunk but that is the line above it.
sqlite3 is a single core application, so even if you have multiple cores on a box, sqlite3 isn't going to take advantage of that. If you want sqlite3 to go faster, you need better single core box. That being said, I do believe sqlite3 allow multiple processes to read the same database file.

Our django app sits behind Gunicorn that will spin up multiple django instances, as long as Sqlite3 has WAL mode on, concurrent access (read) isn't a problem. For write, we basically use a lock.

This is confusingly worded but for onlookers: sqlite as you think of it isn't an application at all, it's a C library. There does exist a binary called sqlite3 which is an interface to that library, and that binary is (mostly, though not exclusively) single-threaded.

But unless you're using that specific tool, and you're probably not unless you yourself are typing "sqlite3" into a bash shell, you're using the library rather than this query tool. And that library interacts in a similar way to a socket connect or a fopen call. You can have multiple connections open to the same sqlite database, and those connections can operate concurrently (subject to various limitations, but "you need better single core box" is not the summary)

When you embed it in Go, you can have concurrent reads. Each HTTP handler request gets a separate goroutine, so you can definitely take advantage of multiple cores.
SQLite actually scales far better than the memes would generally tell you and that "lite" implies. It's frustrating how easily these memes spread by people just repeating what they heard once. If there’s anything HN has taught me it’s to check the docs instead of believing these ambient notions

It's more complicated than "doesn't support multi-core" which you can see in the other replies here: generally unlimited concurrent readers are allowed with single or at least finite writers. Depending on a bunch of settings (e.g. WAL) you might also get concurrent writers, multicore sorting, and some other things that can cross cores too. Those things do have their own tradeoffs.

But my actual claim is that most things don't need to be "scaled" at all. And if you do get there, and again statistically you won't, then you're definitely going to be doing some other rearchitecting anyway and moving sqlite->postgres might as well be part of that.

I never said it's "lite". I can say for certain that in my experience, when writes start to go up, such as stock market event stream coming in, sqlite3 doesn't perform as well as postgres, and scaling writes is hard in sqlite3. But I have no issue scaling Postgres for those data.

If you say, sqlite3 is good enough for most webapps, then I probably agree. But I would not use it for a lot of things that I use Postgres for such as pubsub, or really high transaction rate.

Not saying SQLite wasn't the issue here (impossible to tell based on the information provided), but I've seen a lot of webapps with highly inefficient queries because developers either didn't understand what the ORM was doing or didn't bother to optimize. In a Django app, prefetch_related can make a huge difference (or joinedload when using SQLAlchemy or better initial filtering when using raw SQL).
We used mostly raw SQL. Django ORM is good for CRUD app, but anything analytic style app, it's more painful than it should be. I'm not blaming Django, just we picked the framework for CRUD work and had to retrofitted to do analytic because of requirements.