Hacker News new | ask | show | jobs
by jkljsfdasdf 882 days ago
Embarrasing question tbh but with all the cloud-native sqlite stuff like cloudflare d1 and fly LiteFS I'm seriously thinking of switching from postgres to sqlite.

Does anyone have a compare/contrast sort of thing between the two?

6 comments

The SQLite omitted features page is a good place to start:

https://www.sqlite.org/omitted.html

and the gotchas page it links to.

It is a lot less feature rich than Postgres so there are things you will miss. Nothing like the range of types, I do not think it has a transactional DDL which is nice to have for migrations, and there are various other things like exclusion constraints and the different index types.

On the other hand SQLite may do all you want and not having to run and configure a separate server is a huge deployment advantage.

For the love of all that is holy, if you do, only use STRICT tables. By default [0], SQLite will happily accept that not only can an INTEGER column store “1234” (helpfully silently casting it first), but “abcd” can also be stored in the column as-is.

There are other horrors in the link.

[0]: https://www.sqlite.org/quirks.html

I actually kinda like the fact that whatever data you write to the table will actually be written.

I semi-regularly fix a serious data loss bug that has been fixed with an alter table query. Maybe converting VARCHAR to TEXT or INT to BIGINT... of course it doesn't really "fix" your problem, because the data has already been lost/truncated.

What's a real world situation where completely the wrong type could be written to a column? Especially in modern software with good type safety checks/etc to ensure you don't have malicious data inserted into your database? If I ever did have that happen... at least the data hasn't been lost. You can run a simple script to clean up the "horrific" data.

I don't find a DB that losslessly stores what I told it to store regardless of types worrying at all.

So in fact AFAIC the misfeature of SQLite is not that it's typeless, IMO, rather it's that it has this notion of NUMERIC affinity that's all but lossless.

E.g. SQLite has a decimal extension that allows you to work with decimal numbers represented as TEXT, and so is appropriate to handle money without rounding issues. However, if you have a column where the declared type is DECIMAL, MONEY, NUMBER, NUMERIC or whatever it will have NUMERIC affinity. Then if you store a textual decimal number to it, it will deduce it looks like a FLOAT and convert, loosing precision.

Your only solution is to use BLOB affinity (declare no type), which is what I do, most of the time.

This is precisely the issue. Databases should not guess at what you want, nor be helpful and make your query work with incorrect types specified.

Schema is rigid; that’s the point. If the input is incorrect, log an error.

It's a bit hard to take your objection very seriously when the following spits out the same SQLite as in PostgreSQL:

    CREATE TABLE tbl (x integer, y real);
    INSERT INTO tbl VALUES ('001', ' 2.5 ');
    SELECT * FROM tbl;

    1|2.5
Numeric affinity was "invented" in SQLite to make it more compatible with PostgreSQL (et al.).
> What's a real world situation where completely the wrong type could be written to a column?

* App with incorrect schema definition is deployed

* App written in TS or Python has type checks disabled

* App written in JS does math, and its fun parsing system decides that 1 + “1” == “11”

I’ve seen all of these.

> You can run a simple script to clean up the "horrific" data.

Depends on scale, and the tables / columns. If there are billions of rows, and the columns with incorrect data aren’t indexed… that’s a bad time.

For me, the biggest trade offs for sqlite are just that you need to think about how you're going to store and backup the database a lot more. Specifically in container orchestration environments like kubernetes, I think sqlite presents a couple of challenges. With MySQL, you can set up a replicated database server instance outside of the cluster that you just connect to over the network, and you can use standard MySQL tools like mysqldump to back them up. Kubernetes isn't ideal for stateful workloads so that tends to be one of the more sane solutions there.

With SQlite you need to set up a persistent volume to keep the database around between container restarts, and you need to think of a clever way to then back up that sqlite database to somewhere like S3, likely using an sqlite3 command with a VACUUM statement and then an `aws s3 cp` command, which requires AWS credentials. Overall, a lot of additional work and privileges on the application container, at least in container orchestration environments. In lieu of all that, maybe you trust your persistent volume provisioner enough to try to do an online snapshot, but that always sketches me out / I don't trust the backup enough to rely on it.

Of course you can use a public cloud sqlite service like Cloudflare D1, but I haven't used that solution enough to say if it would be flexible enough to work with, say, an on-prem application server, or if it only works with Cloudflare workers. I'm sure I could find that out in the documentation but I've exhausted my mental stamina for the day with leafing through documentation pages.

Backing up sqlite databases is straightforward. `.backup` is a command that you use in sqlite for this purpose. Since you already have a volume for the database, you can backup to that same volume (if offloading to S3/etc is too much work).
> Since you already have a volume for the database, you can backup to that same volume (if offloading to S3/etc is too much work).

I think I just accidentally didn't see this part of your reply or something last night. But backing up a database to the same place the primary one is stored is decidedly _not_ a real backup solution. Nobody should believe that offers them any of the same forms of protection that a real backup would give them. At the absolute best, you're protected from your database getting malformed by your application server. But if you lose that volume for any reason, your backups are just gone. Imagine explaining to your boss in such a scenario that the solution that was come up with was that the backups are kept on the same linux partition as the primary running database. They would fire me. I would fire me.

Yeah, my point was pretty specific to container environments, that using sqlite forces you to add a bunch of sqlite handling logic to your application. Whereas with MySQL (and other similar RDBMS’), you can have your application just worry about its own logic, and handle MySQL backups completely separately from it.
Note that sqlite `.backup` does not back up PRAGMA values. Some people use `PRAGMA user_version` for schema versioning (not a good idea, because of this trap).
It sounds great until you need a centralized billing database and then you might want to just stick with postgres rather than run two kinds of database.

Does anyone have ideas on how to solve that? Not to mention complicating migrations. Unfortunately sqlite-based product docs seem to end right before getting to the hard stuff. Or perhaps I missed them.

We use Alembic migrations with SQLite. No complaints.
Does Alembic provide a simple solution to the limitations of SQLite's alter table?
You're right. Sorry. I shouldn't post when I'm tired. This is listed as a limitation in Alembic, from memory.

Edit: I looked[0]. While it is a SQLite limitation, Alembic does seem to have a way to work around it.

[0] https://alembic.sqlalchemy.org/en/latest/batch.html#working-...

I was asking, not correcting you!

Django ORM also uses the same copy and move method but it warns you against doing it on production databases: "same copy and move method but it warns you against doing it on production databases".

That makes me reluctant to use SQLite for a use case it is otherwise well suited to: multi-tenant applications.

Alembic does not have such dire warnings, but still looks problematic with regard to constraints?

copying and pasting from a different thread:

I use SQLite/Litestream for https://extensionpay.com! Serves about 120m requests per month (most of those are cached and don't hit the db), but it's been great!

I was convinced that SQLite could be a viable db option from this great post about it called Consider SQLite: https://blog.wesleyac.com/posts/consider-sqlite

Using SQLite with Litestream helped me to launch the site quickly without having to pay for or configure/manage a db server, especially when I didn't know if the site would make any money and didn't have any personal experience with running production databases. Litestream streams to blackblaze b2 for literally $0 per month which is great. I already had a backblaze account for personal backups and it was easy to just add b2 storage. I've never had to restore from backup so far.

There's a pleasing operational simplicity in this setup — one $14 DigitalOcean droplet serves my entire app (single-threaded still!) and it's been easy to scale vertically by just upgrading the server to the next tier when I started pushing the limits of a droplet (or doing some obvious SQLite config optimizations). DigitalOcean's "premium" intel and amd droplets use NVMe drives which seem to be especially good with SQLite.

One downside of using SQLite is that there's just not as much community knowledge about using and tuning it for web applications. For example, I'm using it with SvelteKit and there's not much written online about deploying multi-threaded SvelteKit apps with SQLite. Also, not many example configs to learn from. By far the biggest performance improvement I found was turning on memory mapping for SQLite.

Happy to answer any questions you might have!

Out of curiosity, have you ever needed to restore a litestream backup?
Nope!
Cool. I'm still trying to wrap my head around how to use it responsibly: Is this a "good, old fashioned" single durable server type of deployment?

I was wondering if you had any experience with, essentially, "always restore DB during startup".

I like my systems made such that the server can die, get scaled vertically and oops data disk wiped, forget to mount volume in a container, etc. without me lifting a finger. It looks like I can do that with litestream, but I always want to hear from people who have tried it in the real world.

> Is this a "good, old fashioned" single durable server type of deployment?

Yep!

> I like my systems made such that the server can die, get scaled vertically and oops data disk wiped, forget to mount volume in a container, etc. without me lifting a finger. It looks like I can do that with litestream, but I always want to hear from people who have tried it in the real world.

I can't really help you there unfortunately, but I know fly.io is trying to offer something like that.

Oh the complexity and pain I go through in my quest for "keeping it simple."

It's fine, my server for stupid side projects is a cheap VPS with a systemd-based configuration anyway, I am just trying to figure out all the failure modes here vs "apt-get install postgresql", including, most likely, myself.

In my tests sqlite was around 10X faster than postgres.

That mean that a single cheap server is capable of going very very far in normal web workloads.

I think we often add complexity: virtualisation, cloud, separated db server, horizontal scaling when efficient and simple tech is able to go very very far.