Hacker News new | ask | show | jobs
by Timja 1319 days ago
My prediction: SQLite will keep gaining popularity.

Especially among pragmatic software builders who run their own business and do not work for the man. A demographic that I expect to grow.

Talking about SQLite: Is there any downside to partitioning an SQLite db into multiple files?

For example one of my systems has a table 'details' which is not vital for the system to work. It's just a nice to have, to have data in this table. And it is pretty big, growing fast.

When I copy the DB over to another system, I don't need that table. So it would be nice to have like primary.db and secondary.db. With 'details' in secondary.db. Any downside to this approach? Are JOINS slower across two files than across two tables in the same file?

9 comments

SQLite has seen runaway popularity on HN lately and I bought into the hype too for a while but when I look under the hood, the 3rd party backup and replication stories just seem janky, tedious and not yet mature. It's the kind of thing where a misconfiguration could wipe out everything and/or waste you hours of time.

>Especially among pragmatic software builders who run their own business and do not work for the man.

That's the perfect use case for a SaaS database. Administering a database adds zero business value and you'd be doing it to save at most $50 a month.

> SQLite has seen runaway popularity on HN lately and I bought into the hype too for a while but when I look under the hood, the 3rd party backup and replication stories just seem janky, tedious and not yet mature. It's the kind of thing where a misconfiguration could wipe out everything and/or waste you hours of time.

If you treat it as a database outside of your application code, yes. Its "database replication" tools are far behind.

But that's "using it wrong". Outside of the application code using it, a sqlite database should be treated as a file, that's the whole magic of it.

Backup and replication tools for files are great and mature, far more mature than for most database. Something as simple as rsync already covers 99% of use case you need for it.

If you need "live replication across multiple servers" or something like that, you're completely out of the scope of the what sqlite is made for.

I also find SQLite to be a poor solution for a backend database.

In particular - it makes it incredibly frustrating to manage multiple instances accessing it, and has some very strict limitations around how the underlying FS is mounted.

SQLite is an incredible tool - but the right place for it is in a deployed client application (where - seriously - it's a first class project and is an incredible joy). It's not really designed to be your web db.

That's absolutely not true, these solutions have all kinds of costs in terms of training, maintenance, and overall system complexity.
Priory art warning: every shack that rents out a /home/$customerid slice running phpmyadmin already is on the seller side of the SaaS database market. That market is just not very interesting.
> When I copy the DB over to another system, I don't need that table. So it would be nice to have like primary.db and secondary.db. With 'details' in secondary.db. Any downside to this approach? Are JOINS slower across two files than across two tables in the same file?

I'm in the middle of refactoring my personal project such that "shared" data is in one database, and "personal" data is in a separate database; the idea being that every user will have a separate SQLite "connection", with their own "personal" data ATTACHed. I had reasonably extensive functional testing before the refactor, and after the refactor I didn't have any issues from a functional perspective.

Potential advantages:

- Each user can download their own "personal" database whenever they want

- This is essentially a form of "sharding", which should go a long way towards mitigating the "single writer" bottleneck; as the "shared data" will change much less frequently than the "personal" data. It should also make it fairly straightforward to distribute the workload across multiple servers / regions, should my project ever get that big.

Haven't done any performance testing yet.

Main issues I've encountered so far:

- Foreign key constraints across the databases is missing; that's just a reduction in safety, however.

- Golang's "automatic connection management" doesn't play well with SQLite's "ATTACH" command: it expect to automatically open new connections, but the secondary connections won't have the ATTACHed databases. This is solvable, but something to watch out for.

As implied, I'm still in the middle of changing things over, so it's early days; but so far things seem positive.

The very problem of SQLite: Single user only. Although SQLite does have WAL but it still doesn't allow you to do concurrent write unless you want to see file corruption.

This means SQLite is very much locked to things that works with one specific purpose and almost nothing else. Sure, you can be read-only, but you have to run alongside the app in the specific node, too.

Another problem (although without solving the single user mindset this wouldn't be a problem at all) is high availability. You want to make sure that your database won't get lost do you.

Things like Litestream [1] attempts to solve the SQLite backup problem it by continuously saving the database state and pack t up to S3-compatibles or file system but its just half the story. You want to make sure your operation not stopping. This is where HA comes in to save you from an emergency fixup when you are enjoying your holiday.

It doesn't mean that nobody tried to solve both these problems though. Ahem, introducing rqlite [2]. Although my own experience is not very great because the memory usage is quite high and does not fit my need (because the embedded device only has 512MB on it, and every byte counts, sorry), I guess that's the price to pay if you want to turn a non-multiuser, non-concurrently acccess database into one...Another honorable mention would be LiteFS [3] but I haven't used it yet so I have no say on it.

[1]: https://litestream.io/

[2]: https://github.com/rqlite/rqlite

[3]: https://github.com/superfly/litefs

Many of us here writing web apps for enterprises can use SQLite with WAL with no issues.

Number of concurrent users range from tens and rarely hit hundreds. SQLite can handle that kind of traffic without any issues.

why would you do all this work when postgres sorted this all out over a decade ago?
Setting up Postgres is a PITA compared to SQLite. It comes bundled with python these days. Obviously it's going to be a trade off as to which one causes you more pain.
No docker access? postgres images are available in seconds.
So the solution is pulling down additional gigabytes of images and runtime to run the database?
Yes. It works great.
postgres alpine is 90MB.

I can really recommend it, using a declarative docker-compose.yml file and then docker-compose command.

agreed, running pg locally is a pain. I use a cloud postgres instance (even for local dev). They're dirt cheap and it's not worth the hassle of working with a local pg.
What OS are you on? Running postgres on Linux is as trivial as it gets, and for Windows they have a nice installer. Or just use docker.
Does the docker have users set up? I seem to remember that being a pain point in the initial setup.
I love Postgres.app on mac, makes running locally a breeze. Running Postgres fast on the cloud is what I always struggled with (without paying enormous sums)
>> agreed, running pg locally is a pain

If you have a single process accessing the database, use SQLite. If not, use Postgres.

which cloud do you use for local dev?
aws, gcp, digital ocean and supabase all have pretty good free tiers
There is also Cloudflare D1, which is now in public alpha.
> Any downside to this approach?

None that I can think of unless you need foreign key constraints between both.

> Are JOINS slower across two files than across two tables in the same file?

I was recently debugging as slow JOIN with ATTACH'ed databases and the query plan looked the same as when both tables were in the same database. I don't think it makes any difference.

But in these situations, the solution is measuring and benchmarking for your use case.

I keep reaching for SQLite and it keeps working. Although I've been needing a better review of what other embedded databases I should be considering in 2022. I tried Genji[1] recently and tore it out as it wasn't doing ORDER BY with multiple columns.

1. https://genji.dev/

> Especially among pragmatic software builders who run their own business and do not work for the man. A demographic that I expect to grow.

From the FAQ; the are lots of caveats (especially, the last).

> Situations Where A Client/Server RDBMS May Work Better

> Client/Server Applications

> If there are many client programs sending SQL to the same database over a network, then use a client/server database engine instead of SQLite. SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, file locking logic is buggy in many network filesystem implementations (on both Unix and Windows). If file locking does not work correctly, two or more clients might try to modify the same part of the same database at the same time, resulting in corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it.

> A good rule of thumb is to avoid using SQLite in situations where the same database will be accessed directly (without an intervening application server) and simultaneously from many computers over a network.

> High-volume Websites

> SQLite will normally work fine as the database backend to a website. But if the website is write-intensive or is so busy that it requires multiple servers, then consider using an enterprise-class client/server database engine instead of SQLite.

> Very large datasets

> An SQLite database is limited in size to 281 terabytes (248 bytes, 256 tibibytes). And even if it could handle larger databases, SQLite stores the entire database in a single disk file and many filesystems limit the maximum size of files to something less than this. So if you are contemplating databases of this magnitude, you would do well to consider using a client/server database engine that spreads its content across multiple disk files, and perhaps across multiple volumes.

> High Concurrency

> SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.

For me an important caveat is the typing. With all respect for the original author of SQLite -- he has done an outstanding job-- I think he underestimates the value of a good typing system. I have seen some databases that had all kinds of messy data. Back in the day MySQL was also quite loose with regards to checking data. Undoing the damage is in most cases not possible. For a business data is more important than code, so be strict up front.

I know, SQLite has added the option to enforce type checking. The authors still don´t believe in the value of it and the available types are quite limited and thus loose. I think this is something that pgsql got quite right, where you can have your domain types on the database level.

On the other hand, if you keep this as a replacement for your config file ( I thought this was the original purpose?), then yeah, you get an awesome deal. I wouldn't dare to build my business on it, just like I don´t believe in MongoDb and any untyped language for serious purposes.

As others have pointed out, there's the strict mode now which is still quite restricted (pun intended), but what you most often don't hear is that you can also use check constraints, as in

    sqlite> create table t ( id integer primary key, n integer check ( typeof( n ) = 'integer' ) );
    sqlite> insert into t ( n ) values ( 1 );
    sqlite> insert into t ( n ) values ( '1' );
    sqlite> insert into t ( n ) values ( true );
    sqlite> insert into t ( n ) values ( 'x' );
    Runtime error: CHECK constraint failed: typeof( n ) = 'integer' (19)
    sqlite> select * from t;
    ┌────┬───┐
    │ id │ n │
    ├────┼───┤
    │ 1  │ 1 │
    │ 2  │ 1 │
    │ 3  │ 1 │
    └────┴───┘
    sqlite> select ( select n from t where id = 1 ) = ( select n from t where id = 2 );
    1 // i.e. true
Check constraints do have the advantage over more classical types that additional constraints can be declared such as valid ranges for numerical types etc.
>I think this is something that pgsql got quite right

I don't think so. For example, pgsql had an array type before it got JSON, so the drivers can't automatically convert arrays that you want to insert into JSON. With my SQLite ORM, you can just insert arrays and objects and it knows to convert them automatically to JSON.

I like that SQLite just has a few primitive types. My ORM will be able to build on top of them. For example, JavaScript will soon be adding new date types (Temporal), and I will create new types for that, which will be stored as text ultimately.

SQLite has strict mode now
Which is quite limited in scope and does not allow for boolean (faux-boolean, of course) or json columns. It also affects certain operations in ways that might not be immediately obvious.

Not sure if this has received any further work since its release.

https://sqlite.org/src/wiki/StrictMode

https://sqlite.org/stricttables.html

I think I mentioned that, or I don´t understand what you mean.
>If there are many client programs sending SQL to the same database over a network

I believe this is a reference to enterprises that have different users querying the database directly with SQL that they wrote over a network to a central database.

My prediction: Database gatekeeping will continue into next year.

Lots of X is a toy database, Y is all you need for every use case, nobody really needs scalability, high-availability etc and above all else never use an ORM. Real engineers write SQL by hand.

Recently I stumbled upon BedrockDB[0] from Expensify. It is based on SQLite and has very interesting idea on HA and distributed DB.

[0] https://bedrockdb.com

I use an app with 3 different SQLite databases, but since I never have to join tables from different files, I haven't found a downside.