Hacker News new | ask | show | jobs
by bambax 487 days ago
It kind of does have to be one file, because foreign keys can't be enforced across files.

For a simple example, if you have one db file per user, as others recommend, how do you check that a username or an email is unique? Do you maintain a "global" db for fields with unique constraints? But then when those fields are updated you need to manage this at the application level, and atomicity can't really be guaranteed... Or is there another way?

6 comments

The application I'm building (I haven't launched yet) has one database per customer that tracks the things the customer does in the application, plus a single database that tracks the metadata about all of the customers. The metadata changes very slowly (each customer will probably make at most one update per year), so there will be (almost) no write contention on the metadata database. Each customer has an id (a random 64-bit integer) which is used (in hexadecimal) as the filename for the individual customer database.

The total addressable market for my application is about 2,500 customers, and they won't use it at all in most weeks. I'm aware that this usage pattern is very uncommon, so I don't recommend what I'm doing as a general solution.

> The total addressable market for my application is about 2,500 customers, and they won't use it at all in most weeks. I'm aware that this usage pattern is very uncommon

Is it even?

I'm with you on this. I think this is a widely desirable pattern. If it's uncommon it's because of how poorly traditional dbms's address it.

I suspect as server-side sqlite matures, this pattern will explode in popularity.

I've done pretty much that, but without SQLite as the origin of that system predated SQLites first release, with millions of users. It works, and it scales. Since then I've increasingly preferred this pattern when it's been suitable, and as SQLite has grown in maturity, it increasingly is suitable.
The pattern I've used most often is to maintain a db/table of meta records which handles identity and transaction. Then, for each metadata row there is a SQLite database instance.

This is useful for storing instances of very complex type systems (e.g., ones with circular dependencies) and managing customer tenants.

Keep user information in one database, keep user _data_ in another.

It all depends on your app, really. For some apps the above won't make sense, but maybe it would make sense to keep one db per org, or something like that.

You can also use a different but similar tool e.g. use sqlite to keep user information, use duckdb to keep user data. I find this combo very effective since sqlite is best as fast btree based indexed operations and duckdb is best at multiple row based aggregate computations (get me user X vs get me all users with property Y).
For the one DB file per user question, I'll take a stab at a solution:

* Global store of user identifiers mapped to user IDs. Identifiers can change but the underlying user ID is immutable.

* One DB per user ID.

The per-user DB doesn't really need to know the user identifier, and at auth time you don't need to access the business logic, so cross-DB updates should not be a problem.

In the general case, similar logic applies: you can't partition just anywhere, but where you can partition is pretty well defined.

> Do you maintain a "global" db for fields with unique constraints?

Yes.

> But then when those fields are updated you need to manage this at the application level

Yep. BTW, that's not really so different from a traditional db. You're doing the same thing, just in a different way, and one you probably aren't used to, but it's equally valid.

In a traditional dbms you express this constraint in application code using the database's native DDL, loaded into to the db, and executed by the db server whenever a relevant data change is made.

With sqlite you'd express this constraint in application code using practically any language you want, loaded to your app execution environment, and executed by your app whenever a relevant data change is made.

With a traditional dbms you can get away with a bit less of an application data access layer because the built-in DDL can handle some things. But it can't handle that much, so you really want that application level layer anyway.

> and atomicity can't really be guaranteed

Right. That becomes a concern of the application (could be the data access layer, though, which I think you want whether we're talking about traditional db or sqlite).

As with many horizontal scaling designs we're taking about (1) choosing divisions to minimize the possibilities for inconsistencies; (2) eventual consistency.

In your specific example, I'd question whether that global data -- user name and email -- need to be in the per-user db at all. Links between databases might use uuids that don't change, which limits the tricky cases.

It most definitely does not. Yes, you will lose foreign keys across silos. But this will usually be one association, like a user id for example.

You are also thinking of this defensively, "x value cannot exist here if it doesn't exist in y store". Just think how you'd access this, and you'll realize at worst this creates a void of data which your app likely handles anyway.

If there are numerous associations required across these silos. You don't have the right silos yet.