Hacker News new | ask | show | jobs
by benbjohnson 799 days ago
LiteFS author here. I don't disagree with any points in the article but perhaps a reframing could help. I previously wrote a tool called Litestream that would do disaster recovery for a single-node SQLite server and I still think it's a great default option for people starting new projects. Unless you're doing very database-specific things, most SQL will carry over between SQLite and Postgres and MySQL, especially if you add ORMs in the mix. Pick the one that gets you writing code the fastest and you can switch down the road if you need it.

Rather than a paradigm shift or hype, I see distributed SQLite as an extension of a path that devs can go down. With Litestream, the most common complaint I got was that devs were worried that they couldn't horizontally scale with SQLite and they'd be stuck. While you probably won't hit vertical scaling limits of SQLite on most projects, it still caused concern. So LiteFS became a "next step" that a dev could take if they ever got to that point. It doesn't need to be your starting point.

As for the "hacky" solution of txid, I'm not sure why that's hacky. Your application isn't required to use it or the optional built-in proxy but it's available if it fits your application's needs. It also works for plugging legacy applications into distributed SQLite without retrofitting the code. The proposed solution of caching seems orthogonal to the discussion of distributed application data. I don't think any database provider would suggest to avoid caching when it's appropriate but there's plenty of downsides of caching. Hell, it's one of the two hardest problems in computer science.

2 comments

>most SQL will carry over between SQLite and Postgres and MySQL, especially if you add ORMs in the mix

I think this goes underappreciated, or rather the opposite is overstated.

Sure there are some edge cases that don't work the same, but most apps won't hit those.

My _biggest_ gripe with SQLite so far is the lack of column reordering like other DBs. And my simplistic understanding is that the others do it exactly the same way as you'd do it manually with SQLite - table gets _replaced_ with an identical table with the data correctly ordered and the data is shoved into the new table.

If you want a more convenient way to do column reordering (and other advanced alter table operations) in SQLite my sqlite-utils CLI tool can do this:

    sqlite-utils transform data.db mytable \
      -o id -o title -o description
That will change the order of the columns in the specified table such that id, title and description come first.

The same command can handle many other operations such as changing column types, renaming columns or assigning a new primary key.

https://sqlite-utils.datasette.io/en/stable/cli.html#transfo...

> Sure there are some edge cases that don't work the same, but most apps won't hit those.

That really depends on your modelling style. If you like things like types, SQL-side processing (eg using functions), or covering indexes, then you’ll hit issues every five minutes in sqlite.

SQLite really wants the logic (including consistency logic) in the application, just compare the list of aggregate functions in postgres versus sqlite, or consider that you have to enable FKs on a per-connection basis.

Which I guess is why ORMs help a lot: they are generally based on application-side logic and LCD database.

I'm pretty sure SQLite has covering indexes. And the relatively new strict mode should enforce at least basic types (though if you want to enforce your own rules for things like dates you're still on your own).
> I'm pretty sure SQLite has covering indexes.

I checked to be sure I had not missed it, and didn’t find anything. You have expressions and conditions, but no covering. Obviously you can kinda emulate it by adding the columns you want to cover to the key, but…

> though if you want to enforce your own rules for things like dates you're still on your own

That’s what I was talking about, having richer types, and the ability to create more (especially domains).

Strict tables provides table stakes of actually enforcing the all-of-5-types sqlite has built-in. Afaik a strict mode is something that’s still being discussed if it ever becomes reality.

SQLite has what they "call a covering index", see point 9 here: https://www.sqlite.org/optoverview.html

My impression is that this mechanism is less general than what one finds in full-fat client-server SQLite databases.

Ya if my reading is correct this is the poor man's covering index: if all the requested data is in the index key the query will not hit the table, so you can add additional fields at the end of the key to get index-only scans (at a cost, also some flexibility cost e.g. doesn't work with unique indexes).

I guess it's less of an issue in sqlite than in databases with richer datatypes in the sense that all datatypes are ordered and thus indexable.

"Obviously you can kinda emulate it by adding the columns you want to cover to the key"

Right, that seems like a good solution to me.

It’s a workaround, but it bloats the interior pages of the index with the covering data, which increases the size of the index and makes lookup less efficient (as they have to traverse more interior pages, and since there are more pages those are less likely to remain in cache).
I use SQLite in my personal projects, not professionally. I was wondering if you could elaborate on what you mean by 'consistency logic' in the context of SQLite.
Why would you want to recorder columns? SQLite reads in a whole record at a time to access any column.
Because as in structs padding slack can lead to a surprising amount of overhead.
That's not the case:

"SQLite does not pad or align columns within a row. Everything is tightly packed together using minimal space."

https://sqlite.org/forum/info/06ad7f81fea46401

One reason to reorder columns with SQLite is that if a column is usually null or has the default value, SQLite will not store the column at all if it is at the end of the row. It only saves a couple of bytes per column, but it is a reason to get these columns at the end.
AFAIK position has nothing to do with nulls, a null is a 0 byte in the header and has no payload in the row: https://www.sqlite.org/fileformat.html#record_format
Continue reading that section:

"Missing values at the end of the record are filled in using the default value for the corresponding columns defined in the table schema."

If you have a table with 5 columns and you only insert the first 3 columns (based on create table column order) because the last 2 values are null or default, SQLite will only insert 3 type bytes in the header. However, if the first column (in create table order) is the one you omit, SQLite has to include its type byte, even if the value is null.

I reorder columns all the time for neater readability of "select *" queries.
Not true.
column reordering is simple to fix with this migration script https://david.rothlis.net/declarative-schema-migration-for-s...

if you are using Zig (and like to live on the bleeding edge), you can also just use my library which includes similar script and also a simple query builder https://github.com/cztomsik/fridge?tab=readme-ov-file#migrat...

I think the bigger issue for many is that tooling, infra(provider), in-house knowledge/skill/experience as well as optimizations may differ quite a lot.

Of course, this will differ a lot between projects.

SQLites handling of dates is pretty kludgy.

It stores them as strings, so to do something like extract just the year from a date, you have to do 'CAST(substr(game_date,0,5) AS INTEGER).'

Hackish and error prone.

It is a fairly low level abstraction, but one that does not require a verbose api. There is nothing error prone or hackish about what you have written, it will work for all inputs, it is just low level. You are just used to having other people write this code for you and give you a library. With newer versions of SQLite you could also write

CAST(strftime(“%Y”, game_date)) as INTEGER

Which is somewhat higher level and less easily mistyped

That's much better, thanks. In case I ever need to do years < 1000 or > 10000 :-)

Still, having that all over a query looks ugly. SQL is can be unreadable enough as it is without all the joins/table renaming.

I just want something more readable like EXTRACT(year from date), like you can in Postgres et al.

Would also be nice if there was a native timestamp like there is in, pretty much every other database.

I'm sensitive to "feature creep" but this doesn't seem like too big of an ask.

I agree it's less obviously correct, but I bet you could add the extension to sqlite if you feel strongly about it. As an aside '%y' is documented to only work in sqlite for years >= 0000 and <= 9999, so it would behave exactly the same as the code you wrote. especially because you already didn't have to worry about years less than 1000 because the ISO8601 format used for serializing dates in sqlite normalizes them with leading zeros.

for instance `select date(-50000000000, "unixepoch");` returns `0385-07-25`

Interestingly %Y doesn't seem to handle negative dates either if you need to handle BC, so I guess that is one downside for both. This is one reason I sometimes prefer to use low level code even when it is less obviously correct with a cursory glance, because abstractions may not mean what you think they mean, or even worse, may be lying to you. At least with low level code I can reason about how it would behave under certain edge cases I might care about.