Hacker News new | ask | show | jobs
by johnisgood 2496 days ago
> SQLite is extremely reliable and durable for large amounts of data (up to 140TB). It is considered one of the most well-engineered and well-tested software solutions today, with 711x more test code than implementation code.

I keep seeing this statement. Why is it considered one of the most well-engineered software?

7 comments

About the well-tested bit: Per it's own documentation, SQLite has a massive test suite. [*Not all of] the test suite is actually open source though, so the overlap between commenters selling you on how well tested SQLite is and those that have seen how the sausage is made is probably [close to] zero.

However, pointing this or any of the other practical shortcomings of SQLite out on hacker news is blasphemy and will invariably get you downvoted into oblivion by people who (apparently) never ran into them.

Part of the SQLite test cases are open source (the logic tests), but indeed unfortunately the biggest part are not. You can purchase a license for these tests though, so saying nobody has seen them is likely incorrect.

Either way, the fact that it is DO-178B certified for use on aircraft should tell you something about the reliability of it. It is certainly more reliable than most other software, OSS or otherwise.

I am curious what practical problems you encountered with SQLite. Could you perhaps expand on that?

What are some other shortcomings of SQLite? I'm genuinely curious even though the below probably sounds like some text from the ~Rust~ Sqlite evangelism strike force.

With the default settings which I semi-affectionately refer to as paranoid mode, an untuned database can start to have worse performance after getting 500,000+ records going. Then things like indexes, RediSQL, and WAL mode start being more necessary rather than just best-practices.

But if you set your pragmas correctly and so on, SQLite scales up just fine. I haven't done a large scale Sqlite base simply due to caution around needing to use a 'real' database in production like SQL server or Postgres, Maria etc. Sqlite is excellent for ephemeral databases to be created, seeded with test data, run tests against, and deleted in repeatable automated testing.

Based on the link to the Expensify article, it sounds like Sqlite can scale up even better than Sql server under some circumstances.

But I have barely tried using it in production because of that aforementioned caution. What are some pitfalls to watch out for?

the elephant in the room with SQLite is that they refuse to support most forms of ALTER TABLE, that is, to be able to change the structure of a schema.

Their rationalization for this is that the way SQLite stores data, this is more efficiently performed by simply creating a new table and copying all the rows from the old table into the new one, and that one would want to batch all the table changes together rather than emitting individual ALTER statemnts.

from the POV of people who make tools like my own Alembic Migrations, this is an annoyingly insufficient answer because the logistics of recreating table schemas and copying data over is much more complicated than just emitting an ALTER directive. I'd like if SQLite had at least the ability to be extended to support a third party "ALTER" plugin that would under the hood run the intricacies of copying the tables around, rather than pushing this out to the tool creators. It doesn't really matter in most cases, for the use cases used by SQLite, that ALTER would be inefficient.

instead, my users bugged me for years to solve this problem and I have to maintain this thing https://alembic.sqlalchemy.org/en/latest/batch.html which I mostly hate completely.

SQLite's typing model is also very idiosyncratic and is based on a naming convention approach, which I think most users of SQLite don't understand very well, because it works in a completely strange way based on looking for substrings inside of the completely arbitrary names you can assign to types. I can create a column with the datatype ELEPHINT and that is a legitimate datatype which will store integers. there's also the "INTEGER PRIMARY KEY " / "INTEGER PRIMARY KEY AUTOINCREMENT" silliness but that's a relatively mild poor API compared to things MySQL does all over the place, I suppose.

It's hard to point those out without coming across as a hater, and because they might seem so obvious. Trying...:

With the way transactions and durability work in SQLite, a normal setup will not do more than on the order of ~100 transactions per second without adding additional layers of complexity. Which is completely fine for it's intended usecase of being an embedded, lightweight database and of course is not due to some bad engineering decisions, but due to inherent tradeoffs in how transactions are handled.

So basically it's all fine until you try to use SQLite for something for which it is not a good fit, like a large volume of inserts. Other solutions exist that are conceptually better equipped for this usecase out of the box (Postgres/MySQL/Elastic/etc). Not a shortcoming in SQLite per-se, but rather a practical shortcoming that comes up when you try to use it for the wrong job.

Case in point is TFA which takes the theoretical SQLite database size (140TB) limit and runs with claiming that is the useful limit on how much data you can store in SQLite. LOL! That value is when SQLite page IDs start overflowing and not an estimate on how much data can be usefully handled in a single SQLite database.

SQLite is, by the nature of what it is, not a good fit for big data volumes. Try loading 100TB of data into SQLite and try to run even a single query. Even assuming that you have a hyper-fast SSD, a single query will take days to complete! But it still get's mentioned as a potential solution for that problem every so often. In this post even...

So you could consider those practical shortcomings or ill-advised usage. It depends on the definition/perspective I guess. At any rate, SQLite, even if generally being excellent, is not the panacea as which it sometimes gets sold here on HN. No database is, there are just too many tradeoffs involved.

Why are portions of the test suite closed source?
They charge money for it...

Note that I'm not saying all of the test suite should be open source. It's clearly a valid/cool business model and SQLite is, for a lot of use cases, an excellent piece of software that I have often used myself. The criticism was honestly more directed towards the cherry-picking of facts that sometimes happens in discussions on hacker news, especially those related to SQLite.

Not a direct answer but it's certainly the most used, which correlates highly with getting a lot of dev attention :)

https://sqlite.org/mostdeployed.html

That completely depends on the chosen metric/definition though. For example, you could consider the most used database the one that handles the most queries per day. Not so clear who is the winner now. The claim that SQLite is one of the 5 most widely deployed pieces of software worldwide is completely unsubstantiated and most likely untrue once you count in things like ICU, the linux kernel, etc.
> ICU, the linux kernel, etc.

It seems unlikely that either of these are more widely deployed than SQLite, especially the Linux kernel.

Probably the vast majority of deployments of the Linux kernel include SQLite. Add in all the non-Linux devices (e.g. the billions of iOS and Windows devices) and that's probably far more than the number of Linux installations without SQLite.

I don't know much about ICU, and perhaps you're right that it's one of the most used bits of software. But I can certainly imagine embedded devices that never communicate with users and wouldn't need ICU but do need to store data.

My point was less about arguing which is more popular, but more that the same claim can equally be made for at least hundreds if not thousands of other software packages. Still, SQLite stands out to me as the only package I can recall that consistently puts this message all over the documentation front and center. Which is definitely still a useful signal, even if we both agree that it is, in fact, widely used.
Are you deliberately misreading the linked page?

  Most Widely Deployed and Used Database Engine
That last part is rather important.
No, I'm referring to the second section.

> MOST WIDELY DEPLOYED SOFTWARE MODULE OF ANY KIND? -- SQLite is probably one of the top five most deployed software modules of any description. Other libraries with similar reach include: zlib... blah blah blah

It stops just short of claiming it's the most deployed software in the world...

Barring an actual study with data, I don't think their claim is that far fetched.
A small (?) part of the answer is likely to be found in the massive amount of testing: https://sqlite.org/testing.html
I would like not to have to deal with SQLITE_BUSY errors for once. It even throws when trying to obtain a connection. It got so bad I had to put a mutex around obtaining a sqlite connection.
It's possible you are using it wrong.
The less flippant explanation is that SQLite can only handle a single writer at any time and when you try to access it with two concurrent writers (or a concurrent reader and writer in some modes) it will by default return a "BUSY" error instead of blocking.

So, if you're were getting unexpected "BUSY" erorrs than, yes, you would be using it incorrectly. However, AIUI, you are always expected to see some amount of BUSY errors during normal, concurrent operation and have to deal with them explicitly. So the fact that you're seeing BUSY errors alone doesn't mean you're doing anything wrong...

To use SQLite correctly from multiple processes, you have to do one of two things:

  - Add explicit code to retry on BUSY errors everywhere you do SQL queries
  - Serialize all access the database, e.g. by using a mutex
GP appears to have chosen the second option.
Hi, I'm the author of this library. This is a pretty common opinion, the SQLite code base is rigorously well tested for performance and correctness.

See: https://www.sqlite.org/testing.html https://www.sqlite.org/hirely.html

It's well thought through and commented. Read for example the commented design decisions of https://github.com/sqlite/sqlite/blob/master/src/pager.c or os_unix.c
https://sqlite.org/about.html

The bit about testing.