Hacker News new | ask | show | jobs
by TekMol 1761 days ago
As a developer, I have to say that sqlite gives me the best experience.

Everything else pales in comparison.

Create a database?

    sqlite3 mydata.db
Where is the database?

    In the current directory
How is it structured on disk?

    It's a single file
How do I backup the DB?

    cp mydata.db /my/backups/mydata.db
Do I have to install a server?

    No
Do I have to configure anything?

    No
During setup and deployment I usually I dabble a while with the whole GRANT *.* ON localhost IDENTIFIED BY PASSWORD or something. How do I do that with sqlite?

    It just works
Do I have to close / protect any specific ports?

    No, it's just a file
Which field types should I use for ... ?

    None. It just works.
7 comments

Unless you need to do something crazy, like run two nodes of your application. But only Google needs to do that, right?
You don't even need to reach two nodes before SQLite becomes grossly inadequate. Even on a single node: SQLite's paradigm of global locks leads to poor performance when multiple threads write to the same table.

You could be a single-node 4-core $5/month VPS instance and run into this issue. SQLite requires "exclusive" access to a table to handle writes (meaning when writing, no other thread can be reading the table). Especially if your transactions start to become complex.

In contrast, MySQL and PostgreSQL allow for simultaneous reads while writes are occurring.

I believe that SQLite in Write-Ahead Log (WAL) mode does not have this issue.
The question is if simultaneous operations really speed up your application.

It is not as if a 4-core machine can do 4 times the DB work if you only allow it.

Memory access, disk access .. they all have their specific behaviour when you try to do things simultaneously. In the worst case, things will just get serialized on a lower level, even if multiple CPU cores send and/or request data simultaneously.

We don’t have to talk in abstract about this, you can benchmark these things or just look at the many ones run by others. For example, some benchmarks around some Postgres scaling improvements: https://wiki.postgresql.org/images/e/e8/FOSDEM2012-Multi-CPU....

If you don’t need anything Postgres offers, by all means stick with sqlite. But your pessimism about being stuck with operations “serialized on a lower level” is not empirically justified for most of the databases that support multi-threading.

> It is not as if a 4-core machine can do 4 times the DB work if you only allow it.

If Thread#1 goes "lock. write. unlock", then that means Thread#2, #3, and #4 all have to wait until Thread#1 is done with the write.

Even if Thread#2/#3/#4 have the data in their CPU-cache, they have to wait for Thread#1 to be complete.

--------

If Thread#1 is writing to a __hard drive__, that means Thread#2, #3, and #4 are waiting on a hard drive read, when they could have instead been reading from L3 cache.

SQLite's model scales extremely poorly in practice. You run into all sorts of problems like this.

That is why I mentioned the "worst case".

You are describing the best case. Where everything is in a cache close enough to the CPU that it is not impacted by the other CPUs data access.

I'm talking about relatively simple cases like spinning up a phpbb3 (web forum) instance. The minute you have multiple users writing comments at the same time from different apache/php instances is the minute your SQLite database starts to fall over.

Every write (aka: every comment post) is an exclusive write on SQLite. A full lock that prevents other processes from reading. (Ex: User#1 writes a comment, but User#2 hits refresh. These two items will be sequential if you use SQLite... when normally they'd be concurrent in most other databases)

------

SQLite is an exceptionally good database for many purposes. But it has its weaknesses. There's a reason why MySQL and PostgreSQL exist after all.

We've been using sqlite as our only datastore for the last 5+ years.

There are a lot of fun little tricks you learn along the way.

The most important one is to ignore all the naysayers who claim it won't scale. I can easily saturate any IO subsystem by way of SQLite.

At first you may think it's impossible, but then you start to go outside the lines a bit. You start by turning on WAL. That helps a lot... But then you want another 100x? One thing you can do is heavily abuse the fact that any SQLite db is "just a file" by opening as many as you need to in parallel. 9/10 times you can break down a typical business app into grains roughly approximating the current state of each user.

Might make regulatory compliance easier if each user is stored in their own database too. Makes removing someone's entire presence from your system 1 file deletion activity.

How do you aggregate the data and query among all those databases? Doesn't it incur huge costs? Does sqlite support this natively or are you basically treating it as text files of data per user and don't need complex queries in the first place?
Sqlite has it natively. After connecting to user1.db just execute this query:

ATTACH DATABASE user2.db AS user2;

Now you can use user2.tablename in any query just as if was part of user1.db

We don't have any use cases for querying across these datastores right now. Hypothetically, if we did need something like this it would probably be an offline ETL analysis tool. I have written a few of these and they can be incredibly performant if everything lives on the same box.
Is there a wrapper to treat a database like its an sqlite database (a file) to simplify to the same level as Sqlite? This may at least simplify some issues. Doesn't matter whether its backed by a local database running in a container or by a remote one.
The biggest database unification effort tends to be focused around ODBC compatibility from what I've seen - and the operations laid out by ODBC are quite trivial and easy to comprehend I think.
I more thought of a cli wrapper to enable simple file-like management of databases with operations like copy, list and not having to setup auth, but rather just provide the local path to the database. However, ODBC looks interesting too.

ODBC might enable the creation of such a cli wrapper in a database-agnostic way. An authentication library retrieving the correct credentials based on the local file path may make the local use as seamless as with Sqlite. To get the best from both worlds.

Yea I think that'd be an interesting approach - and ODBC takes care of all the annoying bits for you (providing you with a single unified API to interact with instead of tailoring the commands to each specific driver).
calibre uses sqlite and it doesn't support hosting the db on a network drive.

https://manual.calibre-ebook.com/faq.html#i-am-getting-error...

i guess that's a bad -dev- user experience?

You can certainly host it on a network drive if the network filesystem has the right features and behaviour.

The same goes for a local filesystem. Sqlite has certain features it requires the filesystem to have. That is independent of how that filesystem stores the data physically.

> if the network filesystem has the right features and behaviour

Which network filesystems are those?

SQLite actually works just fine on Windows files shares (and yes with multiple clients since Windows file shares do support file locks) but I wouldn’t recommend it as a remote DB/multi client solution.
Firebird is also close. With the extra capabilities that can run as server (and that is only a setup concern: the code stay same).

Also, it support more stuff (like stored procedures!) that I miss a lot on sqlite.

My only complain with firebird is that is not easy to embed into iso/android or it will my main db.

Fully agree, but that comes with tradeoffs around concurrent writes, strong typing (though I hear that's coming), some other functionality, etc.
This feels like kind of a silly comparison. SQLite can't do most of what makes modern databases difficult/valuable. As an example, if you replace your above example with a CSV file, you'd get the same output, right?
> with a CSV file, you'd get the same output, right?

Not when you have threads.

I describe the differences between sqlite and mysql, postgres, oracle etc that are the reason why I like sqlite best.

If for you there are specific differences that make you chose some other db, let us know.

I remember that back in the day getting Postgres up and running was an enormous chore - SQLite was definitely the easiest but even MySQL was a walk in the park compared to Postgres. But Postgres is simply amazing - it is entirely transparent about the metadata it's storing to assist querying and has mature tools available to help you optimize queries. The only "big" one that I haven't used is Oracle mostly because the price makes it insanely unreasonable for small companies. Out of MySQL, SQLite, MSSQL and Postgres I'd have to say my favourite is Postgres due to compelling the least number of hair-pulling frustrations and having absolutely pristine documentation - followed closely by MSSQL though MSSQL sometimes seems to go out of its way to have an incompatible dialect.