Hacker News new | ask | show | jobs
by west0n 688 days ago
I'm curious about how rqlite's performance compares to other distributed databases developed in Go, such as CockroachDB, Vitess, and TiDB.
2 comments

It’s going to have much lower write throughput, since SQLite is single-writer and on top of that you need to do Raft consensus. TiDB and CockroachDB can handle concurrent writes easily. Cockroach runs raft per “range” of 128mb of the key space, I’m not as familiar with TiDB. Vitess is an orchestration layer over MySQL, and MySQL handles concurrent writes easily.
rqlite creator here.

That's correct, there is a write-performance hit for the reasons you say. All Raft systems will take the same hit, and SQLite is intrinsically single-writer -- nothing about rqlite changes that[2]. That said, there are approaches to increasing write-performance substantially. See [1] for much more information.

Write-performance is not the only thing to consider though (assuming one has sufficient performance in that dimension). Ease of deployment and operation are also important, and that's an area in which rqlite excels[3] (at least I think so, but I'm biased).

[1] https://rqlite.io/docs/guides/performance/

[2] https://rqlite.io/docs/faq/#rqlite-is-distributed-does-that-...

[3] https://rqlite.io/docs/faq/#why-would-i-use-this-versus-some...

Oh, I also presented some performance numbers in a presentation to a CMU a couple of years back. A little out-of-date, but gives a order-of-magnitude sense. https://youtu.be/JLlIAWjvHxM?t=2690

The biggest performance improvement since is due to the introduction of Queued Writes. See https://rqlite.io/docs/api/queued-writes/

> Ease of deployment and operation are also important, and that's an area in which rqlite excels

Amen. I've been building something appliance-like where I want to support clustering but I don't want to manage a database cluster inside the project.

Rqlite is so easy to run either stand-alone or clustered. It's a godsend.

And when people want postgres or whatever, I let them bring their own database. It's not hard to abstract a database storage layet if you plan ahead.

But if you want it to 'just work' rqlite is doing that with flying colors.

Relevant to the original inquiry — I really admire that you bring up the etcd and consul comparison right up front in the readme. For my own comprehension at least, it makes obvious the type of workloads for which you're optimizing and I appreciate that context as a past user of both of those stacks.
Maybe ETCD is a more appropriate comparison?
Depending on what you’re using these tools for. If you want a locking manager and some meta data storage to help your distributed system maintain state, etcd is better for the job than rqlite for that. It’s a better zookeeper. With etcd you can hold a lock and defer unlocking if the connection is disrupted. Rqlite is not a good option for this.
Agreed, in the sense that while rqlite has a lot in common with etcd (and Consul too -- Consul and rqlite share the same Raft implementation[1]) rqlite's primary use case is not about making it easy to build other distributed systems on top of it.

[1] https://github.com/hashicorp/raft

Every time I've looked at rqlite, it just falls short features-wise in what I would want to do with it. A single raft group does not scale horizontally, so to me rqlite is a toy rather than a tool worth using (because someone might mistake the toy as production grade software).
rqlite creator here.

That's clearly a mistaken attitude because both Consul and etcd also use a single "Raft group" and they are production-grade software.

Ruling out a piece of software simply because it doesn't "scale horizontally" (and only writes don't scale horizontally in practice) is a naive attitude.

The qualifier here is for /my/ use cases. However I couldn't recommend rqlite over better options at the level of scale that it can fill.

One of the problems is if you're working with developers, the log replication contents is the queries, instead of the sqlite WAL like in dqlite. I know this is a work around to integrate mattn/sqlite3, but it's untenable in enterprise applications where developers are going to just think "oh, I can do sqlite stuff!". This is a footgun that someone will inevitably trigger at some point if rqlite is in their infrastructure for anything substantial. In enterprise, it's plainly untenable.

Another issue is if I want to architect a system around rqlite, it wont be "consistent" with rqlite alone. The client must operate the transaction and get feedback from the system, which you can not do with an HTTP API the way you've implemented it. There was a post today where you can observe that with the jetcd library against etcd. Furthermore to this point, you can't even design a consistent system around rqlite alone because you can't use it as a locking service. If I want locks, I end up deploying etcd, consul, or zookeeper anyways.

If I had to choose a distributed database with schema support right now for a small scale operation, it would probably be yugabyte or cockroachdb. They're simply better at doing what rqlite is trying to do.

At the end of the day, the type of people needing to do data replication also need to distribute their data. They need a more robust design and better safety guarantees than rqlite can offer today. This is literally the reason one of my own projects has been in the prototyping stage for nearly 10 years now. If building a reliable database was as easy as integrating sqlite with a raft library, I would have shipped nearly 10 years ago. Unfortunately, I'm still testing non-conventional implementations to guarantee safety before I go sharing something that people are going to put their valuable data into.

To simply say I'm "ruling out a piece of software because it doesn't scale horizontally" is incorrect. The software lacks designs and features required for the audience you probably want to use it.

Hopefully you find my thoughts helpful in understanding where I'm coming from with the context I've shared.