by hinkley 1427 days ago
I have an adjacent problem, and I haven't been able to find anyone who has a fix for me.

One perfectly reasonable use case for a read replica of a database is a bastion server. Database + web server on a machine that is firewalled both from the internet and from the business network. With read only access there is a much smaller blast radius if someone manages to compromise the machine.

The problem is that every single replication implementation I've seen expects the replicas to phone home to the master copy, not for the master copy to know of the replicas and stream updates to them. This means that your bastion machine needs to be able to reach into your LAN, which defeats half the point.

The most pressing question is, "what options exist to support this?", but I think the bigger question is why we treat replicas as full peers of the system of record when so often they are not, mechanically or philosophically, and in some cases couldn't be even if we wanted them to be (e.g., a database without multi-master support).
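A minimal sketch of the inverted direction described above, in Python: the primary holds its replicas' addresses and dials out, while the replica only listens. The replica's firewall then only needs to admit one inbound port from the primary and allow nothing outbound into the LAN. `PushPrimary`, `PushReplica` and the length-prefixed record framing are hypothetical. This is not any real database's replication protocol, and it has no authentication, TLS, or retry logic.

```python
import socket
import struct
import threading


def _recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise ConnectionError if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


class PushReplica:
    """Hypothetical read-only copy that only accepts connections; it never dials out."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.log: list[bytes] = []
        # create_server binds and listens; port 0 lets the OS pick a free port.
        self._srv = socket.create_server((host, port))
        self.address = self._srv.getsockname()

    def serve_one(self) -> None:
        """Accept one push session and apply records until the primary hangs up."""
        conn, _ = self._srv.accept()
        with conn:
            while True:
                try:
                    (length,) = struct.unpack("!I", _recv_exact(conn, 4))
                except ConnectionError:
                    break  # primary closed the session
                self.log.append(_recv_exact(conn, length))

    def close(self) -> None:
        self._srv.close()


class PushPrimary:
    """Hypothetical system of record that knows its replicas and streams to them."""

    def __init__(self, replicas: list[tuple[str, int]]) -> None:
        self.replicas = replicas

    def push(self, records: list[bytes]) -> None:
        for addr in self.replicas:
            # Outbound connection from the primary: the replica needs no route back.
            with socket.create_connection(addr) as conn:
                for rec in records:
                    # 4-byte big-endian length prefix so the replica can split records.
                    conn.sendall(struct.pack("!I", len(rec)) + rec)


if __name__ == "__main__":
    replica = PushReplica()
    worker = threading.Thread(target=replica.serve_one)
    worker.start()
    PushPrimary([replica.address]).push([b"INSERT 1", b"INSERT 2"])
    worker.join()
    replica.close()
    print(replica.log)  # [b'INSERT 1', b'INSERT 2']
```

In this shape the replica never needs a route or credentials into the primary's network. It can still misbehave on the connection the primary opens, though.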

5 comments

That's an interesting idea. I had plans to introduce "candidates" [1] (i.e. nodes that could become the primary), but I like the idea of reversing the communication and connecting from the primary to the replica. I added an issue to the LiteFS project to track it [2]. Thanks!

[1] https://github.com/superfly/litefs/issues/16

[2] https://github.com/superfly/litefs/issues/24

> The problem is that every single replication implementation I've seen expects the replicas to phone home to the master copy, not for the master copy to know of the replicas and stream updates to them. This means that your bastion machine needs to be able to reach into your LAN, which defeats half the point.

You can set up a PostgreSQL replica to be driven purely off of archive logs. It does not need direct access to the source database, because it can pull the archive files, read-only, from a third location (e.g. a file server or S3) that the source database server pushes them to. The catch is that the replica is only updated when a WAL file is pushed, which can be triggered either by size (automatically, on an "active" database) or by time (every N seconds or minutes). If you're fine with potentially being a minute behind the source, this is easy to set up.
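On the replica side, PostgreSQL fetches each archived WAL segment by running the shell command configured in `restore_command`, substituting `%f` with the file name and `%p` with the path to copy it to. The command must exit non-zero when the requested file isn't in the archive (yet). Below is a minimal sketch of such a fetch script. It assumes the archive is mounted read-only at a hypothetical `/mnt/wal-archive` and the script is installed at a hypothetical `/usr/local/bin/restore_wal.py`, wired up as `restore_command = 'python3 /usr/local/bin/restore_wal.py %f %p'`. On the primary, `archive_mode`, `archive_command` and `archive_timeout` control when segments are pushed; `archive_timeout` is the "every N seconds" knob.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical read-only mount of the shared archive location.
ARCHIVE_DIR = Path("/mnt/wal-archive")


def restore_wal(archive_dir: Path, wal_name: str, dest_path: str) -> int:
    """Copy one archived WAL file to the path PostgreSQL asked for.

    Returns 0 on success and 1 when the segment is not in the archive yet;
    PostgreSQL treats any non-zero status as "not available" and asks again later.
    """
    src = archive_dir / wal_name
    if not src.is_file():
        return 1
    # Only reads from the archive; the replica never writes back to it.
    shutil.copyfile(src, dest_path)
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Invoked by PostgreSQL with %f and %p substituted as the two arguments.
    sys.exit(restore_wal(ARCHIVE_DIR, sys.argv[1], sys.argv[2]))
```

The replica only ever needs read access to the archive, so the bastion never talks to the primary at all.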

Look into reverse SSH tunnelling. SSH from the primary to the secondary, which then connects back to the primary through the already-established SSH connection.

That's the closest I've been able to come up with, but it does have the problem that anything local can typically connect to that tunnel. In the bastion situation we generally can't assume the machine hasn't been compromised. Otherwise, why did we put it outside the firewall?

To be fair, there are a number of ways a hostile endpoint can mess with another server just by abusing TCP protocol behavior, so perhaps I'm putting too fine a point on it.
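The reverse tunnel suggested above can be made concrete as the command the primary would run. The sketch below only builds the `ssh` argument list (it does not execute it). `replica.dmz.example` and port 15432 are hypothetical, and it assumes PostgreSQL listens on 5432 on the primary. With `-R`, sshd on the replica opens a listening port and carries connections to it back over the session to the primary. By default that listener is bound to the replica's loopback interface, which is why any local process on the replica can use it.

```python
import shlex


def reverse_tunnel_cmd(replica_host: str, replica_port: int = 15432,
                       primary_port: int = 5432) -> list[str]:
    """Build the ssh command the *primary* runs to open a tunnel on the replica.

    Connections made on the replica to its loopback port replica_port are carried
    back over this SSH session to localhost:primary_port on the primary.
    """
    return [
        "ssh",
        "-N",  # no remote command; just hold the port forwarding open
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port can't be bound
        # No bind address given, so sshd binds the replica's loopback by default:
        # still reachable by any local process there (the weakness raised above).
        "-R", f"{replica_port}:localhost:{primary_port}",
        replica_host,
    ]


if __name__ == "__main__":
    print(shlex.join(reverse_tunnel_cmd("replica.dmz.example")))
    # ssh -N -o ExitOnForwardFailure=yes -R 15432:localhost:5432 replica.dmz.example
```

The replica would then be pointed at `127.0.0.1:15432` as its upstream, while the only connection crossing the firewall is the one the primary opened.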

> a database without multi-master support

I believe Cassandra does not have a master/follower architecture; it uses a ring-based structure.

Yeah, Cassandra uses a Distributed Hash Table with Consistent Hashing.

When a new node is added, it takes responsibility for a segment of the ring. When a node is removed, its segment gets redistributed.
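To illustrate the add/remove behaviour described above, here is a toy consistent-hash ring in Python. It is a simplification, not Cassandra's implementation: one token per node, MD5 instead of Cassandra's default Murmur3 partitioner, and no virtual nodes or replication. `HashRing` and `token` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key or node name to a position on the ring (MD5 as a stand-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring: one token per node, no vnodes, no replication."""

    def __init__(self, nodes: list[str]) -> None:
        self._tokens: list[int] = []      # sorted node tokens
        self._owner: dict[int, str] = {}  # token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def remove(self, node: str) -> None:
        t = token(node)
        self._tokens.remove(t)
        del self._owner[t]

    def owner(self, key: str) -> str:
        # A key belongs to the first node token at or after its own token,
        # wrapping around to the start of the ring past the largest token.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owner[self._tokens[i % len(self._tokens)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.owner(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    # Only keys in the segment node-d took over change owner, and all go to node-d.
    print(len(moved), all(ring.owner(k) == "node-d" for k in moved))
```

Adding a node only moves the keys in the segment it takes over. Removing it hands that segment back to the next node clockwise, leaving every other key where it was.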

However, it is in no way a drop-in replacement for an RDBMS: it requires careful planning around the application's read and write patterns, in a way that is very different from a typical RDBMS. It definitely can't be used in this scenario; every node needs to be able to reach every other node, and the client must be able to reach at least one node.

Try Litestream; I think it's a push system.