HN Mirror


by curryst 1975 days ago

You're interweaving several different issues here.

> Why fumble around with synchronization? 99% of the data in big datasets doesn't change. This doesn't even have to be "log-based", we just need to be able to ship the old, stable data and treat it almost like "cold storage".

This is not a feature of SQL, this is a feature of the database. Also, this sounds exactly like doing full-table replication to get the "old" data and then turning on log-based replication. You can do key-based replication if you really want to avoid log-based, but it's generally just a less efficient version of log-based replication.
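A minimal sketch of the "ship the old data once, then follow the changes" pattern, using Python's built-in `sqlite3` with two in-memory databases standing in for source and replica. The `events` table and the `updated_at` replication key are hypothetical. This is key-based, not log-based: every pass re-queries the source for rows whose key has advanced, so it costs a query per pass and cannot see deletes (or updates that forget to bump the key). A log-based reader of the database's change log would see those.

```python
import sqlite3

# Hypothetical schema: `updated_at` is the replication key and must increase
# whenever a row is inserted or modified.
SCHEMA = "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)"


def full_table_copy(src, dst, table):
    """Initial snapshot: copy every existing row, return the highest key seen."""
    # `table` is a trusted, hard-coded name here; never interpolate user input.
    rows = src.execute(f"SELECT id, payload, updated_at FROM {table}").fetchall()
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    dst.commit()
    return max((r[2] for r in rows), default=0)


def key_based_sync(src, dst, table, bookmark):
    """Incremental pass: re-query the source for rows whose key moved past the bookmark."""
    rows = src.execute(
        f"SELECT id, payload, updated_at FROM {table} WHERE updated_at > ? ORDER BY updated_at",
        (bookmark,),
    ).fetchall()
    # Upsert by primary key so edited rows overwrite their older copies.
    dst.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)", rows)
    dst.commit()
    return max((r[2] for r in rows), default=bookmark)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


src, dst = make_db(), make_db()
src.executemany("INSERT INTO events VALUES (?, ?, ?)", [(1, "old", 1), (2, "old", 2)])
src.commit()
bookmark = full_table_copy(src, dst, "events")  # ship the stable data once

src.execute("INSERT INTO events VALUES (3, 'new', 3)")
src.execute("UPDATE events SET payload = 'edited', updated_at = 4 WHERE id = 1")
src.execute("DELETE FROM events WHERE id = 2")  # invisible to key-based sync
src.commit()
bookmark = key_based_sync(src, dst, "events", bookmark)
```

After the second pass the replica holds rows 1 (`edited`), 2 (`old`) and 3 (`new`): row 2 survives on the replica because a deleted row leaves nothing behind for `updated_at > ?` to find.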

> Why is there a single point of entry into the data? You have to use the one database cluster to access the one database and the one set of tables. Why can't we expose that same data in multiple ways, using multiple pieces of software, on multiple endpoints?

You can. Postgres supports both Perl and Python extensions that run in the RDBMS process (PL/Perl and PL/Python), IIRC. Very few people use them because running in the RDBMS process means that you can break the RDBMS process in really bad ways, and it is very difficult to gain any benefits over just running a separate process that communicates over SQL.

So if you consider other processes that communicate with the database and then show views of that over other protocols, that describes most of the backend apps in the world.
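To make that concrete, here is a minimal sketch of such a backend: a separate program that talks SQL to the data store and re-exposes a view of it as JSON over HTTP. It uses only the standard library (a WSGI callable plus `sqlite3` as a stand-in for the shared database); the `orders` table, its data, and the `customer` query parameter are hypothetical.

```python
import json
import sqlite3
from urllib.parse import parse_qs

# Hypothetical data store; in a real deployment this would be the shared RDBMS.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
DB.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)
DB.commit()


def app(environ, start_response):
    """WSGI app: answers ?customer=NAME with a JSON view computed through SQL."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    customer = params.get("customer", [""])[0]
    count, total = DB.execute(
        "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    body = json.dumps({"customer": customer, "orders": count, "total": total}).encode()
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Serving it over HTTP takes one more line, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`; a request to `/?customer=alice` then returns `{"customer": "alice", "orders": 2, "total": 37.5}`. The database only ever sees SQL; the HTTP/JSON "endpoint" lives entirely in the separate process.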

There's also stuff like Presto[1] that allows you to run queries distributed over multiple databases, multiple types of databases, etc, etc, etc. In that case, conceptually, Presto is "the database" and all the records you refer to are remote.
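Presto does this across many machines and many kinds of databases. As a much smaller, single-process analogue (not Presto and not distributed), SQLite's `ATTACH DATABASE` lets one connection run a single SQL join over tables that live in separate database files. The file names, tables, and data below are hypothetical.

```python
import os
import sqlite3
import tempfile


def build(path, ddl, insert_sql, rows):
    """Create one standalone database file and close it again."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def federated_totals():
    with tempfile.TemporaryDirectory() as tmp:
        users_path = os.path.join(tmp, "users.db")
        orders_path = os.path.join(tmp, "orders.db")
        build(users_path, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
              "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
        build(orders_path,
              "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)",
              "INSERT INTO orders VALUES (?, ?, ?)",
              [(1, 1, 30.0), (2, 2, 12.5), (3, 1, 7.5)])

        # A third connection acts as the "coordinator" and attaches both files.
        hub = sqlite3.connect(":memory:")
        try:
            hub.execute("ATTACH DATABASE ? AS users_db", (users_path,))
            hub.execute("ATTACH DATABASE ? AS orders_db", (orders_path,))
            return hub.execute(
                """
                SELECT u.name, SUM(o.total)
                FROM users_db.users AS u
                JOIN orders_db.orders AS o ON o.user_id = u.id
                GROUP BY u.name
                ORDER BY u.name
                """
            ).fetchall()
        finally:
            hub.close()  # release the files before the temp dir is removed
```

Calling `federated_totals()` returns `[('alice', 37.5), ('bob', 12.5)]`. As with Presto, the coordinating connection is conceptually "the database" while the rows it joins live elsewhere.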

1: https://prestodb.io/


0xbadcafebee 1975 days ago

> This is not a feature of SQL, this is a feature of the database

Yet they always seem tied together, eh? Somehow the conventions got stuck together, and that in turn affects how our systems work.

> Postgres supports both Perl and Python extensions that run in the RDBMS process

But I'm talking about not having to use the RDBMS process at all. If I have a text file on disk, a million different programs can access it directly. I don't have to route one program through another program just to open, read, write, and close the file. Why don't we design our databases to work this way?

> Very few people use them because running in the RDBMS process means that you can break the RDBMS process in really bad ways

Yes, it does sound bad. That's why I'd prefer an indirect method rather than having to wedge access through the RDBMS.

> So if you consider other processes that communicate with the database and then show views of that over other protocols, that describes most of the backend apps in the world.

Yep! We architect entire systems-of-systems just because the model for our data management in an RDBMS is too rigid. We're building spaceships to get to the grocery store because we haven't yet figured out roads and cars.