Hacker News new | ask | show | jobs
by lobster_johnson 3374 days ago
The difference may seem subtle, but I'd argue that it is a whole other paradigm. It's one of those things that you either get, or you don't, but it might take some time to fully appreciate.

First of all, we're not an RDBMS, and don't pretend to be. I love the relational model, but there's a long-standing impedance mismatch between it and web apps that I won't go into here. There are clearly pros and cons. Our data store isn't intended as a replacement for classical relational OLTP RDBMS workflows.

If you let all apps share a single RDBMS, you're inevitably going to be tempted to put app-specific stuff in your database. This one app needs a queue-like mechanism, this other app needs some kind of atomic counter support, etc. You may even create completely app-specific tables. How do you compartmentalize anything? How do you prevent different versions of apps to stick to the same strict schema? How do you incrementally upgrade your schemas without taking down all apps? How do you create denormalized changefeeds that encompass the data of all apps? How do you institute systemwide policies like role-based ACLs, without writing a layer in stored-procedures and triggers that everything goes through? Etc. There are tons of things that are difficult to do with SQL, even with stored procedures.

I would argue that if you go down that route, you'll inevitably reinvent the "central data store pattern", but poorly.

2 comments

Fowler refers to this as the "Integration Database" pattern, and advises against it: https://martinfowler.com/bliki/IntegrationDatabase.html

The issue with a centralized data store is that your services are coupled together by the schemas of the objects that they share with other services. This means you can't refactor the persistence layer of your service without affecting other services.

All that said, a single source of truth does do away with distributed transactions, so I can see the appeal.

He seems to come at it from a slightly different angle, and I can see how his scenario isn't a good idea.

It's worth pointing out that you do have the same challenge in a siloed scenario, but the "bounded contexts" are separated by the applications themselves, which no chance of tight coupling because there's no way to tightly couple anything. In the silo version, apps can still point at each other's data (e.g. reference an ID in another app), there's just no way of guaranteeing that the data is consistent.

The coupling challenge is solved by design -- by avoiding designing yourself into tight couplings.

For example, let's say you desire every object to have an "owner", pointing at the user that "owns" the object. So you define a schema for User, and then every object points to its owner User. But now all apps are tightly coupled together.

In our apps, we typically don't intertwine schemas like that unless there's a clear sense of cross-cutting. An "owner" field would probably point to an object within the app's own schema: A "todoapp.Project" object can point its "owner" field at a "todoapp.User", whereas a "musicapp.PlaylistItem" can point to a "musicapp.User".

(Sometimes you do have clear cross-cutting concerns. An example is a scheduled job to analyze text. The job object contains the ID of the document to analyze. The job object is of type "jobapp.Job". The "document_id" field can point to any object in the store. The job doesn't care what the document is -- all it cares about is that it has fields containing text that can be analyzed. So there's no tight coupling of schemas at all, only of data.)

However... I have played with the idea of a "data interface" concept. Like a Java or Go interface, it would be a type that expresses an abstract thing. So for example, todoapp could define an interface "User" that says it must have a name and an email address. Now in the schema for todoapp.TodoItem you declare the "owner" field as type "User". But it's an interface, not a concrete type. So now we can assign anything that "complies with" the interface. If todoapp.User has "name" and "email", we can assign that to the owner, and if musicapp.User also has "name" and "email" with the right types, it is also compatible. But I can't assign, say, accountingsystem.User because it has "firstName", "lastName" and "email", which are not compatible.

I think the point was not so much about a RDBMS, but from an architectural point of view, you have a central data thingy that similar to a central RDBMS data thingy in that all parts have to point at the central thingy for their data needs. Not so much about the pros and cons of RDBMS