by teacpde 3375 days ago
The users/organizations app in the example sounds self-contained anyway. I don't see much difference between the app having its own data store and using the centralized data store service; the only difference is which service you call upstream. What would you gain by moving from the app's own store to the central store service and eliminating the app?

It doesn't just eliminate that app. It generally means no app needs any CRUD except to encapsulate business logic.

Secondly, all data operations can now be expressed using the one canonical data store API, with its rich support for queries, joins, fine-grained patching, changefeeds, permissions, etc. Individual microservices no longer need to reinvent their own REST APIs.

For example: The users/org app has a way to list all users, list all organizations, list all memberships in an organization, etc. Every app needs to provide all the necessary routes into the data in a RESTful way:

    /organizations              # All orgs
    /organizations/123          # One org
    /organizations/123/members  # Members of one org
    /organizations/123/members?status=pending  # Members of one org that have pending invites
    /users/42                   # One user
    /users/42/organizations     # One user's memberships
    etc.

A client that wants to query this app must first pick which silo to access, then invoke these APIs individually. The way that the querying is done is silo-specific, and the verbs only provide the access patterns the app thinks you want. What if you want to filter invites not just by status, but also by time? Every app must reinvent every permutation of possible access patterns. REST is pretty exhausting that way. GraphQL is a huge improvement, but doesn't really fix the silo problem.

With our new store, a client just invokes:

    /query?q=*[is "orgapp.member" &&
      organization._ref == "123"
      && status == "pending"]

(Yes, we did invent our own query language. We think it was necessary and not too crazy.)
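
One practical note on the query above: it contains `&&`, `==`, quotes and brackets, so it has to be percent-encoded before it goes into the `q` parameter, or the `&&` would be read as parameter separators. A minimal sketch of that, assuming a made-up base URL (`https://store.example`) and a hypothetical helper name (`build_query_url`); the store's real client API is not shown in this thread:

```python
from urllib.parse import quote

# Hypothetical helper, not part of the store's actual client library.
def build_query_url(base_url: str, query: str) -> str:
    """Return base_url + /query?q=<percent-encoded query>."""
    # Collapse the line breaks used for readability into single spaces.
    compact = " ".join(query.split())
    # safe="" percent-encodes every reserved character, including & and =.
    return f"{base_url}/query?q={quote(compact, safe='')}"

# The query from the comment above, verbatim.
PENDING_MEMBERS = '''
*[is "orgapp.member" &&
  organization._ref == "123"
  && status == "pending"]
'''

# "https://store.example" is a placeholder host.
url = build_query_url("https://store.example", PENDING_MEMBERS)
print(url)
```

The same helper would work for any other query string, which is the point being made: new access patterns change the query text, not the set of routes.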

Or indeed:

    /watch?q=*[is "orgapp.organization"]

Now the client gets a stream of new/updated/deleted organizations as they happen.

First, of all the things you've built into the canonical data store (CDS), why can't some be decomposed into services in their own right? Permissions would be a valuable service to decouple from CDS, for example.

Second, what are the constraints of CDS? How much data can I pack into a single object? Into a single silo? How does bad behavior on the part of one caller affect the others? What if CDS just doesn't work for a new service you're building?

I do appreciate that your company has invested in providing data storage as a service for yourselves, which I think is a much better idea than having each team roll its own persistence. However, I think people would be very interested in how you've made sure that CDS isn't a SPOF for all of your data, as well as in what kinds of things it isn't good at.

EDIT: I would also point out that there is a difference between having a single CDS and having StorageaaS that vends CDSs.

Those are good and important questions.

Our old "1.0" store architecture did in fact decompose things into multiple services. It had a separate ACL microservice that every microservice had to consult in order to perform permission checks. That was a really bad, stupid bottleneck.

For our new architecture, we decided to move things into a single integrated, opinionated package that's operationally simpler to deploy, run, and reason about. It's also highly focused and intended for composition. The permission system, for example, is intentionally kept simple to keep it from ballooning into some kind of all-encompassing rule engine; it only cares about data access, and doesn't even have things like IP ACLs or predicate-based conditionals. The idea is that if you need to build something complicated, you generate ACLs programmatically and use callbacks to implement policies outside of the store (the "comments only editable for 5 minutes" rule is an example of this). Maybe someday we'll move the entire permission system into a plugin so you can replace it with something else.
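
To make the callback idea concrete, here is a rough sketch of the "comments only editable for 5 minutes" policy as a plain function living outside the store. The function name, its arguments, and the way it would be wired to the store are all assumptions for illustration; the thread does not describe the actual callback interface.

```python
from datetime import datetime, timedelta, timezone

EDIT_WINDOW = timedelta(minutes=5)

# Hypothetical callback: `created_at` and `now` are assumed inputs, since the
# thread does not say what the store passes to a policy callback.
def may_edit_comment(created_at: datetime, now: datetime) -> bool:
    """Allow the edit only while the comment is at most 5 minutes old."""
    age = now - created_at
    # Reject negative ages (clock skew) as well as comments past the window.
    return timedelta(0) <= age <= EDIT_WINDOW

# Arbitrary example timestamp.
created = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(may_edit_comment(created, created + timedelta(minutes=4)))  # True
print(may_edit_comment(created, created + timedelta(minutes=6)))  # False
```

Keeping the time rule in a callback like this means the store's permission system only has to answer "may this caller write this document at all", which matches the intent of keeping it simple.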

It's also important to note that the store isn't the Data Store To End All Data Stores. It covers a fairly broad range of use cases (documents, entity graphs, configuration, analytics), but it's not ideal for all use cases. There are plenty of use cases where you'll want some kind of SQL database.