A lot of people who believe only one app (or one language) accesses their org's datastore are mistaken. You have to take extreme measures to prevent ad hoc uses from popping up. Two reasons why:
1. If you are doing anything interesting, people are going to ask questions about what you are doing, and the best way to answer those questions is going to be by querying your database.
2. One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore: the legacy service and whatever you're replacing it with. I suspect any alternative to this arrangement will be an even worse idea, e.g. taking a deliberate outage to perform a likely-irreversible migration.
> One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore.
You should not do this. It removes almost all of the benefits of extracting things into a separate service (services should own their data and the only means of accessing it should be via their APIs). That's not utopian; that's one of the main reasons you do a service extraction in the first place.
Right, so let’s suppose you already segmented the data to two different backing datastores, and your monolith is now connecting to both of them instead of just the one. Now you can do the service migration, at which point you still run into the situation I’m discussing.
Cutovers are hard, to be sure. Ideally they should also be short (the time a service undergoing mitosis spends talking to both the old and new locations should be measured in days, hours, or less).
Don't choose general data access patterns for the infrequent occurrence of cutover. Cutover is when you break a few rules and then immediately stop doing so. Build for everyday access patterns instead (which should be through the API of whatever owns the data--SQL is a powerful language and a really shitty API).
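To make "through the API of whatever owns the data" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `OrderService`, the `orders` table, and the in-memory SQLite store are hypothetical stand-ins, not anything from a real system. The point is only that callers see `create_order` and `orders_for`, never the schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    total_cents: int


class OrderService:
    """Hypothetical owner of the orders data; callers never see its schema."""

    def __init__(self) -> None:
        # Private datastore: only this class holds a connection to it.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
        )

    def create_order(self, customer: str, total_cents: int) -> Order:
        cur = self._db.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        self._db.commit()
        return Order(cur.lastrowid, customer, total_cents)

    def orders_for(self, customer: str) -> list[Order]:
        # The SQL lives behind the API, so the table can be reshaped
        # later without breaking any caller.
        rows = self._db.execute(
            "SELECT id, customer, total_cents FROM orders "
            "WHERE customer = ? ORDER BY id",
            (customer,),
        )
        return [Order(*row) for row in rows]


service = OrderService()
service.create_order("alice", 1250)
service.create_order("bob", 300)
service.create_order("alice", 999)
alice_orders = service.orders_for("alice")
```

If the owner later renames a column or splits the table, only the two methods change; nothing outside the class ever wrote a query against it.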
Of course.
But surely you don't let anyone access your API, and you put it behind another API, right? Just in case you need to change that first API without breaking all the users.
Never even let on that you have one, or the founder will pat one of your most junior devs on the back and ask if he can give DB access to that other team who needs to make money :D
I'm not the GP, but yes, absolutely. There are plenty of things that make this less than awful:
- The existence of tools that allow structured access to multiple APIs (GraphQL is a nice middle ground between "YOLO any queries you want" and "you only get row-by-row access exposed by the web APIs").
- The existence of data on multiple internal data stores. Analytics folks usually are not prepared to engage with the complexity of data being stored across handfuls or more of different stores with different schemas. The owner of the application knows how to join that stuff better than they do.
- Building intermediate/denormalized stores isn't frowned upon; keeping analytics from running ad hoc queries on the main production DBs doesn't mean they get no SQL at all. Expose change streams or bulk ("too much" data) endpoints and make it easy to load their results into a reporting system, which can be raw SQL. It's not redundant; if you don't do this, the following conversation starts to happen often: Q: "I'm running raw analytics queries on production and it's not quite working, can we just make $substantial_schema_change so my report works/is fast?" A: "No, we explicitly chose not to structure the DB/index/whatever like that because it seriously fucks up a real user access pattern."
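A rough sketch of the change-stream idea from the last bullet, under stated assumptions: the event shape (`op`, `order_id`, and so on), the `order_facts` table, and the use of in-memory SQLite as the "reporting system" are all made up for illustration. The owning service publishes changes; a small loader applies them to a separate store that analysts can hit with raw SQL.

```python
import sqlite3

# Hypothetical change events, in the order the owning service emitted them.
events = [
    {"op": "upsert", "order_id": 1, "customer": "alice", "total_cents": 1250},
    {"op": "upsert", "order_id": 2, "customer": "bob", "total_cents": 300},
    {"op": "upsert", "order_id": 1, "customer": "alice", "total_cents": 1500},
    {"op": "delete", "order_id": 2},
]

# Separate reporting store; its schema can be shaped for reports, not users.
reporting = sqlite3.connect(":memory:")
reporting.execute(
    "CREATE TABLE order_facts ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
)


def apply_event(db: sqlite3.Connection, event: dict) -> None:
    """Apply one change event to the reporting table."""
    if event["op"] == "upsert":
        db.execute(
            "INSERT OR REPLACE INTO order_facts "
            "(order_id, customer, total_cents) VALUES (?, ?, ?)",
            (event["order_id"], event["customer"], event["total_cents"]),
        )
    elif event["op"] == "delete":
        db.execute(
            "DELETE FROM order_facts WHERE order_id = ?", (event["order_id"],)
        )


for event in events:
    apply_event(reporting, event)
reporting.commit()

# Analysts run whatever SQL they like here, far away from production.
revenue_by_customer = reporting.execute(
    "SELECT customer, SUM(total_cents) FROM order_facts "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

Indexing or restructuring `order_facts` for a slow report costs the production schema nothing, which is the conversation the bullet is trying to avoid.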
Forcing analytics to go through the API doesn’t actually reduce load on the production DB, it just increases load on the API itself. Step 1 should probably be a dedicated read replica and step 2 should probably be an ETL process.
Ding ding ding. Dedicated read replica and an ETL gets you to a point where queries don't bring down prod. If you have an analyst org running wild making bad decisions about data that they think says things it doesn't -- that's probably a good sign that it's time for a dedicated data engineering team, and potentially a BI flavored data science team as well.
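For the "read replica plus ETL" route, a minimal sketch of an incremental load. Assumptions, all hypothetical: two in-memory SQLite databases stand in for the replica and the warehouse, and the `orders` table carries an integer `updated_at` column used as a watermark. A real setup would point the extract at an actual replica connection.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, "
    "total_cents INTEGER, updated_at INTEGER)"
)


def make_db(rows=()):
    """Create an in-memory database with the orders table and optional rows."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def run_etl(replica, warehouse, watermark):
    """Copy rows changed since `watermark`; return the new watermark."""
    rows = replica.execute(
        "SELECT id, customer, total_cents, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert so a re-extracted row overwrites its older copy.
    warehouse.executemany(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows
    )
    warehouse.commit()
    return max((row[3] for row in rows), default=watermark)


replica = make_db([(1, "alice", 1250, 100), (2, "bob", 300, 110)])
warehouse = make_db()

# First run loads everything; later runs only pick up what changed.
watermark = run_etl(replica, warehouse, watermark=0)
replica.execute("UPDATE orders SET total_cents = 350, updated_at = 120 WHERE id = 2")
replica.commit()
watermark = run_etl(replica, warehouse, watermark)

total = warehouse.execute("SELECT SUM(total_cents) FROM orders").fetchone()[0]
```

The only load on the primary is replication itself; the extract query and every analyst query land on the replica or the warehouse.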