| I did in the original comment. We have over 200 monolith applications each accessing overlapping schemas of data with their own sets of stored procedures, views, and direct queries. To migrate a portion of that data out into it's own database requires, generally, refactoring a large subset of the 200 monolith apps to no longer get all the data in one query, but rather a portion of the data with the query and the rest of the data with a new service. Sharding the data is equally difficult because even tracing who is writing the data is spread from one side of the system to the next. We've tried to do that trough an elaborate system of views, but as you can imagine, those are too slow and cover too much data for some critical applications so they end up breaking the shard. That, in and of itself, introduces additional complexity with the evolution of the products. Couple that with the fact that even with these solutions, getting a large portion of the organization is not on board with these solutions (why can't we JUST buy more hardware? Get JUST bigger databases?) and these efforts end up being sabotaged from the beginning because not everyone thinks it's a good idea (And if you think you are different, I suggest just looking at the rest of the comments here in HN that provide 20 different solutions to the problem some of which are "why can't you just buy more hardware?") But, to add to all of this, we also just have organizational deficiencies that have really harmed these efforts. Including things like a bunch of random scripts checked into who knows where that are apparently mission critical and reading/writing across the entire database. General for things like "the application isn't doing the right thing, so this cron job run every Wednesday will go in and fix things up" Quiet literally 1000s of those scripts have been written. This isn't to say we've been 100% unsuccessful at splitting some of the data into it's own server. But, it's a long and hard slog. |
This hits pretty hard right now, after reading this whole discussion.
When there is a galaxy with countless star systems of data its good to have locality owners of data who publish for their usage as domain leaders, and build a system that makes subscription and access grants frictionless.