| >> Convincing a whole generation of programmers that distributed lock are a feasible solution. > I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about. I think this is a huge problem with the way we’re developing software now. Distributed systems are extremely difficult for a lot of reasons, yet it’s often or first choice when developing even small systems! At $COMPANY we have hundreds of lambdas, DocumentDB (btw, that is hell in case you’re considering it) and other cloud storage and queuing components. On call and bugs basically are quests in finding some corner case race condition/timing problem, read after write assumption etc. I’m ashamed to say, we have reads wrapped in retry loops everywhere. The whole thing could have been a Rails app with a fraction of the team size and a massive increase in reliability and easier to reason about/better time delivering features. You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this. |
The older I get, the more I think this is a result of Conway's law and that a lot of this architectural cruft stems from designing systems around communication boundaries rather than things that make technical sense.
Monolithic apps like Rails only happen under a single team or teams that are so tightly coupled people wonder whether they should just merge.
Distributed apps are very loosely coupled, so it's what you would expect to get from two teams that are far apart on the org chart.
Anecdotally, it mirrors what I've seen in practice. Closely related teams trust each other and are willing to make a monolith under an assumption that their partner team won't make it a mess. Distantly related teams play games around ensuring that their portion is loosely coupled enough that it can have its own due dates, reliability, etc.
Queues are the king of distantly coupled systems. A team's part of a queue-based app can be declared "done" before the rest of it is even stood up. "We're dumping stuff into the queue, they just need to consume it" or the inverse "we're consuming, they just need to produce". Both sides of the queue are basically blind to each other. That's not to say that all queues are bad, but I have seen a fair few queues that existed basically just to create an ownership boundary.
I once saw an app that did bidirectional RPC over message queues because one team didn't believe the other could/would do retries, on an app that handled single digit QPS. It still boggles my mind that they thought it was easier to invent a paradigm to match responses to requests than it was to remind the other team to do retries, or write them a library with retries built in, or just participate in bleeping code reviews.