by Nextgrid
812 days ago
Most applications, taken as a whole, are absolutely stateful. Individual components might not be (app servers are stateless, with the DB/Redis holding all the state), but from an external client's perspective the whole app is stateful. If we're talking about reliability and outage recovery, we have to treat the application as the single unit the external client sees, so everything, including the DB (or equivalent stateful component), must be redundant.

Sadly, this is also where a lot of cloud-native tooling and best practices fall short. There are endless ways to run stateless workloads redundantly, but stateful/CAP-bound workloads seem to get ignored or handwaved away. I've seen my fair share of stacks that do the right thing for the easy, stateless parts (redundancy, infinite horizontal scalability) while everyone kinda ignores the elephant in the room: the CAP-bound primary datastore that everything else depends on. It isn't horizontally scalable, and its failover/replication behavior is ignored, misunderstood, and untested. Teams only get away with it because modern hardware is reliable enough that outages and failovers are rare, so the misunderstood/unexpected/undefined behavior during those windows flies under the radar.
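To make the "untested failover behavior" point concrete, here is a minimal toy sketch in Python. It is not any real database's API: `Node`, `AsyncReplicatedStore`, and `demo` are hypothetical names, and the model assumes a single primary that acknowledges writes before asynchronously shipping them to one replica. Under those assumptions, it shows how a write the client was told succeeded can vanish after a failover.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy datastore node: an append-only log of committed writes."""
    name: str
    log: list = field(default_factory=list)


class AsyncReplicatedStore:
    """Hypothetical primary/replica pair with asynchronous replication."""

    def __init__(self):
        self.primary = Node("primary")
        self.replica = Node("replica")
        self.acked = []  # writes the client was told succeeded

    def write(self, value):
        # The primary commits locally and acknowledges immediately;
        # replication to the replica happens later (assumption of this toy).
        self.primary.log.append(value)
        self.acked.append(value)

    def replicate(self):
        # Ship every entry the replica has not seen yet.
        self.replica.log.extend(self.primary.log[len(self.replica.log):])

    def failover(self):
        # The primary is lost; promote the replica exactly as it is.
        self.primary = self.replica
        self.replica = Node("new-replica")

    def lost_acked_writes(self):
        # Acknowledged writes that the current primary no longer has.
        return [v for v in self.acked if v not in self.primary.log]


def demo():
    store = AsyncReplicatedStore()
    store.write("order-1")
    store.replicate()
    store.write("order-2")  # acknowledged, but not yet replicated
    store.failover()
    return store.lost_acked_writes()


if __name__ == "__main__":
    print(demo())
```

In this toy, calling `replicate()` before `failover()` loses nothing, which is roughly what the happy path looks like when failover is never exercised under load. The stateless app servers in front of such a store can be perfectly redundant and the client still sees a missing order.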
And so no, most teams don’t need to worry about the hard problems you bring up.