by l0k3ndr 1802 days ago
I have been working on rewriting a monolith into individual services over the past year at work. What we finally implemented for consistency across the services' different databases is a signal queue: whenever we encounter an inconsistency, a doc containing the info is pushed, and a corrective service is always reading from that queue and making the necessary updates to the tables. We made a hierarchy of source-of-truthness and use it to decide what to do with inconsistent data. Users sometimes have to see an error message, "We had an error and we are fixing it in XXX time", based on the current queue load, but it has mostly been working fine for us.
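A minimal sketch of that setup, for illustration only. The store names, the priority order, the `Inconsistency` fields, and the in-memory `deque` are all assumptions, not the actual system; a real deployment would presumably use a durable queue and real databases.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical source-of-truth ranking: earlier in the list = more trusted.
SOURCE_PRIORITY = ["billing", "orders", "search_index"]


@dataclass
class Inconsistency:
    """A signal doc describing one disagreement between stores (assumed shape)."""
    key: str       # record id, e.g. a user id
    field: str     # the field whose values disagree
    sources: list  # names of the stores involved


def most_trusted(sources):
    """Pick the store that ranks highest in the source-of-truth hierarchy."""
    return min(sources, key=SOURCE_PRIORITY.index)


def reconcile(signal, stores):
    """Copy the trusted value into every less-trusted store that differs.

    Returns the names of the stores that were updated.
    """
    truth = most_trusted(signal.sources)
    value = stores[truth][signal.key][signal.field]
    fixed = []
    for name in signal.sources:
        record = stores[name][signal.key]
        if name != truth and record[signal.field] != value:
            record[signal.field] = value
            fixed.append(name)
    return fixed


def drain(queue, stores):
    """Corrective service loop: process signals until the queue is empty."""
    repaired = 0
    while queue:
        signal = queue.popleft()
        repaired += len(reconcile(signal, stores))
    return repaired


if __name__ == "__main__":
    # In-memory stand-ins for the per-service databases.
    stores = {
        "billing": {"u1": {"email": "new@example.com"}},
        "orders": {"u1": {"email": "old@example.com"}},
    }
    queue = deque([Inconsistency("u1", "email", ["orders", "billing"])])
    drain(queue, stores)
    print(stores["orders"]["u1"]["email"])
```

Here `billing` outranks `orders`, so the corrective pass overwrites the stale value in `orders`. The queue length at any moment is what would drive the "fixing it in XXX time" estimate shown to users.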
2 comments

… is that less operational overhead than running a single process on a server?
Why was that change made? From my understanding, users now constantly see outages where they get an error message and have to wait. Were there more outages before? How did the previous system behave?