| I can say we're one of the companies that have successfully embraced event-driven design. We're Vaticle, and we're not a microservice shop; rather, we're building a database called TypeDB. Its internals are quite event-driven, realised mainly with the actor model and event-loop concurrency. This has allowed us to scale in two main ways: maximising CPU parallelism, and doing other work while waiting for an RPC call to return. Event-driven architecture is by nature more parallel and efficient, but it comes with a weaker consistency guarantee on the ordering of events arriving from multiple parallel sources. In my experience, people tend to fall prey to this pitfall and end up resorting to inappropriate workarounds such as global locks and ad-hoc retry mechanisms. These most commonly appear when aggregating work from concurrent producers or when handling communication failures. In fact, communication failures and downtime are the most prominent problem in microservices, particularly when you need your data to be inserted into multiple data sources atomically. This is an inherent issue in distributed systems: you have to decide what the atomic unit of data you wish to insert is, and design your system around this hard constraint. Making the operations atomic, idempotent or reversible are some of the solutions you may want to investigate, but the moral of the story is that you need to make sure the additional complexity is justified. For us as a company, we decided on an event-driven architecture knowing not just the benefits but also the costs I've outlined above. For simpler applications that don't need to a) be real-time and b) handle huge loads (think small internal applications or a small-business e-commerce website), I would fall back to a good old non-event-driven system, since it's the more pragmatic option.
I've seen several companies build an event-driven architecture even when they knew there was no way they would need to scale beyond serving several thousand requests per hour in the next two years. I think they would have been better off with a simpler, synchronous model. |
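
To make the idempotency point above concrete, here is a minimal Python sketch of a consumer that tolerates redelivered events instead of relying on a global lock or an ad-hoc retry guard. It is an illustration only, not how TypeDB works internally: the `Event` and `Ledger` names, the `event_id` field, and the in-memory `seen` set are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID assigned by the producer
    account: str
    amount: int


@dataclass
class Ledger:
    balances: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)  # IDs of events already applied

    def apply(self, event: Event) -> bool:
        """Apply the event at most once; return False for a duplicate."""
        if event.event_id in self.seen:
            return False
        current = self.balances.get(event.account, 0)
        self.balances[event.account] = current + event.amount
        self.seen.add(event.event_id)
        return True


ledger = Ledger()
deposit = Event("evt-1", "alice", 100)
ledger.apply(deposit)
ledger.apply(deposit)  # redelivery after a timeout: ignored
print(ledger.balances)  # {'alice': 100}
```

Because applying the same event twice has the same effect as applying it once, the producer can simply resend after a communication failure. In a real system, the set of seen IDs and the balance update would presumably need to be committed in the same atomic unit, which is exactly the design constraint described above.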