| It depends on your definition of "fully embraced". If you mean that there is no synchronous communication between services at all, then no, and that would not make sense in any real-world scenario I am aware of. However, I am an advocate of the pattern and have seen it used successfully repeatedly. The largest scale was as the data lead for a product maintained by 100-200 developers and handling several thousand transactions per second.

To answer your specific questions:

> handling breaking schema changes or failures in an elegant way, and keeping engineers and other data consumers happy enough?

We did not allow breaking schema changes. If there is a breaking change, it's a new event/topic. We used Kafka, and every topic needed to have a compatibility scheme defined (see https://docs.confluent.io/platform/current/schema-registry/a...) to clarify what constitutes a breaking change. Even though some claim that producers and consumers can be fully decoupled, you will need a good idea of who your consumers are and the time horizon of the data they consume. Application engineers are usually easier to keep happy than machine learning practitioners and other data consumers who want to consume events emitted over a long time period, potentially years.

> As a trivial example, everybody talks about dead-letter queues but nobody really explains how to handle messages that end up in one.

Dead-letter queues are a tool you can use when the context demands it; applying them wholesale is likely to create too much overhead. But to give you a specific example: some emitted events will be revenue-impacting and, depending on your setup, you may actually want to use the events for financial reporting (careful! some more info later). In this specific use case, if you can't process a record, the last thing you want to do is throw the message away.
Somebody will need to have a look at these records, fix the cause and then either re-emit the records based on what you know about them from the header or fix the records in the DLQ.
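
To make that workflow a bit more concrete, here is a minimal sketch of a DLQ reprocessing pass in plain Python. It is not tied to any Kafka client: `DeadLetter`, `reprocess_dead_letters`, the `source-topic` header name, and the `repair` / `publish` callbacks are all hypothetical names chosen for illustration, and a real setup would read from and write to actual topics.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DeadLetter:
    """A record parked in the DLQ (hypothetical shape, not a Kafka client type)."""
    payload: bytes  # original record value, possibly malformed
    headers: dict[str, str] = field(default_factory=dict)  # e.g. source topic, error


def reprocess_dead_letters(
    dead_letters: list[DeadLetter],
    repair: Callable[[DeadLetter], Optional[bytes]],
    publish: Callable[[str, bytes], None],
) -> list[DeadLetter]:
    """Re-emit every record that `repair` can fix.

    Returns the records that still need a human to look at them.
    Nothing is ever silently dropped.
    """
    unresolved = []
    for letter in dead_letters:
        fixed = repair(letter)  # returns None if the record cannot be repaired
        source_topic = letter.headers.get("source-topic")  # assumed header name
        if fixed is None or source_topic is None:
            unresolved.append(letter)  # keep for manual inspection
            continue
        publish(source_topic, fixed)  # re-emit to the topic it came from
    return unresolved
```

The point of the sketch is the return value: whatever cannot be repaired automatically stays around for someone to inspect, which is the guarantee you want for revenue-impacting events.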
So think about the guarantees you need to provide and decide whether a DLQ makes sense for your use case.

Some other thoughts and considerations:

- Topics more or less directly become analytics tables. That gives you an almost unified view of your application's data that is otherwise difficult to create.
- How are the messages emitted? Are they emitted from the application logic? If so, what guarantees do you need? What happens if the app crashes (e.g. after a DB transaction commits and before the event is emitted)? Depending on what you need, have a look at the transactional outbox pattern. |
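
For readers who have not seen the outbox pattern, here is a rough sketch using Python's built-in `sqlite3` module. The `orders` and `outbox` tables, the `order-placed` topic name, and the `place_order` / `relay_outbox` functions are hypothetical and only meant to show the idea: the business row and the event are written in the same DB transaction, and a separate relay publishes the event afterwards.

```python
import json
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: one business table plus the outbox.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: int, amount_cents: int) -> None:
    """Write the order and its event atomically: both rows commit, or neither."""
    event = json.dumps({"order_id": order_id, "amount_cents": amount_cents})
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-placed", event),
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str, str], None]) -> int:
    """Publish pending events in insertion order; return how many were pending."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending and is retried
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Note that this gives at-least-once delivery: if the relay crashes after `publish` but before the row is marked as published, the event is sent again on the next run, so consumers need to tolerate duplicates.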