by sideway 1778 days ago
Thanks for your detailed answer, really appreciate it.

Two follow-up questions, if you don't mind me asking, even though I understand you were not on the publishing side:

1. Do you know if changes in the org structure (e.g. when Uber was growing fast and, I guess, new teams/products were created and existing teams/products were split) had a significant effect on the schemas that had already been published? For example, when a service is split into two and the dataset of the original service is now distributed, what pattern have you seen work well enough to avoid breaking everyone downstream?

2. Did you have strong guidelines on how to structure events? Were they entity-based with each message carrying a snapshot of the state of the entities or action-based describing the business logic that occurred? Maybe both?

And yes, one of the books I'm talking about is indeed Designing Data-Intensive Applications, and I fully agree with you that it's a fantastic piece of work.

2 comments

For 1, no example really comes to mind, but I guess there could be cases where a service went from publishing an event with all of its related data to being split into a service where that becomes more expensive to do (e.g. the data is no longer in memory, it's behind the API of the old service). In some cases you can have a very simple service that consumes a message, makes a few calls to services or databases to hydrate it with more information, then produces that message to another topic that the original consumers can switch to. More commonly, though, if the data model is changing drastically, with the database being split and owned by two new services, you will have to bring consumers in on the change to make sure everyone understands the new semantics.
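
To make the "hydrate and republish" idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular system did it: the topics are plain lists standing in for a real message bus, and `fetch_trip`, `TRIPS_BY_ID`, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a call to the old service's API or a database.
TRIPS_BY_ID: Dict[str, dict] = {
    "t1": {"rider_uuid": "r9", "city": "SF"},
}


def fetch_trip(trip_id: str) -> dict:
    """Look up extra fields for a trip; returns {} if the trip is unknown."""
    return TRIPS_BY_ID.get(trip_id, {})


def hydrate(thin_event: dict, lookup: Callable[[str], dict] = fetch_trip) -> dict:
    """Return a copy of the thin event with the looked-up fields merged in."""
    extra = lookup(thin_event["trip_id"])
    return {**thin_event, **extra}


def run_hydrator(source_topic: List[dict], sink_topic: List[dict]) -> None:
    """Consume every message from the source and produce the hydrated one."""
    for event in source_topic:
        sink_topic.append(hydrate(event))
```

The original consumers would then read from the sink topic, which looks like the old, fully populated message, while the new service keeps publishing only the thin event.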

For 2, it completely depends on the source of the trigger. The first event in a chain probably only has enough information to know that it should produce an event, usually as quickly as possible, so no additional DB or API fetches. So you might get something in the driver status topic that contains {driver_uuid, new_status, old_status}; then, depending on what downstream consumers want to do in response to that event, they may need more info, so you may get more entity information in derived topics. Even purely entity-based messages need a trigger, so in our topics that tail databases, you may have the full row as a message along with the action that occurred, like {op: insert, msg: {entity data… }}.
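
Roughly, the three message shapes could be sketched like this in Python. Only the field names `driver_uuid`, `new_status`, `old_status`, `op`, and `msg` come from the examples above; the class names, the `driver` key, and the sample values are made up for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class DriverStatusChanged:
    """Thin, action-based trigger event: no extra DB or API fetches."""
    driver_uuid: str
    new_status: str
    old_status: str


@dataclass
class RowChange:
    """Entity-based message from a topic that tails a database."""
    op: str    # the action that occurred, e.g. "insert"
    msg: dict  # the full row as it looks after the action


def to_derived(event: DriverStatusChanged, driver_row: dict) -> dict:
    """Build a derived-topic message: the trigger plus entity information.

    The "driver" key is a hypothetical name for the attached entity data.
    """
    return {**asdict(event), "driver": driver_row}
```

For example, `to_derived(DriverStatusChanged("d1", "online", "offline"), {"city": "SF"})` keeps the three trigger fields and adds the entity data under `driver`, which is the kind of message a derived topic might carry.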

Thank you so much for your input on this topic, very informative answers!
I am not the author of the original message, but I also recommend "Building Microservices, 2nd Edition" if you are trying to answer such questions.
Thanks for your recommendation, I pre-ordered it =)