by eric_b 3449 days ago
I detailed some of the problems I saw in another thread, but one other thing that gets challenging is the CQRS side (if you choose to keep everything async).

For example, say you fire an "address change" event. That gets sent to the event store for eventual projection into a medium you can actually query from (realtime querying of the event store itself is the road to very bad places, I promise). So now your event is sent along, but how do you know when it has been processed/projected? What if there was a conflict with another address change event from somewhere else? How does that bubble back up?

The typical solution is to pass back some kind of "receipt token" so you can poll to see when your event is processed, then do your read from the projected database, or whatever. Of course, this can be made to work, but once you start talking edge cases and the need to support standard UX paradigms, polling for every update and handling error scenarios in this way becomes painful.
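A minimal sketch of that receipt-token flow, assuming a toy in-memory store. `EventStore`, `project_next`, and `read_after_write` are hypothetical names for illustration, not any real library's API, and the projector is stepped by hand where a real system would run it asynchronously:

```python
import uuid


class EventStore:
    """Toy in-memory store; events wait in a queue until projected."""

    def __init__(self):
        self.pending = []        # (token, event) pairs not yet projected
        self.processed = set()   # tokens whose events have been projected
        self.read_model = {}     # customer_id -> address (the queryable side)

    def append(self, event):
        token = str(uuid.uuid4())  # the "receipt token" handed back to the caller
        self.pending.append((token, event))
        return token

    def project_next(self):
        """Stand-in for the async projector; handles one pending event."""
        if not self.pending:
            return
        token, event = self.pending.pop(0)
        self.read_model[event["customer_id"]] = event["address"]
        self.processed.add(token)

    def is_processed(self, token):
        return token in self.processed


def read_after_write(store, event, max_polls=10):
    """Submit an event, poll its receipt, then read the projection."""
    token = store.append(event)
    for _ in range(max_polls):
        if store.is_processed(token):
            return store.read_model[event["customer_id"]]
        store.project_next()  # a real client would sleep and retry here
    raise TimeoutError("event was not projected in time")
```

Even in this toy version the caller has to own a retry budget and a timeout path for every write, which is where the pain described above starts.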


> Of course, this can be made to work, but once you start talking edge cases and the need to support standard UX paradigms, polling for every update and handling error scenarios in this way becomes painful.

This really makes me think you, and the originators of these projects you're lambasting, are working from an incomplete understanding of how to apply CQRS and ES. If you're applying CQRS in a fully async fashion, polling itself is an antipattern. That receipt token? That's what everyone else calls an ID. You know when it's been processed because the subscription you should be watching tells you it's been processed.
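As a rough sketch of the subscription idea, assuming the sender generates the event ID and registers a callback against it. `Projector` and its methods are hypothetical, and a real system would deliver the notification over a bus or socket rather than a direct call:

```python
import uuid
from collections import defaultdict


class Projector:
    """Toy projector that notifies subscribers once an event is projected."""

    def __init__(self):
        self.read_model = {}
        self.subscribers = defaultdict(list)  # event_id -> callbacks

    def subscribe(self, event_id, callback):
        self.subscribers[event_id].append(callback)

    def handle(self, event):
        # Update the read model first, then tell anyone waiting on this ID.
        self.read_model[event["customer_id"]] = event["address"]
        for callback in self.subscribers.pop(event["id"], []):
            callback(event)


projector = Projector()
notifications = []

# The sender already knows the ID, so it can subscribe before the event is handled.
event = {"id": str(uuid.uuid4()), "customer_id": "c1", "address": "2 Oak Ave"}
projector.subscribe(event["id"], notifications.append)
projector.handle(event)
```

No polling loop is needed: the callback fires once, after the read model has been updated.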

You mentioned changing event payloads in another thread. That's another big code smell to me. In a stable, well-understood domain, your payloads don't change much. If you're applying ES to a domain that ISN'T well-understood, you need to do a LOT of discovery ahead of time, or be prepared to iterate on your data until you do. It sounds like the projects you were on failed on those accounts.

Yeah, ES is hard when you keep trying to treat it like CRUD. It's overkill when you apply it to an easy domain, and it's an antipattern when you write CRUD events. So don't do those things.

I was waiting for the "you're doing it wrong" guy to show up. You win!

Yes, some of the systems used subscriptions too, which had their own set of issues.

Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

If you've had success building non-trivial CQRS/ES applications I'd love to hear more specifics about how you solved all the other issues I've presented.

You didn't succeed at building a CQRS/ES system despite several attempts. Why aren't you asking "what am I doing wrong?" instead of presuming that your personal experiences are sufficient to render informed judgement?

> Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

It's still possible to design individual components that don't require constant churn of their application state. Most software teams are incapable of this, and event sourcing is not for them, even in domains where it shines (like finance).

In my experience, when teams have solid leadership, you can get your software pretty close to the target the first time you build it. Minor course corrections are straightforward, if sometimes tedious. When the business experiences big pivots, much of what you've built can be reused providing it's modular and does not make assumptions about the overall system. The rest can be discarded.

That's a big departure from the topic at hand, but my point is that if your software isn't modular, event sourcing in particular will amplify the pain you feel.

I recommend that anyone thinking about doing CQRS/ES find someone who is an expert to help guide them or their team.

Maybe they've been asking since 2010 and, not having received a satisfactory solution from the experts in the field, have stripped all the projects of CQRS/ES and gone back to what works well. There comes a point in time where you stop asking and move on, and expecting them to re-ask on HN is a poor presumption on your part, leading to an uninformed judgment.

Well, there's a lot of people who have successfully deployed ES systems at both the large and small ends of scale. So one might ask, after having looked for and received some answers from people who have done this successfully, where did I misapply or misunderstand the advice?

> I was waiting for the "you're doing it wrong" guy to show up. You win!

Well, when you base an argument on a set of known antipatterns, you shouldn't feign surprise when someone points out that you're basing your argument on known antipatterns.

>Additionally, domains are almost never completely understood. Even if they're well understood today, things will change tomorrow. CQRS/ES in your own words is not good when requirements change. Well guess what? That's every system I've ever worked on.

The first point is flat-out untrue. There are domains of expertise with literal centuries of knowledge and practice in them. There are many, many more with decades. And many, many more with years. Startups measure knowledge in weeks and months. This is not a suitable playground for ES.

Secondly, I didn't say CQRS/ES was unsuitable when requirements change. I said it required a lot more work when the domain was not well understood—and that the work was primarily in understanding the domain.

I've used some combination of these patterns on nearly every system I've worked on for the last 7 years. That spans medical billing, ticketing, public health, the wedding industry, and for the really esoteric, voting software for college life organizations. Here are the rules I've found:

* Keep it simple. Do not try to apply ES to all areas of your software, if you apply it at all. Use it within small bounded contexts (BCs), and guard the data from other BCs. The minute you poke a hole in the BC's data store, you've guaranteed yourself headaches down the road. This means don't try to make your user model something that's ES-based unless you're building an LDAP server or similar.

* CQRS does not require ES. ES does not require CQRS.

* On-demand projections are fine for a lot of purposes, but learn to tell when you're going to need a static projection. Key indicators are reporting, background use, and expense of the projection. This is not a complete list of indications.

* A projection is part of a BC. Don't go querying other BCs at runtime for their data. If it's important to the projection, establish a public contract on the events from the other BC, listen to them, and store the data independently. Yes, it's duplicated, that's fine. YMMV.

* Do not try to back ES into an existing application, unless you're a) rebuilding an entire feature silo from scratch; b) building an entirely new feature from scratch; c) there is no C. It's tempting, I've tried it, but your best value for time is to refactor into something more modular, which is the 80/20 value of it.

* If you're going to go async, go async. Build that expectation into your UI. The pain of dealing with async commands comes from figuring out how to get feedback on them. It's a command; there is no feedback. Once it validates, it's done as far as the sender is concerned. A failure to fulfill the contract is itself an event, like any other that comes over your event bus. If you build in the facilities to treat it as such from the beginning, your life is much easier.

* Use UUIDs for PKs, and originate them with the client whenever possible. This allows for optimistic concurrency and additional commands to be sent before receiving the results of the original command. Also, track command IDs/causation IDs as part of the metadata for events. It's not always useful to have, but when it is, it's very useful to have.
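The last two rules can be sketched together. This is a minimal illustration under stated assumptions: the `OpenAccount` domain, the `Command`/`Event` shapes, and the `handle` function are all hypothetical, and a rejection is modelled as an ordinary event that carries the ID of the command that caused it:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    name: str
    aggregate_id: str  # generated by the client, not the server
    payload: dict
    command_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass(frozen=True)
class Event:
    name: str
    aggregate_id: str
    payload: dict
    causation_id: str  # ID of the command that caused this event


def handle(command, known_ids):
    """Return the event a command produces; a rejection is also an event."""
    if command.name == "OpenAccount":
        if command.aggregate_id in known_ids:
            return Event("AccountOpeningRejected", command.aggregate_id,
                         {"reason": "duplicate id"}, command.command_id)
        known_ids.add(command.aggregate_id)
        return Event("AccountOpened", command.aggregate_id,
                     command.payload, command.command_id)
    return Event("CommandRejected", command.aggregate_id,
                 {"reason": "unknown command"}, command.command_id)


# The client picks the ID up front, so it can queue follow-up commands
# against the same aggregate without waiting for the first result.
account_id = str(uuid.uuid4())
known = set()
first = Command("OpenAccount", account_id, {"owner": "alice"})
retry = Command("OpenAccount", account_id, {"owner": "alice"})
events = [handle(first, known), handle(retry, known)]
```

The sender never blocks on a return value. It watches the event stream for anything whose `causation_id` matches a command it sent, success or failure alike.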

I'm sure there's more to say, but a lot of these lessons are basically common knowledge if you're well-read on the subject. A few of them are just things I've learned the hard way—I've broken damn near every one of them at some point, with regrets. That said, you do this enough and you learn which rules can be broken and when to break them, as with any other kind of expertise.

But ES has saved my bacon more than once. I've used it to back out of a poorly designed CRUD model, report on BI questions for years past, even restore data once when a network partition created a gap of several hours with high-frequency writes. (Chalk that up as a good reason to keep your event store independent of your transactional data store.) Yes, there are headaches to it—to pretend like CRUD doesn't have different versions of those headaches is disingenuous, or simply inexperience talking.

Based on his article it looks like he's using subscriptions internally rather than polling. That's a fairly natural thing to do across an Elixir application/cluster.

In terms of conflict resolution, it seems like you'd have to clearly define a scenario where a conflict was possible. Based on the write-up, the state of an address would be based on the aggregate of the events that wrote to it. That seems like it would always lead to the last change winning.
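A minimal sketch of why that ends up as last-write-wins, assuming the address state is just a left fold over its events in stored order. The `AddressChanged` event shape here is hypothetical, not taken from the article:

```python
from functools import reduce


def apply(state, event):
    """Apply one event to the current address state."""
    if event["type"] == "AddressChanged":
        return {**state, "address": event["address"]}
    return state  # ignore event types this aggregate does not know


def current_state(events):
    """Rebuild state by folding events in stored order."""
    return reduce(apply, events, {"address": None})


events = [
    {"type": "AddressChanged", "address": "1 Main St"},
    {"type": "AddressChanged", "address": "9 Elm Rd"},
]
state = current_state(events)
```

Whichever event the store ordered last determines the address; nothing in the fold itself detects or reports a conflict.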

From the write-up of the system, I actually can't imagine trying to do this in anything other than Elixir/Erlang. The set of requirements and challenges to pull it off would be really complicated on just about any other platform.

Pushing read model updates back to the client using a two way communication channel is one technical solution. I want to experiment with using Phoenix channels[1] to solve this. I think that has potential for easing the UI/UX concerns. You post a command from the web front-end and subscribe to receive updates for the read model you're looking at. Domain events can drive the client notification.

[1] http://www.phoenixframework.org/docs/channels

If you can write your read model into Mongo, then you can use Meteor to build a real time interface extremely quickly; it tails the database log and dispatches updated records to subscribers practically instantly over websockets, no need for the event processing code to know about how to map to frontend queries. We use this for our production CRM, albeit for internal users. No doubt Phoenix would be more performant and support more databases, but it's nontrivial to build the record-to-subscriber reverse mapping that Meteor brings out of the box. RethinkDB was going to be the Chosen One for this use case, alas...

The address thing is normally solved by the fact that you organise your commands by things that should only logically change together. So a conflicting message won't revert irrelevant fields back to their old values.

So your commands should not be

UpdateCustomer

They should be

UpdateCustomerAddress

UpdateCustomerEmail

Etc

For the address, just take the last one. All the business logic I can think of makes this OK.
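A small sketch of the difference, using the command names above as plain functions over a dict. The record shape and the two-user scenario are assumptions for illustration:

```python
def update_customer(record, snapshot):
    """Coarse command: overwrites every field with the sender's snapshot."""
    return dict(snapshot)


def update_customer_address(record, address):
    """Scoped command: touches only the address."""
    return {**record, "address": address}


def update_customer_email(record, email):
    """Scoped command: touches only the email."""
    return {**record, "email": email}


original = {"address": "1 Main St", "email": "old@example.com"}

# Two users load the same record, then each changes a different field.
user_a = {**original, "address": "9 Elm Rd"}
user_b = {**original, "email": "new@example.com"}

# Coarse: user B's stale snapshot silently reverts user A's address.
coarse = update_customer(update_customer(original, user_a), user_b)

# Scoped: each command carries only what changed, so both edits survive.
scoped = update_customer_email(
    update_customer_address(original, "9 Elm Rd"), "new@example.com")
```

With the scoped commands, last-write-wins only applies within one field group, which is why taking the last address is usually acceptable.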

If your events contain the words "Create", "Update", or "Delete", or any synonym thereof, you're modeling CRUD with events and life is always going to be more complicated than it has to be for you. The names of events are data too—make them representative of the domain.

CustomerMoved(fromAddress, toAddress) is a domain event.
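As a rough sketch of what that could look like in code, assuming a frozen dataclass and a simple `customer_id -> address` read model. The field names and `apply_moved` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerMoved:
    """Domain event: named for what happened, not for the row operation."""
    customer_id: str
    from_address: str
    to_address: str


def apply_moved(addresses, event):
    """Return a new customer_id -> address mapping with the move applied."""
    return {**addresses, event.customer_id: event.to_address}


moved = CustomerMoved("c1", "1 Main St", "9 Elm Rd")
addresses = apply_moved({"c1": "1 Main St"}, moved)
```

Because the event records both addresses, it still reads sensibly years later without needing the prior row state to interpret it.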

CQRS by itself does not imply using DDD or ES.

Yeah, fair enough, but if you're not using ES then the names of messages don't matter a whole lot because you don't have to live with them forever.

(Edit: ok, they matter some, in the way names of variables and apis matter.)

Dino Esposito describes a "historical" CRUD system in a series in MSDN Magazine (https://msdn.microsoft.com/magazine/mt703431). This is basically ES with CRUD. Not saying ES with CRUD is the best example, but for data which requires audit trail logic it actually works fairly well.

Haven't read that article, but will check it out, thanks for the link.

My issue with audit logs in CRUD systems is that they're almost always at the row level, which is almost useless when you're trying to make sense of the audit log. An audit log of "operations" (i.e. a command log) is far more useful, and trivial to implement when CQRS is used. I'm guessing that's what this article details...
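A minimal sketch of such a command log, assuming every write goes through a single dispatch point. `CommandBus`, the `ChangeCustomerAddress` command, and the in-memory `customers` dict are hypothetical and not taken from the linked article:

```python
from datetime import datetime, timezone


class CommandBus:
    """Toy bus that records each command before running its handler."""

    def __init__(self):
        self.handlers = {}
        self.audit_log = []

    def register(self, name, handler):
        self.handlers[name] = handler

    def dispatch(self, name, user, **payload):
        # Log the intent (who asked for what), not the resulting row diff.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "command": name,
            "payload": payload,
        })
        return self.handlers[name](**payload)


customers = {"c1": {"address": "1 Main St"}}


def change_address(customer_id, address):
    customers[customer_id]["address"] = address


bus = CommandBus()
bus.register("ChangeCustomerAddress", change_address)
bus.dispatch("ChangeCustomerAddress", user="alice",
             customer_id="c1", address="9 Elm Rd")
```

Each log entry reads as one business operation, even when the handler ends up touching several rows.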