Hacker News new | ask | show | jobs
by TheLudd 2784 days ago
I do event sourcing in my company and we had a look at this when we started out 4 years ago. What I don't understand is why build a database? Why not build something in the application layer that uses your fav database as a storage mechanism instead? Aren't the existing databases more mature and better to use?
2 comments

There are a lot of non-trivial aspects to scaling event sourcing systems. Particularly if you want atomicity or asynchronous two-phase committing. Having implemented them ~5 different ways for different services at my startup, I’d definitely welcome a reliable DB-level abstraction.
Also putting that abstraction in the app layer allows developers to break it. Either by accident, or as a temporary hack that never gets fixed, or a temporary hack that has unforeseen side effects.
Sorry for my inexperience in this kind of implementation but wouldn't an sql transaction achieve something similar? And with table partitioning (at least on postgres) you should be able to go quite far, instead of neediness a new db (and a lot of related stuff to study)
No because in a distributed system, you can't rely on SQL transactions. i.e. if you need to make one API request to start the transaction and a second API request to commit it/rollback - you can't hold the SQL transaction open between those.
I've used a MySQL table `event_store` with fields `uuid`, `playhead`, `payload` and `recorded_on`; works fine.
Good to know. Was thinking of this for myself in a simple usecase.
How did you manage horizontal scaling with this approach?
If you need total system order then that will always be your bottleneck, although you can make it very fast by scoping it to just a sequence number generator and doing the actual work in separate processes.

Otherwise, most event sourcing uses different "streams" of events for different application functions, so you can shard by stream in whatever way works for you.

You could shard based on uuid, so that each shard has its set of objects that it manages.

The easiest way would be to cast uuid as a 64bit unsigned int, then mod by the number of shards. If the number of shards is dynamic, then use consistent hashing.