Hacker News
by dundun 3764 days ago
You certainly can use a db for an event source. This article does a really good job of explaining how: http://www.confluent.io/blog/turning-the-database-inside-out...
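A minimal sketch of what that can look like with an ordinary relational database. It assumes SQLite via Python's standard sqlite3 module as a stand-in; the events table, its columns, and the helper names (open_store, append, read_all) are hypothetical and not taken from the linked article:

```python
import json
import sqlite3

# Hypothetical append-only event table: rows are inserted, never updated or deleted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- global ordering
    stream_id  TEXT NOT NULL,                      -- e.g. one user or one order
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL                       -- JSON-encoded event body
)
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def append(conn, stream_id, event_type, payload):
    """Append one event and return its sequence number."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO events (stream_id, event_type, payload) VALUES (?, ?, ?)",
            (stream_id, event_type, json.dumps(payload)),
        )
    return cur.lastrowid


def read_all(conn, after_seq=0):
    """Yield events in order, optionally resuming after a known position."""
    rows = conn.execute(
        "SELECT seq, stream_id, event_type, payload FROM events "
        "WHERE seq > ? ORDER BY seq",
        (after_seq,),
    )
    for seq, stream_id, event_type, payload in rows:
        yield seq, stream_id, event_type, json.loads(payload)
```

Because rows are only ever appended, a consumer can remember the last seq it processed and call read_all(conn, after_seq=n) to pick up where it left off, roughly the role a consumer offset plays in Kafka.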

As mentioned in the post, we've pushed Kafka to at least 700,000 events per second. We have room to push it much further, but stay tuned for posts 2 and 3 to see what we're doing instead.


That is actually the post that got me into event sourcing / streams. For user-analytics events, this makes complete sense to me. What I haven't been able to work out is whether it's useful to use this architecture on a much smaller scale, for things that may not be user events.

I love the idea of throwing everything into a stream and populating the read models, analytics, search index, etc. from that data. But if, for example, you had a CMS / e-commerce site for a smaller organization, should the admin actions also be events? If you use an event-sourced db, they would have to be, and you'd get all the benefits outlined in the article.
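To make the "one stream, many read models" idea concrete, here is a toy sketch in which the same list of events feeds both a current-state read model and a simple inverted-index "search index". The event types, fields, and classes are hypothetical illustrations, not something from the article:

```python
from collections import defaultdict

# Hypothetical events for a small CMS / shop.
events = [
    {"type": "ProductCreated", "id": "p1", "title": "Red Shoes", "price": 50},
    {"type": "ProductCreated", "id": "p2", "title": "Blue Hat", "price": 20},
    {"type": "PriceChanged", "id": "p1", "price": 45},
]


class ProductCatalog:
    """Read model: the current state of each product."""

    def __init__(self):
        self.products = {}

    def apply(self, e):
        if e["type"] == "ProductCreated":
            self.products[e["id"]] = {"title": e["title"], "price": e["price"]}
        elif e["type"] == "PriceChanged":
            self.products[e["id"]]["price"] = e["price"]


class SearchIndex:
    """Toy inverted index: lowercase word -> set of product ids."""

    def __init__(self):
        self.index = defaultdict(set)

    def apply(self, e):
        if e["type"] == "ProductCreated":
            for word in e["title"].lower().split():
                self.index[word].add(e["id"])

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


def project(events, *projections):
    # Every projection sees the same events in the same order.
    for e in events:
        for p in projections:
            p.apply(e)


catalog, search = ProductCatalog(), SearchIndex()
project(events, catalog, search)
```

Adding another projection later (say, an analytics counter) would mean replaying the existing events into it from the start, without touching the other read models.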

At what point do you decide what to put in the stream and what to build without it? Are there events that should never be in a stream? Those are the questions I have been researching, but I haven't found many resources or discussions about making these decisions.

My current thinking is that you use a relational db like Postgres with JSON support to go from hobby / early startup to traction, when you would need to start being concerned with scaling. At that point you switch to Kafka or related hosted tools.

As far as the data you put into the stream, I would think it could be everything, if you treated all data (even admin actions) as immutable. The only thing that seems up in the air is transactions.
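On the transactions question, one possible approach (an assumption, not something proposed in the thread) is to keep the log append-only and make the unit of atomicity a batch of events written in a single database transaction. The sketch below uses SQLite via Python's sqlite3 module; the table layout, append_batch, and event names such as OrderRefunded are hypothetical:

```python
import json
import sqlite3

# Hypothetical single-table, append-only event store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def append_batch(conn, events):
    """Append several events atomically: all are stored, or none are."""
    with conn:  # one transaction; rolled back if anything inside raises
        for event_type, payload in events:
            if not event_type:
                raise ValueError("event_type is required")
            conn.execute(
                "INSERT INTO events (event_type, payload) VALUES (?, ?)",
                (event_type, json.dumps(payload)),
            )


# An admin action recorded as immutable facts, written together.
append_batch(conn, [
    ("OrderRefunded", {"order_id": "o1", "by": "admin-7"}),
    ("StockReturned", {"sku": "s1", "qty": 1}),
])

# A batch that fails halfway leaves no partial writes behind.
try:
    append_batch(conn, [("PriceChanged", {"sku": "s1", "price": 10}), ("", {})])
except ValueError:
    pass

stored = [row[0] for row in conn.execute("SELECT event_type FROM events ORDER BY seq")]
```

Correcting a mistaken admin action then means appending a compensating event rather than editing an old row.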

That is as far as I got, though. I don't work at a company with the kind of scale to use this, but I'd like to start working with it.

I built a CRM/CMS application where every single controller action call is event sourced.

The whole application lives in memory as a single object aggregate, which gets rebuilt on startup. I started off writing JSON to the file system, then moved to compressing and appending to a log file, and finally moved to Azure cloud tables.
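A rough sketch of the "single in-memory aggregate rebuilt on startup" idea, using a gzip-compressed, newline-delimited JSON log as a stand-in for the compressed log-file stage. The App class, event types, and file name are hypothetical; the original application was .NET and its actual storage format isn't described here:

```python
import gzip
import json
import os
import tempfile


class App:
    """The whole application state as one in-memory object (hypothetical)."""

    def __init__(self):
        self.contacts = {}

    def apply(self, event):
        if event["type"] in ("ContactAdded", "ContactRenamed"):
            self.contacts[event["id"]] = event["name"]


def append_event(path, event):
    # Mode "at" appends a new gzip member; gzip.open reads all members
    # back as one continuous stream.
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def load(path):
    """Rebuild the aggregate on startup by replaying every logged event."""
    app = App()
    if os.path.exists(path):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                app.apply(json.loads(line))
    return app


log_path = os.path.join(tempfile.mkdtemp(), "events.log.gz")
append_event(log_path, {"type": "ContactAdded", "id": 1, "name": "Ada"})
append_event(log_path, {"type": "ContactRenamed", "id": 1, "name": "Ada L."})
app = load(log_path)
```

Writing one gzip member per event compresses poorly; batching writes would help, at the cost of a larger window of events that could be lost in a crash.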

It's awesomely fast to respond to requests (15ms) and to add new features to, but you do get interesting new problems. Along the way I had to:

- come up with a way of migrating events (as my storage formats changed while I improved my frameworks)
- find a good way to do fast full-text search against in-memory objects, as I had no SQL or ElasticSearch infrastructure (I ended up using Linq against in-memory Lucene RamDirectories)
- deal with concurrency issues in a fairly novel manner (as all users are acting against a single in-memory aggregate)
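One common way to handle changing storage formats is "upcasting": old events stay untouched in the log and are converted to the current shape as they are read. This is a generic sketch of that technique, not necessarily what the author did; the ContactAdded event and its v1 to v2 change are hypothetical:

```python
# Hypothetical schema change: version 1 stored a single "name" field,
# version 2 splits it into "first" and "last".

def upcast_v1_to_v2(event):
    rest = {k: v for k, v in event.items() if k != "name"}
    first, _, last = event["name"].partition(" ")
    return {**rest, "version": 2, "first": first, "last": last}


# (event type, version it applies to) -> converter to the next version
UPCASTERS = {
    ("ContactAdded", 1): upcast_v1_to_v2,
}


def upcast(event):
    """Apply upcasters until the event reaches its latest known version."""
    event = dict(event)  # never mutate what was read from the log
    event.setdefault("version", 1)  # events logged before versioning existed
    while (event["type"], event["version"]) in UPCASTERS:
        event = UPCASTERS[(event["type"], event["version"])](event)
    return event
```

Chaining one small converter per version keeps each migration simple, and the stored log never has to be rewritten.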

I'm hoping this architecture will start to become more popular. I think we need a framework equivalent to Rails to take it mainstream.

That is very interesting. I am guessing this is a closed-source application? Did you do something along the lines of CQRS (separating the command and query sides), or just write directly to the event source? At what point did appending to a log file stop working and force the switch to the cloud (or was that for unrelated reasons)?

I am also hoping it will become more popular, as the pros seem to vastly outweigh the cons. But I think you are right about the framework. From my research, it seems to be medium to large enterprises that are best suited to using and developing something like Kafka, and those enterprises typically don't open source their applications. So I definitely think a framework from a company that is using it at scale would be huge.

Until then, I suppose I will keep reading up, learning all I can, and figuring out how to implement this on a much smaller scale.

Cloud storage was just used so I didn't have to manage backups myself.

I absolutely didn't separate command and query: the commands themselves are actions that execute against the domain model, and that same domain model is used to build responses.
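Logging commands rather than separate domain events, and answering requests straight from the same model, is essentially the prevalence / memory-image pattern. A minimal sketch under that assumption, with hypothetical Crm and Engine classes and a single lock as the simplest (not necessarily the author's) answer to concurrency:

```python
import threading


class Crm:
    """Hypothetical domain model. Commands mutate it; responses read it directly."""

    def __init__(self):
        self.notes = []

    def add_note(self, text):
        self.notes.append(text)

    def clear_notes(self):
        self.notes.clear()


class Engine:
    """Records every command (method name + arguments) it executes."""

    def __init__(self, model):
        self.model = model
        self.log = []
        # Simplest concurrency model: one command at a time on the shared object.
        self._lock = threading.Lock()

    def execute(self, command, *args):
        with self._lock:
            getattr(self.model, command)(*args)
            # Logged only after the command succeeds (a simplification).
            self.log.append((command, args))

    @classmethod
    def replay(cls, model, log):
        """Rebuild a model by re-running logged commands in order."""
        engine = cls(model)
        for command, args in log:
            engine.execute(command, *args)
        return engine


live = Engine(Crm())
live.execute("add_note", "call Bob")
live.execute("clear_notes")
live.execute("add_note", "send quote")
restored = Engine.replay(Crm(), live.log)
```

Here live.log plays the part of the persisted log; in a real system it would be flushed to durable storage before the response is sent.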

My project is here: [Sourcery](https://github.com/mcintyre321/Sourcery), but a more mature project you might like to look into is [OrigoDB](http://origodb.com/).

Another thing that gets tricky is making your application deterministic: any calls to the current time, random number or GUID generation, or 3rd-party services have to be recorded and replayable in the correct order for when you reconstruct your application instance. This gets harder still if you refactor your application or change its logic later.
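A minimal sketch of the determinism point, assuming each command captures its non-deterministic inputs (here a UUID and a UTC timestamp) once, when it is first created, and stores them in the log so replay reuses them instead of generating new ones. CreateTicket is a hypothetical command; the same idea would apply to recording a third-party service's response:

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CreateTicket:
    """Hypothetical command. Non-deterministic inputs are captured once,
    when the command is created, and stored with it in the log."""
    title: str
    ticket_id: str
    created_at: str

    @staticmethod
    def new(title):
        return CreateTicket(
            title=title,
            ticket_id=str(uuid.uuid4()),                        # recorded, not regenerated
            created_at=datetime.now(timezone.utc).isoformat(),  # recorded, not re-read
        )

    def apply(self, state):
        # Uses only the recorded values, so replay produces the same state.
        state[self.ticket_id] = {"title": self.title, "created_at": self.created_at}


log = [CreateTicket.new("Printer on fire")]
live, replayed = {}, {}
for cmd in log:
    cmd.apply(live)
for cmd in log:  # e.g. on restart
    cmd.apply(replayed)
```

Because apply never calls uuid4() or now() itself, refactoring it later cannot change the IDs or timestamps of historical records, although other logic changes can still alter replayed state.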

It's worth reading up on Prevalence/MemoryImage, and looking into NEventStore also.