Hacker News
by ComNik 4059 days ago
Yes, I fully recognize the problem with double-writing, and I will definitely try out Bottled Water. I was also thinking about replacing Kafka with a much simpler, lower-throughput system (we are light-years away from LinkedIn's requirements).
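
To make the double-writing problem concrete, here is a minimal Python sketch that uses hypothetical in-memory stand-ins (`FakeDatabase`, `FakeQueue`) rather than real Postgres or Kafka clients. It shows how a failure between the two writes leaves the source of truth and the queue disagreeing. Bottled Water sidesteps this by deriving the stream from Postgres's own change log (logical decoding) instead of having the application write twice.

```python
# Hypothetical in-memory stand-ins for a database and a message queue.
# Neither is a real client library; they only illustrate the failure mode.

class FakeDatabase:
    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value


class FakeQueue:
    def __init__(self, fail=False):
        self.messages = []
        self.fail = fail  # simulate a broker outage or a crash before publishing

    def publish(self, message):
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.messages.append(message)


def save_user_dual_write(db, queue, user_id, name):
    """Naive double-write: the two writes are not atomic with each other."""
    db.insert(user_id, name)                           # write 1 succeeds...
    queue.publish({"user_id": user_id, "name": name})  # ...write 2 may fail


db = FakeDatabase()
queue = FakeQueue(fail=True)
try:
    save_user_dual_write(db, queue, 1, "alice")
except ConnectionError:
    pass

# The database now has the row, but no downstream consumer will ever see it.
print(db.rows)         # {1: 'alice'}
print(queue.messages)  # []
```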

Two reasons why I can't just use Postgres (I'd love to):

1.) Kafka (or whatever queue we settle on) will also be used for logs and metrics, data that doesn't flow through Postgres.

2.) Postgres stores the data model of my business domain at the lowest, normalized level. But derived data stores are inherently denormalized, and I want to be able to use them without talking back to my source of truth all the time. So currently I'm passing DTOs to Kafka, just as I would return them in response to any API request. This data is not easily available at the Postgres level.
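
One way to picture the gap between normalized rows in Postgres and the DTOs sent to Kafka: the Python sketch below joins normalized records into a single self-contained event, so a derived store consuming it never has to query the source of truth. The `customers`, `products`, `orders` and `order_lines` tables and the `order_placed` event type are made up for illustration; this is not the poster's actual schema.

```python
import json

# Hypothetical normalized tables, as they might sit in Postgres.
customers = {7: {"id": 7, "name": "ACME Corp"}}
products = {42: {"id": 42, "title": "Widget", "price_cents": 1999}}
orders = {100: {"id": 100, "customer_id": 7}}
order_lines = [
    {"order_id": 100, "product_id": 42, "quantity": 3},
]


def build_order_event(order_id):
    """Join the normalized rows into one self-contained (denormalized) event."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    lines = []
    for line in order_lines:
        if line["order_id"] != order_id:
            continue
        product = products[line["product_id"]]
        lines.append({
            "title": product["title"],
            "quantity": line["quantity"],
            "subtotal_cents": product["price_cents"] * line["quantity"],
        })
    return {
        "type": "order_placed",  # hypothetical event name
        "order_id": order_id,
        "customer_name": customer["name"],
        "lines": lines,
        "total_cents": sum(l["subtotal_cents"] for l in lines),
    }


event = build_order_event(100)
print(json.dumps(event))
```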

I'm not yet sure about the right abstraction level for events. It seems very natural to have them contain the information I would send to clients directly.


So what's your "source of truth"?

We have an application that might be similar. It receives analytics events from frontends and (currently) uses RabbitMQ to distribute them to multiple "sinks", including InfluxDB, ElasticSearch and websockets. The main sink stores the events as flat files (one JSON hash per line) in S3; that's what we consider our master data.
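
The fan-out described above can be sketched roughly as follows. This is a minimal Python illustration, not the poster's code: a local temporary file stands in for S3, `CountingSink` is a hypothetical stand-in for a derived sink such as InfluxDB, and the loop in `fan_out` plays the role that a RabbitMQ fanout exchange would play in a real deployment.

```python
import json
import os
import tempfile


class JsonLinesSink:
    """Master sink: append each event as one JSON object per line.

    Assumption: a local file stands in for the flat files kept in S3.
    """

    def __init__(self, path):
        self.path = path

    def handle(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")


class CountingSink:
    """Hypothetical derived sink (think metrics store or search index)."""

    def __init__(self):
        self.counts = {}

    def handle(self, event):
        name = event["name"]
        self.counts[name] = self.counts.get(name, 0) + 1


def fan_out(events, sinks):
    # Every sink receives every event, roughly what a fanout exchange does.
    for event in events:
        for sink in sinks:
            sink.handle(event)


path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
master = JsonLinesSink(path)
counter = CountingSink()
fan_out([{"name": "page_view"}, {"name": "click"}, {"name": "page_view"}],
        [master, counter])

with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())
print(counter.counts)  # {'page_view': 2, 'click': 1}
```

Because the master sink keeps every raw event, the other sinks can be treated as disposable views of it.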

For all application-data events I consider Postgres to be the ground truth. That is somewhat unfortunate, because one can't easily place a queue in front of the database. For metrics and logs, the Kafka topic itself (which is persisted similarly to your flat files) would become the master. The use case is pretty similar.
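
Treating the Kafka topic as the master for metrics and logs means any derived store can be discarded and rebuilt by replaying the log from the beginning. A minimal Python sketch of that idea, assuming a plain list of JSON strings stands in for the topic and using a hypothetical `rebuild_latest_values` view; a real consumer would read from a Kafka client starting at the earliest offset instead.

```python
import json

# Hypothetical master log: an append-only list of JSON lines standing in for
# a Kafka topic (or flat files in S3) that is treated as the source of truth.
master_log = [
    json.dumps({"metric": "cpu", "host": "a", "value": 0.5}),
    json.dumps({"metric": "cpu", "host": "b", "value": 0.9}),
    json.dumps({"metric": "cpu", "host": "a", "value": 0.7}),
]


def rebuild_latest_values(log, from_offset=0):
    """Rebuild a derived view (latest value per host) by replaying the log."""
    view = {}
    for offset in range(from_offset, len(log)):
        record = json.loads(log[offset])
        view[record["host"]] = record["value"]  # later records overwrite earlier ones
    return view


# The derived store can be thrown away and recomputed at any time.
latest = rebuild_latest_values(master_log)
print(latest)  # {'a': 0.7, 'b': 0.9}
```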

Might it be feasible to have something like Postgres work with an external WAL? That would solve the problem, I guess, and also leave us with a single "persistent" system.