|
|
|
|
|
by andrewmutz
1353 days ago
|
|
Kafka Connect can do all this for you if you configure it properly. You would use a postgres "source" connector called Debezium that tracks all changes via postgres replication. All row changes then flow in realtime to Kafka topics. Keeping the data updated in real time in elastic search is also another off-the-shelf Kafka Connector (a "sink" connector) |
|
You could also for instance create a script and use a postgresql logical replication connection ( just like Debezium ) stream the changes into elasticsearch. Without having a full Kafka connect setup. And all the training an maintenance that comes with it.
What I am describing is, before storing the data in Elasticsearch computing the data in a materialized format. So it's more efficient for Elasticsearch to work with. And no longer having any needs for joins.
So instead of recomputing the materialized data every time in it's entirety you want to be more smart about it. The root table of the materialized data depends on maybe 5 other tables. So if data in those 5 other tables change you need to know if they have a relationship with a row in the "root table". And then only re-materialize those rows.
Materialize does this by having it's own SQL language where you define your materialized view. Which compiles to clever algorithms and uses an execution engine to get to this result.
What I am doing is just having a lookup graph + queries to see what tables, id entries are invalidated. And I re-materialize using normal sql and some extra processing using a nodejs script to make it more optimal for Elasticsearch.
It's not as fancy. But it works and does the job.