by nick-keller 473 days ago
Thanks for the extremely detailed feedback. I'll try to address your (very valid) concerns:

- I don't know if you had a look at the "How does it work?" page, but there I use sequence diagrams to explain how the process is split in two: first aggregating events into root IDs, then building the final objects from those root IDs.

- Each of those two steps hits the DB, but: (i) it should not be the production database but a read-only replica, and (ii) the two queries are independent and can be run separately. So instead of rebuilding extraction from scratch, I decided to rely on existing replication strategies, which in essence do exactly what you suggest.

- This library is not at all concerned with transformation; that step should indeed be separate. In our production environment, we transform the high-level events that PG-Capture sends with an async worker that does not hit the DB at all; it just transforms the data it receives.

- I agree that you should not index directly what is in the DB, which is why you should transform the data, as I suggest in my previous point. But that data has to come from somewhere, and PG-Sync aims at making that part of the process smooth and robust.

- Regarding full indexing, it is actually pretty straightforward: push all IDs of your root table into the store (this can be streamed) and your consumer should already be building objects and publishing high-level events. The good part is that the consumer will not do one query per object but can build a lot of objects at once with a single query.
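
To make the points above concrete, here is a minimal sketch of the two steps, the separate transform, and full indexing. It is only an illustration under stated assumptions: the `book`/`chapter` schema is hypothetical, an in-memory SQLite database stands in for the Postgres read replica, and none of the function names come from PG-Capture's actual API.

```python
import sqlite3


def make_db():
    """Stand-in for the read replica (hypothetical schema: book is the root table)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE chapter (id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT);
        INSERT INTO book VALUES (1, 'Dune'), (2, 'Emma');
        INSERT INTO chapter VALUES (10, 1, 'Arrakis'), (11, 1, 'Spice'), (12, 2, 'Highbury');
    """)
    return db


def root_ids_for(db, events):
    """Step 1: aggregate low-level (table, row_id) events into a set of root IDs."""
    roots = set()
    for table, row_id in events:
        if table == "book":
            roots.add(row_id)
        elif table == "chapter":
            row = db.execute(
                "SELECT book_id FROM chapter WHERE id = ?", (row_id,)
            ).fetchone()
            if row:
                roots.add(row[0])
    return roots


def build_objects(db, ids):
    """Step 2: build the final objects for many root IDs with a single query."""
    ids = sorted(ids)
    if not ids:
        return []
    marks = ",".join("?" * len(ids))  # one placeholder per ID, e.g. "?,?,?"
    rows = db.execute(
        "SELECT b.id, b.title, c.name FROM book b "
        "LEFT JOIN chapter c ON c.book_id = b.id "
        f"WHERE b.id IN ({marks}) ORDER BY b.id, c.id",
        ids,
    ).fetchall()
    objects = {}
    for book_id, title, chapter in rows:
        obj = objects.setdefault(
            book_id, {"id": book_id, "title": title, "chapters": []}
        )
        if chapter is not None:
            obj["chapters"].append(chapter)
    return list(objects.values())


def transform(obj):
    """Separate step: reshape a built object for the index, with no DB access."""
    return {
        "objectID": str(obj["id"]),
        "title": obj["title"],
        "chapterCount": len(obj["chapters"]),
    }


def all_root_ids(db):
    """Full indexing: every ID of the root table goes through the same step 2."""
    return {row[0] for row in db.execute("SELECT id FROM book")}
```

In this sketch, several events touching the same book collapse to one root ID, and a full reindex is just `build_objects(db, all_root_ids(db))`, so incremental updates and full indexing share the same batch-building path.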

We have been using PG-Capture in production for half a year so far, but we are not yet at the scale of a few Gb per second.

Eager to have your feedback regarding those points.