| The poor man's version of Materialize that I implemented is the following: Step 1) Find all "paths" between tables * Use the postgres information schema to get all relations
* Use npm library graph-cycles to see if there are any graph cycles. If so.. some relations go on a blacklist.
* Use npm library topopsort to sort the graph
* Traverse the graph and find all possible paths from and to tables
* Generate SQL queries to look up "affected" other tables. Input: table name + id Output: list of table names + ids Step 2) Track changes in database using a transactional outbox * Create outbox table fields: id, timestamp, event, table_name, table id, jsonb payload
* After running migrations have a script that ensures every table has triggers on insert,update,delete that would insert a row on each of these events in the outbox Step 3) Compute ( You want to do reporting for a certain table ( target table ) which has relationships with other tables and for this table you want a "materialized" view ) * Have a script on a loop that takes all the outbox entries since last time processed
* Use the queries from step1 to find out which ids of "target table" are affected
* Then only materialize / compute those ids
* Store in Elasticsearch ( in our case ) This is not a solution if you are Facebook scale. But as a small SAAS company with not too many transactions this works brilliantly.
And with more tenants you can just scale this up by sharding the inbox. Bonus points: have elasticsearch (ingest node) and a postgres replica on the same host as the "compute" script. So if you have a lot of queries calculating the "dependencies" you get better performance. sorry for my terrible explanation |