by biellls
1939 days ago
What would this look like? Specifically, how do you know if something has been deleted? Do you compare the primary keys in your materialized view (the last snapshot you have of the data) with the source data to know what changed? Isn't that really hard to do if they're not in the same database? In real life most people prefer taking a full snapshot each day because they don't have good solutions to these problems in batch systems (CDC is another story).
The whole point of ETL is to bring data from one database to another. The comparison of source and destination primary keys can be done in Python, outside the database, and it should be done on entire partitions instead of individual rows. E.g., you only consider which 'day' partitions have been loaded, not which rows have been loaded.
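
A rough sketch of what that comparison might look like, assuming you have already pulled the list of 'day' partitions (and, for one partition, the primary keys) from each side into Python. The function names `partitions_to_load` and `deleted_keys` are made up for illustration, and how you actually fetch the values depends on your databases:

```python
from datetime import date


def partitions_to_load(source_days, loaded_days):
    """Return the day partitions present in the source but not yet loaded."""
    return sorted(set(source_days) - set(loaded_days))


def deleted_keys(source_keys, destination_keys):
    """Return primary keys still in the destination but gone from the source."""
    return set(destination_keys) - set(source_keys)


# Hypothetical values; in practice these come from one query per database.
source_days = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)]
loaded_days = [date(2020, 1, 1)]

# Partition-level check: which whole days still need to be loaded.
missing = partitions_to_load(source_days, loaded_days)

# Row-level check within a single partition, only if deletes must be detected.
stale = deleted_keys(source_keys={1, 2, 3}, destination_keys={1, 2, 3, 4})
```

Since only the partition identifiers cross the wire for the first check, the two databases never need to talk to each other. The key-level diff is only needed for partitions where you care about deletes.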