Hacker News new | ask | show | jobs
by krasznahorkai 897 days ago
For Data Warehousing purposes it's nice if source systems use soft deletion for easy detection of deleted rows since a certain point in time. I don't think the use-case is limited to undoing user deletions.

If there is no soft deletion or audit table available, the alternative is to either 1) Select all the source primary keys to find out which no longer exist in the source but do exist in the data warehouse 2) If there is no primary key available, select the whole dateset for full row comparison

Since in corporate settings data warehouses load at least once a day, you can imagine the amounts of data being moved around needlessly.

1 comments

For data warehousing, you usually have a ledger of events? Such that you would just record that a record with a given identifier was created/modified/deleted/etc. Right?
Yes. In essence what most businesses do is create a record for any given state, so it enables analysts to see when something was created/modified/deleted indeed.

Most business applications contain only the actual state of data, so by loading the data warehouse these changes can be saved and analyzed.

The simplest implementation is that each record gets a start_datetime and end_datetime so you can see the changes over time or figure out when something was deleted. Other techniques like data vault enable businesses to get a complete historic view of any given point in time.