by snidane
1945 days ago
We are probably referring to different scenarios. When purchasing data for analytics, data providers are usually sophisticated enough to know not to modify their data history. With new ones, the data delivery format can be negotiated. Data providers usually wait to collect a day or so worth of data before validating and releasing it to customers. For integrating an OLTP database that updates in real time, on the other hand, yes, you will need CDC.

---

Most of data engineering is just incrementally adding new data to an existing corpus and then running a big batch job to dedup, sort or partition. This last step is surely computationally expensive, but at least it is conceptually simple and can be solved by throwing hardware at it. The first part, the incremental updates, is what imo causes more trouble.
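
To make the "append, then batch" pattern concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the row fields (`id`, `date`, `delivered_at`), the "latest delivery wins" dedup rule, and the helper names `append_batch` and `compact` are all hypothetical, and a real pipeline would run this on a warehouse or Spark rather than in-memory lists.

```python
from collections import defaultdict


def append_batch(corpus, batch):
    """Incremental step: append newly delivered rows without touching history."""
    corpus.extend(batch)


def compact(corpus):
    """Batch step: dedup, sort, and partition the whole corpus.

    Assumed rules (hypothetical): a row is identified by (date, id), and when
    the same row is delivered twice, the later `delivered_at` wins.
    """
    latest = {}
    for row in corpus:
        key = (row["date"], row["id"])
        kept = latest.get(key)
        if kept is None or row["delivered_at"] >= kept["delivered_at"]:
            latest[key] = row

    # Sort by (date, id), then group rows into one partition per date.
    partitions = defaultdict(list)
    for key in sorted(latest):
        partitions[key[0]].append(latest[key])
    return dict(partitions)


corpus = []
append_batch(corpus, [
    {"id": 2, "date": "2020-01-01", "delivered_at": "2020-01-02", "value": 10},
    {"id": 1, "date": "2020-01-01", "delivered_at": "2020-01-02", "value": 5},
])
# The next delivery repeats id=2 with a corrected value and adds a new day.
append_batch(corpus, [
    {"id": 2, "date": "2020-01-01", "delivered_at": "2020-01-03", "value": 11},
    {"id": 1, "date": "2020-01-02", "delivered_at": "2020-01-03", "value": 7},
])
partitions = compact(corpus)
```

The batch step is a full rescan every time, which is the "throw hardware at it" part. The awkward decisions all live in the append step: what counts as the same row, and what to do when a provider re-delivers it.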