by londons_explore
1773 days ago
In practice, continuous MapReduce jobs like this lead to code you can't change and versioning nightmares. Imagine you want to change part of your pipeline's logic. Either you reprocess all the data, which is expensive and depends on you having retained past data. Will your low-latency continuous pipeline keep running while the backlog is cleared? Is the code really idempotent, or will a rerun leave half the records failing to reprocess? Or you don't reprocess old data, in which case the historical records are now inconsistent, and what do you do if you make a bad release which just outputs zeros? In any real organisation, you'll need both approaches, and it'll end up a mess of code versions and data versions. Then some customer comes along and demands a GDPR deletion of their session records, and you have no way to even find all the versions of all the copies of the records, let alone delete them and make everything else consistent...
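To make the idempotency and versioning point concrete, here's a minimal Python sketch. Everything in it is hypothetical and only for illustration: the `Output` fields, the `"v1"`/`"v2"` version tags, and the two in-memory stores are my assumptions, not any real pipeline framework. It contrasts an append-only sink, where a rerun duplicates rows, with a sink keyed by record ID and code version, where a rerun overwrites and every version of a record can at least be found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    record_id: str     # hypothetical stable key of the input record
    code_version: str  # hypothetical tag for the release that produced this row
    value: int


def process(record_id: str, raw: int, code_version: str) -> Output:
    # Stand-in for the pipeline logic; "v2" changes the computation.
    value = raw * 2 if code_version == "v2" else raw
    return Output(record_id, code_version, value)


class AppendStore:
    """Non-idempotent sink: every run appends, so a rerun duplicates rows."""

    def __init__(self):
        self.rows = []

    def write(self, out: Output) -> None:
        self.rows.append(out)


class KeyedStore:
    """Idempotent sink: rows are keyed by (record_id, code_version)."""

    def __init__(self):
        self.rows = {}

    def write(self, out: Output) -> None:
        # Writing the same key again overwrites instead of duplicating.
        self.rows[(out.record_id, out.code_version)] = out

    def delete_record(self, record_id: str) -> int:
        # Remove every stored version of one record; return how many were removed.
        doomed = [key for key in self.rows if key[0] == record_id]
        for key in doomed:
            del self.rows[key]
        return len(doomed)


def run(store, inputs, code_version: str) -> None:
    # One pass of the (hypothetical) pipeline over a batch of inputs.
    for record_id, raw in inputs:
        store.write(process(record_id, raw, code_version))
```

This only covers a single store. The mess described above comes from many copies in many systems, which no keying scheme in one place solves by itself.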