by zhousun
395 days ago
Data files (Parquet) alone are not enough for a table with updates/deletes (those are part of Iceberg "metadata").
For CDC from OLTP use cases, the pattern involves rapidly marking rows as deleted, inserting new rows, and compacting small files. This is required for minutes-latency replication. Second-latency replication is more involved: you actually need to build a layer on top of Iceberg to track primary keys and apply deletions.
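To make the "track pk / apply deletion" idea concrete, here is a rough in-memory sketch in Python. It is not Iceberg's API and not how any particular product does it; `PkIndexedTable`, `apply_batch`, `scan`, and `compact` are hypothetical names, and plain lists stand in for Parquet data files and positional delete files. It only shows the shape of the bookkeeping: every batch appends a new small file, old row versions are marked deleted by position, and compaction folds everything back into one file.

```python
from dataclasses import dataclass, field


@dataclass
class PkIndexedTable:
    """Toy merge-on-read table: append-only data files plus positional deletes.

    All names here are hypothetical; lists stand in for real data/delete files.
    """
    data_files: list = field(default_factory=list)  # each "file" is a list of row dicts
    deletes: set = field(default_factory=set)       # (file_no, row_pos) pairs marked deleted
    pk_index: dict = field(default_factory=dict)    # pk -> (file_no, row_pos) of the live row

    def apply_batch(self, events):
        """Apply one batch of CDC events, writing exactly one new data file.

        Each event is (op, pk, row) with op in {"insert", "update", "delete"};
        row is a dict of non-key columns, or None for deletes.
        """
        file_no = len(self.data_files)
        new_file = []
        for op, pk, row in events:
            # Mark the previous live version (if any) as deleted by position.
            old = self.pk_index.pop(pk, None)
            if old is not None:
                self.deletes.add(old)
            # Inserts and updates append the new version to the new file.
            if op in ("insert", "update"):
                self.pk_index[pk] = (file_no, len(new_file))
                new_file.append({"pk": pk, **row})
        self.data_files.append(new_file)

    def scan(self):
        """Read all live rows, skipping positions marked as deleted."""
        return [
            row
            for file_no, rows in enumerate(self.data_files)
            for pos, row in enumerate(rows)
            if (file_no, pos) not in self.deletes
        ]

    def compact(self):
        """Rewrite all live rows into a single file and drop the delete markers."""
        live = self.scan()
        self.data_files = [live]
        self.deletes.clear()
        self.pk_index = {row["pk"]: (0, i) for i, row in enumerate(live)}
```

Usage would look like calling `apply_batch` once per replication tick with the events read from the OLTP log, then `compact` on a slower schedule. The point of the pk index is that a delete or update can be resolved to a file position without scanning the table, which is the part you end up building yourself for low-latency replication.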