Hacker News new | ask | show | jobs
by amluto 302 days ago
What do you mean “transactional”? Do you mean that a reader never sees a state in Iceberg that is not consistent with a state that could have been seen in a successful transaction in Postgres? If so, that seems fairly straightforward: have the CDC process read from Postgres in a transaction and write to Iceberg in a transaction. (And never do anything in the Postgres transaction that could cause it to fail to commit.)

But the 3 minute thing seems somewhat immaterial to me. If I have a table with one billion rows, and I do an every-three-minute batch job that need to sync an average of one modified row to Iceberg, that job still needs write the correct deletion record to Iceberg. If there’s no index, then either the job writes a delete-by-key or the job need to scan 1B Iceberg rows. Sure, that’s doable in 3 minutes, but it’s far from free.