by oconnore
303 days ago
This is a lot of fuss when you can get a batch update to stay within a few minutes of latency. You only have this problem if you insist on both (1) very near real-time and (2) Iceberg. And you can't go down this path if you require transactional queries. I think most people who need very near real-time queries also tend to need them to be transactional. The use case where you can accept inconsistent reads but something will break if you're 3 minutes out of date is very rare.

But the 3 minute thing seems somewhat immaterial to me. If I have a table with one billion rows, and I run an every-three-minute batch job that needs to sync an average of one modified row to Iceberg, that job still needs to write the correct deletion record to Iceberg. If there’s no index, then either the job writes a delete-by-key or it needs to scan all 1B Iceberg rows. Sure, that’s doable in 3 minutes, but it’s far from free.
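To make that cost asymmetry concrete, here is a toy Python model. The classes and functions are hypothetical illustrations, not the Iceberg or PyIceberg API. It assumes the two options map onto Iceberg's two kinds of delete files: an equality delete, which records only the key, and a position delete, which records the data file path and row position of the old row. With no index, finding that position means scanning.

```python
from dataclasses import dataclass, field

# Toy in-memory model of an Iceberg-style table. These classes are
# hypothetical illustrations, not part of any Iceberg library.
@dataclass
class DataFile:
    path: str
    rows: list  # list of (primary_key, payload) tuples

@dataclass
class Table:
    files: list = field(default_factory=list)

def equality_delete(key):
    """Delete-by-key: record only the key value. Nothing is scanned at
    write time; readers must match rows against it at query time."""
    return {"type": "equality", "key": key}, 0  # (delete record, rows scanned)

def position_delete(table, key):
    """Position delete: needs the (file, row position) of the old row.
    With no index, the only way to find it is a full scan."""
    scanned = 0
    for f in table.files:
        for pos, (k, _payload) in enumerate(f.rows):
            scanned += 1
            if k == key:
                return {"type": "position", "file": f.path, "pos": pos}, scanned
    return None, scanned  # key not present anywhere

# Three files of 1,000 rows each stand in for the 1B-row table.
table = Table([
    DataFile(f"data-{i}.parquet", [(k, None) for k in range(i * 1000, (i + 1) * 1000)])
    for i in range(3)
])

eq_record, eq_scanned = equality_delete(2500)
pos_record, pos_scanned = position_delete(table, 2500)
print(eq_scanned, pos_scanned)  # 0 2501
print(pos_record)  # {'type': 'position', 'file': 'data-2.parquet', 'pos': 500}
```

The scan count grows with table size, not with the number of changed rows. The equality delete avoids that at write time, but it roughly shifts the matching work onto every reader until the files are compacted, so neither option is free.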