by zhousun
467 days ago
The only part of the data stack that Iceberg (or the lakehouse) will never replace is OLTP systems: for high-concurrency updates, optimistic concurrency control on top of an object store is simply a no-go. Out of the box, Iceberg is NOT good at streaming use cases. Unlike formats such as Hudi or Paimon, the table format has no concept of a merge or an index. However, the beauty of Iceberg is that it is very unopinionated, so it is indeed possible to design an engine that stream-writes to Iceberg. As far as I know, this is how engines like Upsolver were implemented:
1. Keep an in-memory buffer to track incoming rows before flushing a version to Iceberg (every 10s to a few minutes).
2. Build an indexing structure so the writer can emit position deletes / deletion vectors instead of equality deletes.
3. The writer will also try to merge small files and optimize the table.

And stay tuned: we at https://www.mooncake.dev/ are working on a solution to mirror a Postgres table to Iceberg and keep it always up to date.
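To make the three steps listed above a bit more concrete, here is a toy sketch in plain Python. It is not Upsolver's code and it does not use any real Iceberg API: `ToyTable`, `BufferedUpsertWriter`, and `max_buffered_rows` are all made-up names, the "table" is just a dict of in-memory files, and the flush is triggered by row count rather than by a timer so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: data files plus position deletes."""
    data_files: dict = field(default_factory=dict)       # file name -> list of (key, value)
    position_deletes: set = field(default_factory=set)   # {(file name, row position)}
    commits: int = 0

    def scan(self):
        """Return live rows as {key: value}, skipping deleted positions."""
        live = {}
        for name, rows in self.data_files.items():
            for pos, (key, value) in enumerate(rows):
                if (name, pos) not in self.position_deletes:
                    live[key] = value
        return live


class BufferedUpsertWriter:
    """Toy writer: buffer rows, index committed rows, compact on demand."""

    def __init__(self, table, max_buffered_rows=1000):
        self.table = table
        self.max_buffered_rows = max_buffered_rows  # assumption: count-based flush
        self.buffer = {}        # step 1: key -> latest unflushed value
        self.index = {}         # step 2: key -> (file name, position) of live row
        self.next_file_id = 0

    def _new_file_name(self):
        name = f"data-{self.next_file_id:05d}"
        self.next_file_id += 1
        return name

    def upsert(self, key, value):
        # Repeated updates to the same key collapse inside the buffer.
        self.buffer[key] = value
        if len(self.buffer) >= self.max_buffered_rows:
            self.flush()

    def flush(self):
        """Write the buffer as one new file and one commit."""
        if not self.buffer:
            return
        name = self._new_file_name()
        rows = list(self.buffer.items())
        deletes = set()
        for pos, (key, _) in enumerate(rows):
            old = self.index.get(key)
            if old is not None:
                # The index tells us exactly where the old row lives, so we
                # can record a position delete instead of an equality delete.
                deletes.add(old)
            self.index[key] = (name, pos)
        self.table.data_files[name] = rows
        self.table.position_deletes |= deletes
        self.table.commits += 1
        self.buffer.clear()

    def compact(self):
        """Step 3: rewrite all live rows into one file and drop the deletes."""
        self.flush()
        live = list(self.table.scan().items())
        name = self._new_file_name()
        self.table.data_files = {name: live}
        self.table.position_deletes = set()
        self.index = {key: (name, pos) for pos, (key, _) in enumerate(live)}
        self.table.commits += 1
```

The point of the index is that readers only have to skip known (file, position) pairs, rather than compare every row against a list of deleted keys. A real engine would also have to persist or rebuild the index and handle commit conflicts, which this sketch ignores.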