by mbrt
567 days ago
Wow, not sure how I missed this, but I see many similarities. They were also bitten by the lack of conditional writes in S3:

> In Databricks service deployments, we use a separate lightweight coordination service to ensure that only one client can add a record with each log ID.

The key difference is that Delta Lake implements MVCC and relies on a total ordering of transaction IDs. I wanted to avoid that, because it creates a forced synchronization point: multiple clients have to fight over the same IDs.

This is certainly a trade-off. In my case you are forced to read the latest version or retry (but in exchange you get strict serializability), while in Delta Lake you can rely on snapshot isolation, which might give you slightly stale but consistent data and minimizes retries on reads.

It also seems that you can't get transactions across different tables? Another interesting trade-off.
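To make the "clients fight for IDs" point concrete, here is a minimal sketch of a totally ordered log where each log ID can be claimed by exactly one writer, as the quoted passage describes. It is not Delta Lake's actual code: `LogStore`, `put_if_absent` and `commit` are hypothetical names, and an in-process lock stands in for the conditional write (or the separate coordination service) that makes "create only if absent" atomic.

```python
import threading
from typing import Dict, Optional


class LogStore:
    """Hypothetical in-memory stand-in for a store with conditional writes.

    put_if_absent() plays the role of the coordination service: at most one
    writer can ever create the record for a given log ID.
    """

    def __init__(self) -> None:
        self._records: Dict[int, str] = {}
        self._lock = threading.Lock()  # simulates an atomic conditional write

    def put_if_absent(self, log_id: int, record: str) -> bool:
        # Create the record only if no other writer has claimed this ID yet.
        with self._lock:
            if log_id in self._records:
                return False
            self._records[log_id] = record
            return True

    def latest_id(self) -> int:
        # Highest committed log ID, or -1 for an empty log.
        with self._lock:
            return max(self._records, default=-1)

    def get(self, log_id: int) -> Optional[str]:
        with self._lock:
            return self._records.get(log_id)


def commit(store: LogStore, record: str, max_retries: int = 10) -> int:
    """Append `record` at the next free log ID, retrying on conflict.

    Every writer competes for the same "next" ID: this is the forced
    synchronization point. Losers must re-read the log and try again.
    (A real system would also re-validate the transaction against the
    records it lost to before retrying; that check is omitted here.)
    """
    for _ in range(max_retries):
        next_id = store.latest_id() + 1
        if store.put_if_absent(next_id, record):
            return next_id
    raise RuntimeError("too much contention; giving up")
```

For example, if writer A claims ID 0 first, writer B's attempt at ID 0 is rejected and `commit` moves it to ID 1. Every commit funnels through that single counter, which is the contention a design without global transaction IDs tries to avoid.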