by mbrt 577 days ago
I think it's much easier to achieve now than it was a year ago. The critical feature is conditional writes on new objects, because without them you can't safely create transaction logs in the presence of timeouts. That alone is not enough, though.
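
Why conditional creation matters for the log: without a "create only if absent" write, two transactions can both claim the same log slot, and a client whose PUT timed out cannot tell whether its write landed. Below is a minimal sketch of the idea, assuming an in-memory stand-in for the bucket. `InMemoryStore`, `append_log_entry` and the `log/<seq>` key layout are hypothetical names; only the create-if-absent check corresponds to a real S3 feature (PUT with `If-None-Match: *`).

```python
import hashlib


class PreconditionFailed(Exception):
    """Stand-in for an HTTP 412 response to a conditional request."""


class InMemoryStore:
    """Hypothetical in-memory model of an object store, not the S3 API.

    It only models "create this key only if it does not exist yet",
    which S3 exposes as a PUT with If-None-Match: *.
    """

    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def put_if_absent(self, key, body):
        if key in self._objects:
            raise PreconditionFailed(key)
        etag = hashlib.md5(body).hexdigest()
        self._objects[key] = (etag, body)
        return etag

    def get(self, key):
        return self._objects[key]


def append_log_entry(store, seq, txn_id, payload):
    """Try to claim transaction-log slot `seq` for `txn_id`.

    Returns True if the slot holds our entry, False if another transaction
    owns it. Safe to repeat after a timeout: if the first attempt did land,
    the retry gets a conflict, reads the slot back and recognizes its own
    entry instead of overwriting someone else's.
    """
    key = f"log/{seq:020d}"  # zero-padded so listing order matches seq order
    body = f"{txn_id}:{payload}".encode()
    try:
        store.put_if_absent(key, body)
        return True
    except PreconditionFailed:
        _, existing = store.get(key)
        return existing == body
```

For example, if `txn-a` claims slot 1 and its request times out, `txn-b` trying slot 1 gets `False` and moves on to slot 2, while `txn-a` retrying slot 1 gets `True` because it finds its own entry there.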

My approach on S3 would be to make sure an object's ETag changes whenever other transactions looking at it must be blocked. That makes it easy to use conditional reads (https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...) on COPY or GET operations.
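
The blocking idea can be sketched as follows: a reader remembers the ETag it saw, and every later read is conditioned on that ETag (If-Match), so any writer that changes the object's bytes invalidates it. This is a minimal in-memory sketch, not S3 code; `InMemoryStore`, `block_readers` and the `#locked-by:` marker are hypothetical. The ETag is modeled as the MD5 of the body, roughly as S3 reports it for plain single-part uploads, which is why the sketch has to change the bytes to change the ETag.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stand-in for an HTTP 412 response to a conditional request."""


class InMemoryStore:
    """Hypothetical in-memory model of an object store, not the S3 API.

    The ETag is the MD5 of the body, so it changes only when the bytes do.
    """

    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def put(self, key, body):
        etag = hashlib.md5(body).hexdigest()
        self._objects[key] = (etag, body)
        return etag

    def get(self, key, if_match=None):
        """Conditional read: fail if the caller's ETag is stale."""
        etag, body = self._objects[key]
        if if_match is not None and if_match != etag:
            raise PreconditionFailed(key)
        return etag, body


def block_readers(store, key, txn_id):
    """Rewrite the object with a lock marker so its ETag changes.

    A transaction that read the object earlier and re-reads it with
    If-Match on the old ETag now fails instead of seeing a half-finished
    update. The marker format is a hypothetical convention, and making
    this step itself atomic is out of scope here.
    """
    _, body = store.get(key)
    return store.put(key, body + f"\n#locked-by:{txn_id}".encode())
```

A reader that stored the ETag returned by its first read keeps passing `if_match=` that value; as soon as a writer calls `block_readers`, the reader's next conditional read raises `PreconditionFailed` and it knows it has to wait or restart.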

For writes, I would PUT to a temporary staging area and then do a conditional COPY + DELETE afterward. This is certainly slower than on GCS, but I think it should work.
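
One possible reading of that write path, as a minimal in-memory sketch: stage the new version with a PUT, copy it to the real key only if the staged object still has the ETag we got back, then delete the staging copy. The condition is checked on the copy source, modeled on S3's `x-amz-copy-source-if-match`. Everything else is an assumption for illustration: `InMemoryStore`, `stage`, `commit` and the one-slot-per-target `staging/<key>` layout are hypothetical, and the original comment does not say exactly which ETag the COPY should be conditioned on.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stand-in for an HTTP 412 response to a conditional request."""


class InMemoryStore:
    """Hypothetical in-memory model of an object store, not the S3 API."""

    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    def put(self, key, body):
        etag = hashlib.md5(body).hexdigest()
        self._objects[key] = (etag, body)
        return etag

    def get(self, key):
        return self._objects[key]

    def copy(self, src, dst, if_source_match):
        """Copy src to dst only if src still has the expected ETag."""
        etag, body = self._objects[src]
        if etag != if_source_match:
            raise PreconditionFailed(src)
        self._objects[dst] = (etag, body)
        return etag

    def delete(self, key):
        self._objects.pop(key, None)

    def exists(self, key):
        return key in self._objects


def staging_key(key):
    # Hypothetical layout: one staging slot per target object.
    return f"staging/{key}"


def stage(store, key, body):
    """Step 1: PUT the new version into the staging area."""
    return store.put(staging_key(key), body)


def commit(store, key, staged_etag):
    """Steps 2-3: conditional COPY from staging to the real key, then DELETE.

    If another writer overwrote the staging object after we staged ours,
    its ETag no longer matches and the copy is rejected, so we never
    publish bytes we did not write.
    """
    store.copy(staging_key(key), key, if_source_match=staged_etag)
    store.delete(staging_key(key))
```

Note that COPY and DELETE are two separate requests here: a crash between them leaves a stale staging object to clean up, and in this simplified layout an unconditional DELETE could remove a newer writer's staged version, so a real design would need to handle both.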

Locking without modifying the object is the part that still needs some optimization, though.


And I see more possibilities now that https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3... is available. It will keep getting easier to build serverless data lakes, streaming systems, and queues.