Hacker News
by fishnchips 757 days ago
> since it uses lock files on S3 versus a separate DynamoDB + S3 combo

This is disturbing because S3 does not give you the guarantees required to implement real locking.


https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea... guarantees that a client's lockfile can always be seen by other clients immediately (which didn't use to be true). If every client backs off and retries after a race, is that enough?

I think not, actually. There would still be cases where a race is not detected. I can think of the following sequence: A checks - no lock, B checks - no lock, A writes - success, A reads - match, success, B writes - success, B reads - match, success. A and B both think they now hold the lock.
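
To make that interleaving concrete, here is a rough sketch that replays it step by step. The dict is only a toy stand-in for a strongly consistent bucket, and `check`, `write` and `read` are made-up helpers, not S3 calls.

```python
# Toy in-memory stand-in for a strongly consistent bucket (not the S3 API).
store = {}

def check(key):
    return key in store

def write(key, owner):
    store[key] = owner          # unconditional overwrite, like a plain PUT

def read(key):
    return store.get(key)

LOCK = "lock"
holders = []

# The interleaving described above, executed in that exact order.
a_free = not check(LOCK)        # A checks: no lock
b_free = not check(LOCK)        # B checks: no lock
if a_free:
    write(LOCK, "A")            # A writes
    if read(LOCK) == "A":       # A reads back its own value
        holders.append("A")
if b_free:
    write(LOCK, "B")            # B overwrites A's lockfile
    if read(LOCK) == "B":       # B reads back its own value
        holders.append("B")

print(holders)  # ['A', 'B']
```

Strong read-after-write consistency doesn't help here: every read returns the latest write, and both clients still conclude they hold the lock.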

For locking to work properly you'd need to have a conditional write that would fail if some prerequisite was not met. GCP offers that operation, S3 AFAIK does not.
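
As a sketch of what such a conditional write buys you, assuming a store with an atomic create-only-if-absent operation: `ToyStore` and `put_if_absent` below are hypothetical names for illustration, not any real cloud SDK.

```python
import threading

class ToyStore:
    """In-memory stand-in for an object store; not a real cloud SDK."""

    def __init__(self):
        self._objects = {}
        self._mutex = threading.Lock()

    def put_if_absent(self, key, value):
        """Create key only if it does not exist yet. Returns True on success."""
        with self._mutex:       # check and write happen as one atomic step
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def get(self, key):
        with self._mutex:
            return self._objects.get(key)

store = ToyStore()
results = {}

def try_lock(name):
    results[name] = store.put_if_absent("lock", name)

threads = [threading.Thread(target=try_lock, args=(n,)) for n in ("A", "B", "C")]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, ok in results.items() if ok]
print(winners)  # exactly one name; which one depends on thread scheduling
```

The point is that the check and the write are a single operation on the server side, so there is no window between them for a second client to slip into.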

I'm no expert but from a quick glance at https://www.pulumi.com/docs/concepts/state/#using-a-self-man... it looks like this might work:

  client A lists s3://bucket/prefix/.pulumi/locks/, sees nothing

  client B lists s3://bucket/prefix/.pulumi/locks/, sees nothing

  client A creates s3://bucket/prefix/.pulumi/locks/unique1.json

  client A lists s3://bucket/prefix/.pulumi/locks/, only sees unique1.json, and proceeds

  client B creates s3://bucket/prefix/.pulumi/locks/unique2.json

  client B lists s3://bucket/prefix/.pulumi/locks/ and sees both unique1.json and unique2.json

  client B assumes it lost a race, deletes s3://bucket/prefix/.pulumi/locks/unique2.json, and retries
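
The steps above can be replayed roughly like this. It's a toy simulation of my reading of the docs, not Pulumi's actual implementation; the set stands in for the keys under `.pulumi/locks/` and the helper names are made up.

```python
# Toy stand-in for the keys under s3://bucket/prefix/.pulumi/locks/
locks = set()

def list_locks():
    return sorted(locks)

def create(name):
    locks.add(name)

def delete(name):
    locks.discard(name)

def after_create(my_lock):
    """Re-list after creating; keep the lock only if it is the sole lockfile."""
    if list_locks() == [my_lock]:
        return "proceeds"
    delete(my_lock)             # assume we lost the race and clean up
    return "retries"

a_saw = list_locks()            # client A lists, sees nothing
b_saw = list_locks()            # client B lists, sees nothing
create("unique1.json")          # client A creates its lockfile
a = after_create("unique1.json")
create("unique2.json")          # client B creates its lockfile
b = after_create("unique2.json")

print(a, b)                     # proceeds retries
print(list_locks())             # ['unique1.json']
```

Unlike the single-lockfile scheme, nobody overwrites anybody else's object here, so the second list is what exposes the race.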

There's another mode where both clients pessimistically retry, but fuzzing a retry delay could eventually choose a winner randomly.

In this case you have the opposite issue, with no-one actually guaranteed to get a lock even though nothing is holding one. Fuzzed retries may work in practice but theoretically speaking this is a flawed algorithm.
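
A toy replay of that failure mode, assuming both clients create their lockfiles before either one re-lists. `attempt_in_lockstep` is a made-up helper and the set is again only a stand-in for the lock prefix.

```python
# Toy stand-in for the keys under the lock prefix.
locks = set()

def attempt_in_lockstep(names):
    """Every client creates its lockfile before any client re-lists."""
    for n in names:
        locks.add(n)
    # Each client lists while all lockfiles still exist.
    seen = {n: sorted(locks) for n in names}
    outcome = {}
    for n in names:
        if seen[n] == [n]:
            outcome[n] = "proceeds"
        else:
            locks.discard(n)    # back off and delete own lockfile
            outcome[n] = "retries"
    return outcome

clients = ["unique1.json", "unique2.json"]
rounds = [attempt_in_lockstep(clients) for _ in range(3)]

for r in rounds:
    print(r)                    # both clients retry in every round
print(locks)                    # set(): nothing holds the lock
```

As long as the retries stay in step, this repeats indefinitely. Random delays make that unlikely in each round, but they don't rule it out.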
Hm, I can sort of imagine a way to use lockfile names to claim a random position in a queue of pending changes, but I don't know if anyone has been worried enough to do that. In practice Pulumi seems to give up instead of retrying: https://github.com/pulumi/docs/issues/11679