Hacker News
by whydoineedthis 1878 days ago
I'm confused... did you fix the caching issue in S3 or not?

The article seems to explain why there is a caching issue, and that's understandable, but it also reads as if you wanted to fix it. If it had actually been fixed, I would expect that to be the headline, in bold.

For those curious, the problem is that S3 is "eventually consistent", which is normally not a problem. But consider a scenario where you store a config file on S3, update that config file, and redeploy your app. The way things are today, you can (and yes, sometimes do) get a stale cached version. So now there is uncertainty about what was actually released. Even worse, some of your redeployed apps could get the new config and others the old config.
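A toy Python model of that scenario, for illustration only. It is a hypothetical simplification, not how S3 is actually built: one replica accepts each write, the others catch up only when `propagate()` runs, and each app instance reads from a fixed replica. The class and function names are made up.

```python
class EventuallyConsistentStore:
    """Hypothetical toy store: writes hit replica 0 first, others lag behind."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes that replicas 1..n have not seen yet

    def put(self, key, value):
        self.replicas[0][key] = value       # primary accepts the write
        self.pending.append((key, value))   # the other replicas learn later

    def propagate(self):
        # Replay queued writes, in order, onto the lagging replicas.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def get(self, key, replica_index):
        # Each reader is served by one replica, which may be stale.
        return self.replicas[replica_index % len(self.replicas)].get(key)


def redeploy(store, instance_count):
    """Every instance reads the config once at startup."""
    return [store.get("app-config", i) for i in range(instance_count)]


store = EventuallyConsistentStore()
store.put("app-config", "v1")
store.propagate()
store.put("app-config", "v2")   # update the config...
print(redeploy(store, 6))       # ...and redeploy before it propagates
# ['v2', 'v1', 'v1', 'v2', 'v1', 'v1']
store.propagate()
print(redeploy(store, 6))       # after propagation every instance agrees
```

The first redeploy ends up with a mix of old and new config, which is exactly the "what was actually released?" problem.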

Personally, I would be happy if there were simply an extra fee for cache-busting S3 objects on demand. That would keep folks from abusing it but still give the option when needed.


Yes, see my December 2020 post at https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea... :

"Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update an object hundreds of times per second if you’d like, and there are no global dependencies."

Thanks for the link; it made the change being discussed clearer. However, I still don't understand how it was achieved. The explanation in the link appears truncated: lots of talk about the problem, then something about a cache, and that's it. Is there an alternate link that explains the mechanics of the change?
Oh, awesome, I missed that!
It was fixed in December of 2020. Announcement blog post: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...
This is a general problem in all distributed systems, not just when pulling configuration from S3.

Let's assume you had strong consistency in S3. If your app is distributed (tens, hundreds, or thousands of instances running), the instances are still not all going to update at the same time, atomically.

You still need to design flexibility into your app to handle the case where they are not all running the same config (or software) version at the same time.

Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.
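A minimal Python sketch of what "designing in flexibility" might look like, under assumptions not stated in the thread: each config carries a version number, each build accepts a range of versions, and the rollout moves a growing fraction of the fleet to the new version. `SUPPORTED_CONFIG_VERSIONS`, `load_config`, `assign_versions`, and the config fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: this build understands config schema versions 3 and 4.
SUPPORTED_CONFIG_VERSIONS = range(3, 5)


@dataclass
class Config:
    version: int
    settings: dict


def load_config(raw: dict) -> Config:
    version = raw["version"]
    if version not in SUPPORTED_CONFIG_VERSIONS:
        raise ValueError(f"unsupported config version {version}")
    settings = dict(raw["settings"])
    if version == 3:
        settings.setdefault("timeout_s", 30)  # v3 had no timeout; fill a default
    return Config(version, settings)


def assign_versions(instance_ids, old, new, fraction):
    """First `fraction` of the fleet gets `new`; the rest stay on `old`."""
    cutoff = int(len(instance_ids) * fraction)
    return {iid: (new if i < cutoff else old) for i, iid in enumerate(instance_ids)}


configs = {
    3: {"version": 3, "settings": {"feature_x": False}},
    4: {"version": 4, "settings": {"feature_x": True, "timeout_s": 10}},
}
fleet = [f"i-{n}" for n in range(4)]
for fraction in (0.25, 0.5, 1.0, 0.0):  # phased rollout, then a rollback
    plan = assign_versions(fleet, 3, 4, fraction)
    loaded = {iid: load_config(configs[v]) for iid, v in plan.items()}
    print(fraction, sorted({c.version for c in loaded.values()}))
# 0.25 [3, 4]
# 0.5 [3, 4]
# 1.0 [4]
# 0.0 [3]
```

Rollback is just another phase with `fraction=0.0`. Because every build accepts both versions, it doesn't matter which instances see the change first, which is why a briefly stale read stops being a big deal.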

If you really need atomic updates across a distributed system then you're looking at more expensive solutions, like DynamoDB (which does offer consistent reads), or other distributed caches.

The deeper in your stack you fix the consistency problem, the simpler the rest of your system needs to be. If you use S3 as a canonical store for some use case, that's pretty deep in the stack.

> Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.

But this would also mean you can't use S3 as your source of truth for config, which is precisely what a lot of people want to do.

What I need is that when I make a call to a service, it gives back consistent results. Ergo, when the app does do a rolling deploy, it will get the right config on startup, not some random version.
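Before strong consistency, one defensive pattern was to have the deploy pin the expected content hash of the config and have each instance verify it on startup. This is an assumption offered for illustration, not something this thread describes. A minimal Python sketch: `fetch` stands in for whatever actually reads the object (for example, reading the body returned by boto3's `get_object`), and `load_expected_config` and its retry parameters are hypothetical.

```python
import hashlib
import time
from typing import Callable


def load_expected_config(fetch: Callable[[], bytes], expected_sha256: str,
                         attempts: int = 5, delay_s: float = 0.5) -> bytes:
    """Re-read the config until its SHA-256 matches what the deploy pinned."""
    for attempt in range(attempts):
        body = fetch()
        if hashlib.sha256(body).hexdigest() == expected_sha256:
            return body
        if attempt < attempts - 1:
            time.sleep(delay_s * (2 ** attempt))  # exponential backoff
    # Never start on an unknown config version.
    raise RuntimeError("config did not match the expected version; refusing to start")


# Simulated stale reads: the first two fetches return the old config.
responses = iter([b"timeout=30", b"timeout=30", b"timeout=10"])
expected = hashlib.sha256(b"timeout=10").hexdigest()
print(load_expected_config(lambda: next(responses), expected, delay_s=0.01))
# b'timeout=10'
```

Failing loudly on a mismatch turns "some random version" into a visible deploy error instead of a silent mixed fleet.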

It looks like it does exactly that now, it just wasn't clear from the article.

In that example, don't you see using S3 for that purpose as using the wrong tool for the task at hand? AWS SSM Parameter Store [0] seems to me like a tool designed for exactly that purpose.

[0] https://docs.aws.amazon.com/systems-manager/latest/userguide...

Complex config files suck in Parameter Store. Also, I've used this for mobile app configs that are pulled from S3, so Parameter Store wouldn't be an option.
It is supposedly fixed.

"After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object."

https://aws.amazon.com/s3/consistency/