by HiJon89 2028 days ago
We used to use S3 for Maven artifact storage. This is mostly an append-only workload; however, Maven updates maven-metadata.xml files in place. These files contain info about what versions exist, when they were updated, what the latest snapshot version is, etc. We would see issues where a Maven build publishes to S3, and then a downstream build would read an out-of-date maven-metadata.xml and blow up. Or worse, it could silently use an older build and cause a nasty surprise when you deploy. It only happened a small percentage of the time, but when you’re doing tens of thousands of builds per day, it ends up happening every day.

We switched to GCS for our Maven artifacts and the problem went away.


To be clear: the "blowing up" would occur when a client observed the new maven-metadata.xml file, but old ("does-not-exist") records for the newly uploaded artifact, correct?

With this update, ordering the metadata update after the artifact upload means this failure is now impossible.
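A minimal Python sketch of that ordering argument, assuming a strongly consistent store. A plain dict stands in for the bucket, and `publish`, `resolve`, and the `artifact.jar` key layout are hypothetical names for illustration, not Maven's real layout or API:

```python
def publish(store, group_path, version, artifact_bytes):
    """Upload the artifact first, then the metadata that points at it."""
    store[f"{group_path}/{version}/artifact.jar"] = artifact_bytes
    # Stand-in for the real XML: the metadata just records the latest version.
    store[f"{group_path}/maven-metadata.xml"] = version


def resolve(store, group_path):
    """Read the metadata, then fetch the artifact it names."""
    version = store[f"{group_path}/maven-metadata.xml"]
    return store[f"{group_path}/{version}/artifact.jar"]


bucket = {}  # a dict is trivially read-after-write consistent
publish(bucket, "com.example/example-artifact", "1.0-SNAPSHOT", b"jar bytes")
print(resolve(bucket, "com.example/example-artifact"))
```

Because the metadata is written last, any reader that can see the new metadata can also see the artifact it refers to, as long as every read reflects every completed write.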

For some context, S3 used to provide read-after-write consistency on new objects, but only if you didn't do a GET or HEAD to the key before it was created. So this access pattern is all good:

  PUT new-key 200
  GET new-key 200
However, this access pattern would be unreliable:

  GET new-key 404
  PUT new-key 200
  GET new-key 200/404
This last GET could return a 200, or it could return a cached 404. Unfortunately, Maven uses this access pattern when publishing maven-metadata.xml files, because it needs to know whether it should update an existing maven-metadata.xml file or create one if it doesn't exist yet. So when publishing a new version, it does something like this:

  GET com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 404
  PUT com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 200
And then downstream builds would try to resolve version 1.0-SNAPSHOT, and the first thing Maven does is fetch the maven-metadata.xml:

  GET com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 404
So when publishing a new version, a downstream build could get a cached 404 and fail loudly. However, when updating an existing maven-metadata.xml file, a downstream build could silently read the old maven-metadata.xml and end up using an out-of-date artifact (which is even more concerning). Here's what one of the maven-metadata.xml files looks like: https://oss.sonatype.org/content/repositories/snapshots/org/...
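As a rough illustration of the cached-404 case, here is a toy in-memory model in Python. It is not S3 code: the class `ToyNegativeCacheStore` and its `expire_cache` method are made up for this sketch, and it always serves the stale 404, whereas real S3 only did so some of the time.

```python
class ToyNegativeCacheStore:
    """Toy model of the old new-object behaviour; not real S3 code."""

    def __init__(self):
        self._objects = {}         # authoritative key -> body
        self._cached_404s = set()  # keys whose "not found" answer is cached

    def get(self, key):
        if key in self._cached_404s:
            return 404, None       # stale negative answer, even if the key now exists
        if key not in self._objects:
            self._cached_404s.add(key)  # a miss before creation poisons the key
            return 404, None
        return 200, self._objects[key]

    def put(self, key, body):
        self._objects[key] = body  # deliberately does not clear the cached 404
        return 200

    def expire_cache(self):
        self._cached_404s.clear()  # stands in for the system eventually converging


store = ToyNegativeCacheStore()
key = "com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml"
print(store.get(key)[0])              # publisher checks for existing metadata: 404
print(store.put(key, "<metadata/>"))  # publisher uploads: 200
print(store.get(key)[0])              # downstream build: still 404
store.expire_cache()
print(store.get(key)[0])              # after convergence: 200
```

If the publisher skipped the initial GET, the key would never enter the negative cache and the downstream GET would return 200 immediately, which matches the first access pattern above.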

Because updating an existing object in S3 didn't have read-after-write consistency, we could have a publishing flow that looked like:

  GET com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 200 (v1)
  PUT com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 200 (v2)
And then downstream builds would fetch the maven-metadata.xml:

  GET com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml 200 (v1)
So downstream builds could read a stale maven-metadata.xml file, which results in silently using an out-of-date artifact.
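The overwrite case can be sketched the same way. Again this is a toy Python model under stated assumptions, not S3 code: `ToyStaleOverwriteStore` and `converge` are hypothetical names, and the model always serves the old version until `converge` is called, whereas the real system did so only occasionally.

```python
class ToyStaleOverwriteStore:
    """Toy model of eventually consistent overwrites; not real S3 code."""

    def __init__(self):
        self._latest = {}   # what the most recent PUT wrote
        self._visible = {}  # what readers currently see

    def put(self, key, body):
        is_new = key not in self._latest
        self._latest[key] = body
        if is_new:
            self._visible[key] = body  # brand-new keys are readable right away
        return 200

    def get(self, key):
        if key not in self._visible:
            return 404, None
        return 200, self._visible[key]

    def converge(self):
        self._visible = dict(self._latest)  # overwrites become visible to readers


store = ToyStaleOverwriteStore()
key = "com.example/example-artifact/1.0-SNAPSHOT/maven-metadata.xml"
store.put(key, "v1")
store.put(key, "v2")   # publisher overwrites the metadata
print(store.get(key))  # downstream build still sees (200, 'v1')
store.converge()
print(store.get(key))  # now (200, 'v2')
```

Note that the stale read returns a 200, so nothing in the response tells the downstream build it is looking at old metadata.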

We ended up just switching to GCS because it was relatively straightforward and gave us the consistency guarantees we wanted.