Good point about double paying for storage. There's a long migration problem here to clean up. Most of my buckets have a mix of frequently and very infrequently accessed objects. To address this, there needs to be some sort of active migration tool, right? Does R2 have this built in?
I’m not sure exactly what you mean by active migration tool, but my understanding is that they copy objects from S3 to R2 on the first request. So infrequently accessed objects won’t be copied, and you won’t be double paying for their storage until they’re first requested.
Just an idea: maybe R2 should be allowed to delete the original object from S3 on that first fetch too, so that eventually we're left with two mutually exclusive sets of files (and no double storage).
I thought about that too. It could be a good solution, because the challenge otherwise is going to be listing all objects in the S3 bucket and comparing them to what's in the R2 bucket, right?
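For what it's worth, the comparison itself is the easy part once you have both listings. A minimal sketch, assuming you have already collected the key names from each bucket (the function name `keys_missing_from_r2` is made up for illustration):

```python
from typing import Iterable, List


def keys_missing_from_r2(s3_keys: Iterable[str], r2_keys: Iterable[str]) -> List[str]:
    """Return keys that appear in the S3 listing but not in the R2 listing, sorted."""
    already_copied = set(r2_keys)
    # De-duplicate the S3 side too, in case a listing was retried and overlaps.
    return sorted(k for k in set(s3_keys) if k not in already_copied)
```

The expensive part is producing the two listings for a large bucket, not the set difference. Since R2 exposes an S3-compatible API, the same client library could probably produce both lists, but that part is not shown here.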
- Set up SQS to populate with S3 Create events. SQS listeners just make a GET request to R2 for that file.
- Generate S3 events and populate SQS by running List operations or by abusing S3 Lifecycle Management.
- Let it process.
- Switch writes to R2.
This all assumes you can’t delete from S3 till R2 is fully ready. Depending on the application, you could switch over writes to R2 in a different step and also possibly delete the S3 file in the SQS processor.
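Roughly what the SQS processor could look like, as a sketch rather than a tested setup. It assumes S3 delivers event notifications straight to SQS (not wrapped by SNS), and `fetch_from_r2` is a hypothetical callable you supply that does the GET against R2 and returns True on success. Object keys in S3 event notifications arrive URL-encoded, hence the `unquote_plus`.

```python
import json
from typing import Callable, List
from urllib.parse import unquote_plus


def keys_from_s3_event(body: str) -> List[str]:
    """Extract decoded object keys from an S3 event notification message body."""
    event = json.loads(body)
    # Messages without "Records" (e.g. the initial test event) yield no keys.
    return [unquote_plus(r["s3"]["object"]["key"]) for r in event.get("Records", [])]


def process_message(body: str, fetch_from_r2: Callable[[str], bool]) -> List[str]:
    """GET each key from R2 so it gets copied; return the keys that failed."""
    failed = []
    for key in keys_from_s3_event(body):
        if not fetch_from_r2(key):
            failed.append(key)
    return failed
```

If `process_message` returns a non-empty list, leave the message on the queue so it is retried. The optional S3 delete mentioned above would go after a successful fetch, and only if you're sure nothing still reads from S3.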