by stepchowfun 1589 days ago
Thank you! This is the first explanation that I think fully explains what I was confused about. So essentially the prefix is just the first N bytes of the object's name, where N is a per-bucket number that S3 automatically decides and adjusts for you. And it has nothing to do with delimiters.
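
To make that mental model concrete, here is a toy Python sketch. It is not S3's actual partitioning logic, which isn't public; `partition_by_prefix` and `prefix_len` are hypothetical names, and the fixed N here stands in for whatever length S3 picks and adjusts on its own.

```python
from collections import defaultdict

def partition_by_prefix(keys, prefix_len):
    """Group object keys by their first `prefix_len` bytes.

    Toy model only: `prefix_len` plays the role of the per-bucket N
    that S3 is assumed to choose automatically.
    """
    partitions = defaultdict(list)
    for key in keys:
        shard = key.encode("utf-8")[:prefix_len]  # first N bytes of the name
        partitions[shard].append(key)
    return dict(partitions)

keys = ["logs/2021/a.txt", "logs/2022/b.txt", "imgs/cat.png"]

# With N = 4, both "logs/..." keys share one prefix.
coarse = partition_by_prefix(keys, 4)

# With N = 9, the same keys fall under three different prefixes.
fine = partition_by_prefix(keys, 9)
```

Note that the "/" characters play no special role here: the cut happens at byte N regardless of where the delimiters fall.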

I find the S3 documentation and API to be really confusing about this. For example, when listing objects, you get to specify a "prefix". But this seems not to be directly related to the automatically determined prefix length based on your access patterns. And [1] says things like "There are no limits to the number of prefixes in a bucket", which makes no sense to me given that the prefix length is something that S3 decides under the hood for you. Like, how do you even know how many prefixes your bucket has?

[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...


The sharding key is an implementation detail, so you're not supposed to care about it too much.
That's true now. It used to be the case that they'd recommend putting the random or high-entropy parts of the key at the beginning, to avoid overloading a shard as you described above.

From [0]:

> This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications. This improvement is now available in all AWS Regions. For more information, visit the Amazon S3 Developer Guide.

[0]: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3...
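
For anyone who never saw the old guidance in practice, it looked roughly like the sketch below. This is only an illustration of the former pattern, which the announcement quoted above says is no longer needed; `randomized_key` and the 4-character hash length are my own assumptions, not anything AWS prescribed.

```python
import hashlib

def randomized_key(key, hash_len=4):
    """Prepend a short hash so sequential names spread across key prefixes.

    Hypothetical helper showing the pre-2018 pattern: the high-entropy
    part goes first, the human-readable name follows.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:hash_len]}-{key}"

# Sequential, date-based names would otherwise all share a long common prefix.
original = "2018/07/17/log-0001.txt"
spread = randomized_key(original)
```

The cost of this pattern was that keys no longer sorted in a useful order, which made prefix-based listing awkward.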

Indeed, and unfortunately my mind will forever work this way.
It is related, in the sense that both “prefixes” are a string match anchored at the start of the object name. They’re just not the same mechanism.
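
A rough in-memory sketch of the listing side, for comparison. It imitates the documented prefix and delimiter behaviour of listing (matching keys come back directly, or rolled up into common prefixes when a delimiter is given), but `list_objects` is a hypothetical stand-in, not the real API call.

```python
def list_objects(keys, prefix="", delimiter=None):
    """Simulate listing with a prefix filter and optional delimiter roll-up.

    Returns (contents, common_prefixes). Hypothetical helper, not boto3.
    """
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):  # the anchored match on the name
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll the key up to the first delimiter after the prefix.
            cut = rest.index(delimiter) + len(delimiter)
            rolled = prefix + rest[:cut]
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes

keys = ["logs/2021/a.txt", "logs/2022/b.txt", "logs-old.txt", "imgs/cat.png"]

# The prefix is a plain string match: "logs" also matches "logs-old.txt".
flat, _ = list_objects(keys, prefix="logs")

# The delimiter only affects how results are grouped, not what matches.
_, folders = list_objects(keys, prefix="logs/", delimiter="/")
```

In this sketch the caller chooses the prefix on every request, whereas the sharding prefix is chosen by S3 and never shown to you.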