| HN Mirror

You can have lots of buckets, but each one typically still contains many datasets.

Think of a team doing ML, for example. They work with data all day across many different tools, each reading some inputs from S3 and writing outputs to S3. They won't create a bucket for every output, that's not practical. So they write to a single bucket with outputs organized under prefixes.

Buckets are more of an administrative boundary (IAM, cost, replication) than a data organization unit. So even with more buckets, the dataset abstraction is still missing - there's no good native way to track what a prefix represents, who created it, whether it's still accessed, how much it costs, etc.