by skybrian 43 days ago
Why do they put everything into one huge bucket? Wouldn't the best way to clean it up be to create more buckets?

You can have lots of buckets, but each one typically still contains many datasets.

Think of a team doing ML, for example. They work with data all day across many different tools, each reading some inputs from S3 and writing outputs to S3. They won't create a bucket for every output; that's not practical. So they write to a single bucket, with outputs organized under prefixes.

Buckets are more of an administrative boundary (IAM, cost, replication) than a data organization unit. So even with more buckets, the dataset abstraction is still missing - there's no good native way to track what a prefix represents, who created it, whether it's still accessed, how much it costs, etc.
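
To make the "how much does a prefix cost" part concrete, here's a rough sketch of the bookkeeping you end up doing yourself. It assumes you already have an object listing as (key, size) pairs, for example from an S3 inventory report or a paginated list call. The function name, the sample keys, and the sizes are all made up for illustration, and it only covers size per prefix, not ownership or access tracking.

```python
from collections import defaultdict


def summarize_by_prefix(objects, depth=1, delimiter="/"):
    """Group (key, size_bytes) pairs by their first `depth` path components.

    Hypothetical helper: treats each leading prefix as a stand-in "dataset".
    """
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for key, size in objects:
        parts = key.split(delimiter)
        # Keys with too few path components are grouped under the root ("").
        if len(parts) > depth:
            prefix = delimiter.join(parts[:depth]) + delimiter
        else:
            prefix = ""
        totals[prefix]["objects"] += 1
        totals[prefix]["bytes"] += size
    return dict(totals)


# Made-up listing standing in for a real bucket listing.
listing = [
    ("experiments/run-1/model.bin", 500),
    ("experiments/run-2/model.bin", 700),
    ("features/2024/part-0.parquet", 300),
    ("README.txt", 10),
]

summary = summarize_by_prefix(listing)
for prefix, stats in sorted(summary.items()):
    print(repr(prefix), stats["objects"], stats["bytes"])
```

Even this only tells you how big a prefix is. It still can't say what the prefix represents or who created it, which is the part that's missing.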

It sounds like they're missing the concept of sub-buckets that a team member can easily create for a small project.