by wodenokoto 1589 days ago
I've never been in this situation, but I do wish you could query files with more advanced filters on these blob storage services.
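For context on the filtering complaint: S3-style listing calls such as ListObjectsV2 narrow results server-side essentially only by key prefix (plus an optional delimiter), so any richer filter, such as size or age, has to run client-side after the listing comes back. This is a minimal sketch of that pattern. It assumes object records shaped like the `Contents` entries of a ListObjectsV2 response (`Key`, `Size`, `LastModified`). The `filter_objects` helper and the sample keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional


def filter_objects(
    objects: Iterable[dict],
    prefix: str = "",
    min_size: int = 0,
    older_than: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> Iterator[str]:
    """Yield keys of listed objects that pass extra client-side filters.

    `objects` is assumed to look like ListObjectsV2 `Contents` entries:
    dicts with "Key", "Size" and a timezone-aware "LastModified".
    """
    now = now or datetime.now(timezone.utc)
    for obj in objects:
        # The prefix check is the part the service itself can do for you.
        if not obj["Key"].startswith(prefix):
            continue
        # Everything below only happens after the objects have been listed.
        if obj["Size"] < min_size:
            continue
        if older_than is not None and now - obj["LastModified"] < older_than:
            continue
        yield obj["Key"]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    listing = [  # hypothetical sample listing
        {"Key": "jobs/a/part-0.parquet", "Size": 5_000_000, "LastModified": now - timedelta(days=40)},
        {"Key": "jobs/a/part-1.parquet", "Size": 10, "LastModified": now - timedelta(days=40)},
        {"Key": "jobs/b/part-0.parquet", "Size": 8_000_000, "LastModified": now - timedelta(days=2)},
        {"Key": "logs/app.log", "Size": 9_000_000, "LastModified": now - timedelta(days=90)},
    ]
    big_and_old = filter_objects(
        listing, prefix="jobs/", min_size=1_000_000, older_than=timedelta(days=30), now=now
    )
    print(list(big_and_old))  # ['jobs/a/part-0.parquet']
```

Against a real bucket, `listing` could come from chaining the `Contents` of each page yielded by boto3's `get_paginator("list_objects_v2").paginate(Bucket=..., Prefix="jobs/")`. The point is that every object under the prefix still has to be listed before the other filters apply.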

- But why SageMaker?

- Why do some orgs choose to put almost everything in one bucket?


>Why do some orgs choose to put almost everything in one bucket?

The article seems to be making the case that it's because the delimiter makes it look like there's a real hierarchy. So the ramifications of /bucket/1 /bucket/2 versus /bucket1/ /bucket2/ aren't well known until it's too late.
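To illustrate why the delimiter is misleading: in S3-style stores a bucket is a flat map from key to object, and the "folders" in /bucket/1/... are only key prefixes that a listing call groups at request time. /bucket1/ and /bucket2/, by contrast, are separate resources with their own configuration. This pure-Python sketch is modeled loosely on the Prefix/Delimiter/CommonPrefixes behavior of ListObjectsV2. The function name and sample keys are hypothetical, and pagination is ignored.

```python
from typing import Iterable, List, Set, Tuple


def list_with_delimiter(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Roughly mimic how an S3-style listing groups a flat keyspace.

    Returns (contents, common_prefixes): keys sitting directly "under"
    the prefix, plus the pseudo-folders formed by cutting longer keys at
    the next delimiter. No directory objects exist; this is computed per call.
    """
    contents: List[str] = []
    common_prefixes: Set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            contents.append(key)  # no further delimiter: a "file" at this level
        else:
            # Collapse everything below the next delimiter into one pseudo-folder.
            common_prefixes.add(prefix + rest[: idx + len(delimiter)])
    return contents, sorted(common_prefixes)


if __name__ == "__main__":
    # One bucket holding what looks like two "sub-buckets", 1/ and 2/.
    keys = ["1/report.csv", "1/raw/x.json", "2/report.csv", "readme.txt"]
    print(list_with_delimiter(keys))              # (['readme.txt'], ['1/', '2/'])
    print(list_with_delimiter(keys, prefix="1/"))  # (['1/report.csv'], ['1/raw/'])
```

The "1/" and "2/" entries look like directories, but they exist only in the listing output. Anything configured per bucket applies to both of them at once unless the rule is scoped by prefix.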

>So the ramifications of /bucket/1 /bucket/2 versus /bucket1/ /bucket2/ aren't well known until it's too late.

What's the difference?

When choosing between a single bucket with hierarchical paths and multiple buckets, there's a long list of nuances to either strategy.

For the purposes of this article, you can probably set up more intuitive, sensible lifecycle policies across multiple buckets than you can by trying to set policies on specific paths within a single bucket. Something like "ShortLifeBucket" and "LongLifeBucket" would let you keep items with similar prefixes (e.g. "{bucket}/anApplication/file1.csv" in each bucket) under different lifecycle policies.
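The lifecycle argument can be made concrete. This is a minimal sketch that assumes S3 lifecycle rules in the dictionary shape boto3's `put_bucket_lifecycle_configuration` accepts. The bucket names, rule IDs, day counts, and the `days_until_expiry` helper are all hypothetical. The helper is a simplified model (shortest matching expiration wins; versioning, tags, and transitions are ignored), not S3's full evaluation logic.

```python
from typing import Dict, Optional

# Hypothetical bucket names. Real S3 bucket names must be lowercase, so
# "ShortLifeBucket"/"LongLifeBucket" become short-life-bucket/long-life-bucket.


def expiration_rule(rule_id: str, days: int, prefix: str = "") -> Dict:
    """Build one rule in the shape put_bucket_lifecycle_configuration expects."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # empty prefix: the rule covers the whole bucket
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Option 1: two buckets, each with a single bucket-wide rule.
multi_bucket = {
    "short-life-bucket": {"Rules": [expiration_rule("expire-30d", 30)]},
    "long-life-bucket": {"Rules": [expiration_rule("expire-365d", 365)]},
}

# Option 2: one bucket, so the intended lifetime has to be encoded in the key layout.
single_bucket = {
    "shared-bucket": {
        "Rules": [
            expiration_rule("short-30d", 30, prefix="short-life/"),
            expiration_rule("long-365d", 365, prefix="long-life/"),
        ]
    }
}


def days_until_expiry(configs: Dict[str, Dict], bucket: str, key: str) -> Optional[int]:
    """Simplified model: shortest expiration among enabled rules whose prefix matches."""
    matches = [
        rule["Expiration"]["Days"]
        for rule in configs[bucket]["Rules"]
        if rule["Status"] == "Enabled" and key.startswith(rule["Filter"]["Prefix"])
    ]
    return min(matches) if matches else None


if __name__ == "__main__":
    key = "anApplication/file1.csv"
    for bucket in multi_bucket:
        print(bucket, days_until_expiry(multi_bucket, bucket, key))
    # short-life-bucket 30
    # long-life-bucket 365
    print(days_until_expiry(single_bucket, "shared-bucket", key))  # None
    print(days_until_expiry(single_bucket, "shared-bucket", "short-life/" + key))  # 30
```

With separate buckets, the same "anApplication/file1.csv" key can get different lifetimes without changing any paths. In one bucket, that key matches neither prefix and never expires, so the applications writing it would need to change their key layout. Each config could then be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=name, LifecycleConfiguration=cfg)`.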

There's a lack of searchable blogs and recommendations for how many buckets you need, and how much stuff belongs in one.

Got any recommended literature?

For many at orgs like this, SageMaker is probably the shortest path to an insane amount of compute with a Python terminal.

Why single bucket? Once someone refers to a bucket as "the" bucket - it is how it will forever be.

> But why SageMaker?

You could ask the same thing about most of the times it gets used for ML stuff, too.

> Why do some orgs choose to put almost everything in one bucket?

Anecdote: ours does because we paid (Multinational Consulting Co)™ a couple of million to design our infra for us, and that's what the result was.

1. Athena?
2. Some jobs make a lot of data.