Hacker News
by lobster_johnson 3420 days ago
S3's API is so rudimentary that I prefer to think of it as a non-enumerable key/value store.

I learned this the hard way: We had an application where we made the mistake of storing about a billion files in a nearly flat structure (one level of nesting, probably 100m "folders" in the root). Then one day we needed to go through it to prune stuff that was no longer in use. Unfortunately, if you don't have a "shardable" prefix, list requests are impossible to parallelize efficiently (because you can't subdivide the work), and our scripts took weeks to run to completion. Hard-earned experience: If you're storing large quantities of stuff in S3, always pick a shardable prefix. The upload date is a good choice. A random string will also do.
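
To make the "shardable prefix" idea concrete, here is a minimal sketch. It assumes a hash-derived prefix (one variant of the "random string" option), and the names `sharded_key`, `all_shards` and `list_all` are made up for illustration. `list_prefix` is a caller-supplied function that returns every key under one prefix; against real S3 it would presumably wrap a paginated list call such as boto3's `list_objects_v2` paginator, which is left out here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

HEX = "0123456789abcdef"


def sharded_key(name, width=2):
    """Prefix the key with the first `width` hex chars of the name's MD5.

    Hypothetical scheme: the hash is only there to spread keys evenly.
    """
    shard = hashlib.md5(name.encode("utf-8")).hexdigest()[:width]
    return f"{shard}/{name}"


def all_shards(width=2):
    """Every possible shard prefix, e.g. '00/' through 'ff/' for width=2."""
    prefixes = [""]
    for _ in range(width):
        prefixes = [p + c for p in prefixes for c in HEX]
    return [p + "/" for p in prefixes]


def list_all(list_prefix, width=2, workers=16):
    """Issue one listing per shard prefix in parallel and merge the results.

    `list_prefix(prefix)` is an assumption: any callable returning the
    keys under that prefix.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(list_prefix, all_shards(width)))
    return [key for chunk in chunks for key in chunk]
```

Because the set of prefixes is known up front, the work splits into 256 independent listings (for `width=2`) instead of one long sequential scan.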

After this, my solution for any non-trivially-sized storage use case is to store an inventory of objects separately in a performant PostgreSQL database, and make sure all writes go through a service layer that shields the consumer from the details of S3. This has some benefits over a hypothetical centralized approach (but some downsides, like the possibility that things get out of sync if you sidestep the inventory). Overall, I wish S3 would store its metadata in something like BigQuery.
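
A rough sketch of what such a service layer might look like. Everything here is a stand-in chosen so the example runs on its own: a dict plays the role of S3, an in-memory SQLite database plays the role of PostgreSQL, and the class and method names are hypothetical.

```python
import sqlite3


class ObjectStore:
    """Hypothetical service layer: every write touches the blob store
    and the inventory together, so consumers never talk to S3 directly."""

    def __init__(self):
        self.blobs = {}                        # stand-in for S3
        self.db = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
        self.db.execute(
            "CREATE TABLE inventory (key TEXT PRIMARY KEY, size INTEGER NOT NULL)"
        )

    def put(self, key, data):
        self.blobs[key] = data
        with self.db:  # commit on success, roll back on error
            self.db.execute(
                "INSERT OR REPLACE INTO inventory (key, size) VALUES (?, ?)",
                (key, len(data)),
            )

    def delete(self, key):
        self.blobs.pop(key, None)
        with self.db:
            self.db.execute("DELETE FROM inventory WHERE key = ?", (key,))

    def keys_between(self, start, end):
        """Enumerate keys in [start, end) from the inventory alone,
        without issuing any list request against the blob store."""
        rows = self.db.execute(
            "SELECT key FROM inventory WHERE key >= ? AND key < ? ORDER BY key",
            (start, end),
        )
        return [row[0] for row in rows]
```

Range queries like `keys_between` are what make bulk jobs easy to subdivide, whatever the key layout looks like. The sketch does nothing about the out-of-sync risk: a write that bypasses `put` never reaches the inventory.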

Anyone know if Google Cloud Platform's S3 equivalent, Cloud Storage, improves on these issues?

5 comments

Replying to myself: Disappointingly, it seems GCP's Cloud Storage is pretty much a carbon copy of S3 as far as the API is concerned, down to the prefix/delimiter-based search.
I wonder if "bucket notifications" are reliable enough that one could keep such an index DB populated automatically?
Yes, just hook those up to a lambda function and write to dynamodb or something
I tried this, but if you want to query by tags, using an RDS database works much better. DynamoDB is not well suited to this particular problem.
I think SQS would be reliable enough here, yes.
Have you looked into the inventory functionality? It was just added last November. http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inven...
Wow, you get a CSV file of all the objects. That's a solution I did not expect.

Sounds a bit like something they cooked up in a hurry to avoid having to design a BigQuery-type service for querying arbitrary metadata; I bet they had some huge customer with a need to get a CSV file for a bucket, that was willing to effectively bankroll the development of this feature.

But yes. That would sidestep the issue. You'd still have to turn on the feature and wait for the CSV file to build (apparently the best granularity is daily), of course, but it would help tremendously. Wish that had existed when we had our difficulties, about a year ago.
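
For the pruning job described earlier, a CSV like that could be consumed roughly as follows. This is only a sketch: it assumes a headerless CSV with the object key in the second column, which may not match how a given report is configured, and `unused_keys` is a made-up helper name.

```python
import csv
import io


def unused_keys(inventory_csv, in_use, key_column=1):
    """Yield keys that appear in the inventory report but not in `in_use`.

    Assumption: no header row, and the key sits at index `key_column`.
    Check the real report layout before relying on this.
    """
    for row in csv.reader(io.StringIO(inventory_csv)):
        if row and row[key_column] not in in_use:
            yield row[key_column]
```

Usage would be along the lines of `for key in unused_keys(report_text, keys_still_referenced): ...`, with the deletes batched separately.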

I did something similar storing the information in PostgreSQL but made the inserts/updates/deletes based on the events of s3. If an object was stored it would insert into the database. If it was deleted it would soft delete in the database. Worked out well for me.
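
A minimal sketch of that event-driven approach, under stated assumptions: SQLite stands in for PostgreSQL so the example is self-contained, the table layout and function names are invented, and the event shape follows the S3 notification format (`Records`, `eventName`, `s3.object.key`) with URL-encoded keys.

```python
import sqlite3
from urllib.parse import unquote_plus


def make_inventory():
    """Create the (hypothetical) inventory table in a stand-in database."""
    db = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
    db.execute(
        "CREATE TABLE objects ("
        " key TEXT PRIMARY KEY,"
        " size INTEGER,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def apply_event(db, event):
    """Apply one notification payload to the inventory.

    Created objects are inserted (or revived); removed objects are only
    flagged as deleted, i.e. a soft delete.
    """
    for record in event.get("Records", []):
        name = record["eventName"]
        obj = record["s3"]["object"]
        key = unquote_plus(obj["key"])  # keys arrive URL-encoded
        with db:
            if name.startswith("ObjectCreated:"):
                db.execute(
                    "INSERT OR REPLACE INTO objects (key, size, deleted)"
                    " VALUES (?, ?, 0)",
                    (key, obj.get("size")),
                )
            elif name.startswith("ObjectRemoved:"):
                db.execute("UPDATE objects SET deleted = 1 WHERE key = ?", (key,))
```

The soft delete keeps a row around for every key ever seen, which is handy for auditing but means queries need a `WHERE deleted = 0` filter.
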
As someone heading down a similar path (and I'm fairly sure I've got sensible prefixes), can you share an example of a prefix that caused you trouble? Is it something like

    /path/to/big-dir/«lots-of-sequential-filenames»

?
Exactly. It works fine for most tasks, of course, but if you ever want to process the contents of the S3 bucket in bulk, nothing will ever be able to parallelize that one list request to /path/to/big-dir.

If you don't use the evenly-distributed-prefix trick, your only chance of speeding it up is knowing the file names beforehand. If they're all sequentially numbered, you might do that, of course.

The shardable prefix doesn't need to be at the top level. So you could also organize it like so, for example:

    /secret/documents/2016-01-01/00000001.doc
Thanks! I've read the docs and blog posts, but it was interesting to see a real live antipattern.
I suppose it's like with regular file systems -- don't have too many files in a directory.

In your use case, consider `/path/to/big-dir/AA/AABB/AABBCC` or similar?
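
One way to generate paths of that shape, as a sketch only. It assumes the `AA`/`AABB` components come from a hash of the file name (sequential names would otherwise all share the same leading characters) and treats the last component as the file itself; `nested_path` and its defaults are made up for illustration.

```python
import hashlib


def nested_path(name, base="/path/to/big-dir", levels=2, chars=2):
    """Build base/AA/AABB/name, where AA and AABB are growing prefixes
    of the name's SHA-1 hex digest (an assumed scheme, not from the thread)."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest().upper()
    parts = [digest[: chars * (i + 1)] for i in range(levels)]
    return "/".join([base, *parts, name])
```

Each level multiplies the number of directories by 256, so two levels already spread the files over 65,536 buckets.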

Sorry, that example wasn't my data, it was to illustrate the question.