by kot-behemoth 3421 days ago
While these are great points, I think this might go beyond the "Simple" in the S3 name itself. Wasn't the original purpose of the service to be dumb storage, on top of which you layer metadata as required? I.e. storing indices separately with whatever functionality is needed (be it date or path filtering).
3 comments

Perhaps true.

I'll never do that though because I'd have to use DynamoDB, which is a technology that is high on my list of "technologies that I am least enthused about".

Also, I really shouldn't have to go to all the work of creating and maintaining a metadata database and implementing a query API just because I want to do searches more powerful than "list all objects" - that's Amazon's job.

Moreover, even if you had gone to the trouble of building such an API, S3 still doesn't offer bulk operations, so you'd have to operate on each matching object... one object at a time.
This isn't such an issue because you can update the DynamoDB index using an AWS Lambda function on every PutObject or DeleteObject event.
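
A minimal sketch of that Lambda idea, in Python. The record layout (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) follows S3 event notifications; the `index` dict is a hypothetical stand-in for the DynamoDB table so the sketch runs without AWS. In a real function the two index updates would presumably become calls on a DynamoDB table instead.

```python
from urllib.parse import unquote_plus

# Hypothetical stand-in for the DynamoDB table: (bucket, key) -> metadata.
index = {}

def handler(event, context=None):
    """Apply each S3 notification record to the index; return the index size."""
    for record in event.get("Records", []):
        name = record["eventName"]  # e.g. "ObjectCreated:Put"
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in notifications ("a+b%2Fc" means "a b/c").
        key = unquote_plus(record["s3"]["object"]["key"])
        if name.startswith("ObjectCreated:"):
            index[(bucket, key)] = {
                "size": record["s3"]["object"].get("size"),
                "event_time": record.get("eventTime"),
            }
        elif name.startswith("ObjectRemoved:"):
            # Ignore removals of keys that were never indexed.
            index.pop((bucket, key), None)
    return len(index)
```

Matching on the `ObjectCreated:` and `ObjectRemoved:` prefixes rather than one exact event name is a design choice here, so copies and multipart uploads would update the index too.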

It's still not something I want to do, mainly because I'd have to touch DynamoDB but secondly because, well, why the heck doesn't AWS do it?

What do you have against DynamoDB?
Surely some extra functionality would not obfuscate the inherent simplicity of S3.

An S3Query module would not, I think, make things harder for S3 users.

And frankly - it would be awesome.

I've used S3 a lot, and I'm loath to switch to a DB if I can avoid it.

Some querying and indexing features I think would be taken up by a large number of devs.

This, we love S3. What we did is add a SQL tier for some of the data we are storing there in case we want to do some more structured operations.
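
For anyone wondering what such a SQL tier might look like, here is a rough sketch, not the setup described above. It uses `sqlite3` as a stand-in for whatever database is actually in play; the `objects` table, its columns, and the `record_object` / `find` helpers are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per S3 object. sqlite3 stands in for the SQL tier.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE objects ("
    " key TEXT PRIMARY KEY,"
    " size INTEGER NOT NULL,"
    " last_modified TEXT NOT NULL)"  # ISO 8601 UTC, so string order is time order
)

def record_object(key, size, last_modified):
    """Insert or refresh the row describing one stored object."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
            (key, size, last_modified),
        )

def find(prefix="", since="", until="9999"):
    """Return keys under `prefix` with since <= last_modified < until."""
    rows = db.execute(
        "SELECT key FROM objects"
        # substr() keeps the prefix match case-sensitive, like S3 keys.
        " WHERE substr(key, 1, ?) = ?"
        " AND last_modified >= ? AND last_modified < ?"
        " ORDER BY key",
        (len(prefix), prefix, since, until),
    )
    return [row[0] for row in rows]
```

With that in place, something like `find(prefix="logs/", since="2016-01-01")` covers the date and path filtering that a plain object listing doesn't.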
Yes, but are you sure your database matches the underlying data store?

The real problem with building a metadata index outside is that you then have to validate that it stays synchronized - yuk.

You can always do a full scan of your S3 namespace every week or so and synchronize the index. This gives your consumers low-latency access to the object store, as index lookups are extremely fast, and it minimizes the cost of lookup events on S3.
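
A sketch of what that periodic synchronization could look like, under some assumptions: `listed_keys` is any iterable of key names obtained from a full bucket listing (how you page through the bucket is left out), and the index is a hypothetical `objects` table with a single `key` column, held in `sqlite3` for the sake of a runnable example.

```python
import sqlite3

def reconcile(db, listed_keys):
    """Make the `objects` table match a full listing; return (added, removed)."""
    listed = set(listed_keys)
    indexed = {row[0] for row in db.execute("SELECT key FROM objects")}
    missing = listed - indexed  # present in the listing but never indexed
    stale = indexed - listed    # indexed but no longer in the listing
    with db:  # one transaction: commit on success, roll back on error
        db.executemany(
            "INSERT INTO objects (key) VALUES (?)", [(k,) for k in missing]
        )
        db.executemany(
            "DELETE FROM objects WHERE key = ?", [(k,) for k in stale]
        )
    return len(missing), len(stale)
```

Holding both key sets in memory is fine for a sketch; a very large namespace would need a streaming comparison of two sorted listings instead.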
So my database is up to a week wrong? Errr.....
in that it stores undeleted files, until weekly clean-up

The DB is only incomplete for as long as it takes to commit to the SQL layer after storing successfully.
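
To make the ordering concrete, a small sketch under stated assumptions: `object_store` is a plain dict standing in for the S3 bucket, `sqlite3` stands in for the SQL layer, and the `put` helper and `objects` table are hypothetical names.

```python
import sqlite3

object_store = {}  # hypothetical stand-in for the S3 bucket
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, size INTEGER)")

def put(key, data):
    """Store the object first, then record it in the SQL layer."""
    # Step 1: the upload. A real upload would raise on failure, so the
    # index row below is only written after the store has succeeded.
    object_store[key] = data
    # Step 2: commit the index row. Until this commit lands, the index is
    # missing `key`; that window is the only time it is incomplete.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?)", (key, len(data))
        )
```

If step 2 fails, the object exists without an index row, which is the kind of drift a periodic scan would pick up.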