by stuntprogrammer 3884 days ago
I agree with many things you say. Take AWS S3 as an example: it reduces the problem by making objects immutable. Internally, they can cache aggressively without scaling fine-grained consistency on the objects as such. As an S3 user I can aggressively cache on the client too.
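To make the caching point concrete, here is a minimal sketch of a client-side cache for immutable objects. It assumes objects are keyed by a hash of their content, which is my assumption for illustration (S3 itself uses user-chosen keys); `ImmutableCache`, `object_id`, and `fetch_remote` are hypothetical names.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Content address: the ID is derived from the bytes themselves."""
    return hashlib.sha256(data).hexdigest()


class ImmutableCache:
    """Client-side cache that needs no invalidation logic."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # hypothetical callable: id -> bytes
        self._local = {}
        self.remote_reads = 0

    def get(self, oid: str) -> bytes:
        if oid not in self._local:
            data = self._fetch_remote(oid)
            self.remote_reads += 1
            # Content addressing gives an integrity check for free.
            if object_id(data) != oid:
                raise ValueError("remote returned corrupt object")
            self._local[oid] = data
        return self._local[oid]


# A dict stands in for the remote object store.
remote = {}
blob = b"hello"
oid = object_id(blob)
remote[oid] = blob

cache = ImmutableCache(remote.__getitem__)
cache.get(oid)
cache.get(oid)  # served locally; no staleness check is ever needed
```

Because an ID can never refer to different bytes later, a cached copy is valid forever, so the only consistency question left is which IDs a name currently maps to.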

Now we've reduced the problem to consistency on the metadata structure which aggregates objects for the user. There are "well known" ways to do this for traditional trees. Other well-known options include an entirely search-based approach (i.e. always talk to a server for metadata, perhaps with local result caching), and so on.
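A rough sketch of the "always talk to a server, with local result caching" option, under my own assumptions: the server keeps one global version counter, and the client reuses a cached answer only when that counter has not moved. `MetadataServer` and `CachingClient` are hypothetical names, and reading `server.version` stands in for a cheap round trip.

```python
class MetadataServer:
    """Single authority for the name -> object-ID mapping."""

    def __init__(self):
        self.version = 0
        self._names = {}

    def put(self, name, oid):
        self._names[name] = oid
        self.version += 1

    def lookup(self, name):
        return self.version, self._names.get(name)


class CachingClient:
    def __init__(self, server):
        self._server = server
        self._cache = {}  # name -> (version, oid)
        self.full_lookups = 0

    def resolve(self, name):
        current = self._server.version  # cheap "what version are you at?" call
        hit = self._cache.get(name)
        if hit is not None and hit[0] == current:
            return hit[1]  # metadata unchanged since we cached it
        version, oid = self._server.lookup(name)
        self.full_lookups += 1
        self._cache[name] = (version, oid)
        return oid


server = MetadataServer()
server.put("report.txt", "oid-1")
client = CachingClient(server)

first = client.resolve("report.txt")
again = client.resolve("report.txt")    # cache hit
server.put("report.txt", "oid-2")
updated = client.resolve("report.txt")  # version moved, so refetch
```

Note the trade-off: the client never serves stale metadata, but it cannot resolve anything while disconnected.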

The last part you wrote is what I was getting at: you cannot eliminate distributed systems problems from distributed systems. You can only either (a) push them around into different parts of your design or (b) outsource them by using something like Amazon's DB-as-a-service offerings. Using immutable objects is (a): now you have a distributed metadata problem.

If they're doing (a) then it's just another Dropbox but with less tolerance for disconnected operation. If they're doing (b) they're just reselling and maybe with a nice client. That's not doomed as a business of course but it's not all that technically interesting.

If there's a (c) it's not in this article.

I strongly agree with you re 'conservation of distributed systems problems'. I'm being vague because I know what they're doing, but I'm limited in what I can comment on.

Their public comments are that they are running their own servers and not reselling other storage.

Personal opinion: no one can do (c) because of (a). That is, any possible (c) must tackle the fundamental hardness in the distributed systems problem, and this is what we agree (a) is doing. Using immutable objects as in S3 just shifts the problem elsewhere: it reduces the problem, but it doesn't solve it.

Immutability does reduce the problem by reducing the data footprint of stuff that needs to be synchronized.

So let me guess:

They're doing a distributed metadata store where the cloud is the tie-breaker combined with flexible caching of immutable content-addressable (identified with SHA512 or similar) objects.

In that case it sounds like cloud-hosted-only IPFS plus a nice client to access it from a host.
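To spell out what I mean by "the cloud is the tie-breaker", here is a minimal sketch under my own assumptions, not a description of what they actually built: content lives in immutable SHA-512-addressed objects, and the cloud accepts a metadata update only if it was built on the current version. `CloudMetadata`, `commit`, and `object_id` are hypothetical names.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Immutable content address for an object."""
    return hashlib.sha512(data).hexdigest()


class CloudMetadata:
    """Authoritative name -> object-ID tree with a version counter."""

    def __init__(self):
        self.version = 0
        self.tree = {}

    def commit(self, base_version, changes):
        """Compare-and-swap: accept only updates based on the current version."""
        if base_version != self.version:
            return False, self.version  # caller lost the race and must rebase
        self.tree.update(changes)
        self.version += 1
        return True, self.version


cloud = CloudMetadata()

# Two clients both start from version 0 and edit the same file.
id_a = object_id(b"edit from client A")
id_b = object_id(b"edit from client B")

ok_a, v_a = cloud.commit(0, {"notes.txt": id_a})  # first writer wins
ok_b, v_b = cloud.commit(0, {"notes.txt": id_b})  # stale base, rejected

# Client B refetches the current version and retries (or surfaces a conflict).
ok_retry, v_retry = cloud.commit(v_b, {"notes.txt": id_b})
```

Only the small name-to-ID tree needs arbitration; the objects themselves can be cached anywhere without coordination, which is the footprint reduction mentioned above.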

Cold? Warm? :)