by derefr 3832 days ago
Think of the Datomic architecture[1]: some nodes are "storage", while other nodes are "transactors." The transactor nodes pull "chunks" of rows/objects/documents from storage nodes, compute relational indexes locally, answer queries from those computed indexes, and finally persist computed index "chunks" back to the storage nodes.

Now, note that the "canonical" storage that gets read from doesn't have to be the same storage that the indexes get written to. The first can be owned by the user, while the second can be owned by Facebook.

Presuming an architecture like this, the latency and availability of the user's "Facebook account" database is relatively immaterial. While writes would have to be synchronous (so, like you said, Facebook would have to give the user a "sorry, your account is unavailable" error), reads could be asynchronous. Think of an online RSS feed reader service: the "primary sources" are the third-party sites with their RSS feeds. Sometimes those sites go down. When they do, the reader-service can't retrieve the feed, so that feed just goes stale.
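
To make the "goes stale" read path concrete, here is a rough sketch in Python. `StaleCache` and `fetch` are names I made up for illustration; `fetch` stands in for whatever call would reach the user-owned database, and nothing here is a real Facebook or Datomic API.

```python
from typing import Callable, Dict, Optional


class StaleCache:
    """Serve the last good copy when the primary source is unreachable."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                  # hypothetical call to the user-owned store
        self._copies: Dict[str, str] = {}    # last successfully fetched value per key

    def read(self, key: str) -> Optional[str]:
        try:
            # Refresh the local copy whenever the source is reachable.
            self._copies[key] = self._fetch(key)
        except ConnectionError:
            # Source is down: fall through and serve whatever copy we have.
            pass
        return self._copies.get(key)
```

A key that was never fetched successfully returns `None`, which is the reader-service equivalent of "we have nothing for this feed yet."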

Things like Facebook's graph, meanwhile, are fundamentally indexes. The base-level "documents" in the graph are relationship assertions—a copy of "B accepted A's friend request" stored in B's database. The graph is a computed value built on a pile of those. When Facebook can't reach someone's database, things like these relationship assertions just go stale.
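
As a sketch of "the graph is a computed value," the function below derives a friendship index from a pile of assertions. The tuple shape `(owner, kind, other)` is hypothetical, and so is the rule that an edge needs both A's "requested" and B's "accepted" assertion; the comment above only mentions the accepted side.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical assertion shape: (owner, kind, other), read from owner's database.
Assertion = Tuple[str, str, str]


def build_friend_graph(assertions: Iterable[Assertion]) -> Dict[str, Set[str]]:
    """Derive an undirected friendship index from per-user assertions."""
    requested = set()
    accepted = set()
    for owner, kind, other in assertions:
        if kind == "requested":       # owner sent a request to other
            requested.add((owner, other))
        elif kind == "accepted":      # owner accepted other's request
            accepted.add((other, owner))
    graph = defaultdict(set)
    # Assumed rule: an edge exists only when both sides have asserted it.
    for a, b in requested & accepted:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)
```

Because the result is rebuilt from assertions alone, it can be thrown away and recomputed at any time, which is what makes it an index rather than canonical data.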

The crucial idea, here, is that for Facebook to do its job, it probably has to cache a majority of the stuff in the user's database in one form or another—just like an RSS reader caches RSS feeds. But this is purely a cache, in a fundamental sense. Users who don't "check in" with Facebook could be cache-evicted from its database. Other users would still have relationships with them and be able to post on their wall and such (they'd be putting those documents in their own outbox); Facebook would just no longer bother computing anything that's personal to the user, like their news feed. There would be every incentive to set up the architecture such that user data that wasn't needed would be "garbage collected" off of Facebook's servers, because it could always get put back on, the moment that user's account-instance woke up again and said hello.
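
The evict-then-rehydrate cycle could look something like this. `DerivedStore`, `rebuild`, and `max_idle` are all invented for the sketch; `rebuild` stands in for recomputing a user's derived data from their own database.

```python
from typing import Callable, Dict, List


class DerivedStore:
    """Per-user derived data that can be dropped and rebuilt on demand."""

    def __init__(self, rebuild: Callable[[str], dict], max_idle: float) -> None:
        self._rebuild = rebuild        # hypothetical: recompute from the user's database
        self._max_idle = max_idle      # seconds without a check-in before eviction
        self._data: Dict[str, dict] = {}
        self._last_seen: Dict[str, float] = {}

    def check_in(self, user: str, now: float) -> dict:
        """The user's account-instance says hello; rehydrate if it was evicted."""
        self._last_seen[user] = now
        if user not in self._data:
            self._data[user] = self._rebuild(user)
        return self._data[user]

    def evict_idle(self, now: float) -> List[str]:
        """Garbage-collect derived data for users who have not checked in lately."""
        idle = [u for u, t in self._last_seen.items() if now - t > self._max_idle]
        for user in idle:
            self._data.pop(user, None)
            del self._last_seen[user]
        return idle
```

Eviction loses nothing permanent here, since the next `check_in` simply calls `rebuild` again.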

This would also mean that Facebook wouldn't need to store any of their user-data in anything resembling a relational normal form. Every table would be a "view" table. The canonical database, owned by the user, could be relational and full of nice constraints and triggers (and the user could even add these themselves); but since Facebook can just query out of that to get any data it's missing, it wouldn't need anything like a "users" table. (Fascinatingly, if Facebook was built as a microservice architecture, this means that each microservice would probably separately query the canonical data from the database in order to generate its own indexes; the Search service would know one "face" of you while the Photos service would know quite another. These could even—in theory—be separately ACLed within your own DB instance, giving the user true, actual control over what Facebook can do with their data, component by component.)
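
For the per-service "faces," a toy version of the ACL check might be as simple as the following. The `acl` mapping and the field names are hypothetical; in the scheme described above the grants would live in the user's own DB instance.

```python
from typing import Dict, Set

# Hypothetical grants held by the user: service name -> fields it may read.
ACL = Dict[str, Set[str]]


def face_for(service: str, record: Dict[str, object], acl: ACL) -> Dict[str, object]:
    """Return only the fields of `record` that `service` is allowed to read."""
    allowed = acl.get(service, set())    # services with no grant see nothing
    return {k: v for k, v in record.items() if k in allowed}
```

With grants like `{"search": {"name"}, "photos": {"name", "photos"}}`, the Search service and the Photos service would each build their indexes from a different slice of the same canonical record.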

[1] http://docs.datomic.com/architecture.html