Hacker News
by fkn 5237 days ago
Can anyone explain the following bit: "They’re cleanly partitioned into 20 million data separate data sets, one per user."

Does it mean they have a database per user? That can't be right, can it?

5 comments

It means that the data from one user has no relation to the data from another user. So most, if not all, of their queries touch only one user's data.

This is really useful for things like sharding, where you split a database table across more than one machine, because very few queries will stall while fetching data from a second machine.
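To make that concrete, here is a minimal sketch of per-user sharding. Nothing here is Evernote's actual setup: the shard count, the `shard_for_user` helper, and the `ShardedNotes` class are all made up for illustration, and plain dicts stand in for database machines.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedNotes:
    """Toy store: each dict in self.shards stands in for one machine."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, user_id):
        return self.shards[shard_for_user(user_id, len(self.shards))]

    def add_note(self, user_id, note):
        # A write touches exactly one shard: the one that owns this user.
        self._shard(user_id).setdefault(user_id, []).append(note)

    def notes_for(self, user_id):
        # A read also touches exactly one shard, so no cross-machine fetch.
        return list(self._shard(user_id).get(user_id, []))


store = ShardedNotes()
store.add_note(42, "buy milk")
store.add_note(42, "call mom")
print(store.notes_for(42))
```

Because every query is keyed by a single user, the router only ever has to pick one shard per request.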

Why not have a database for each user? Evernote's data is partitioned perfectly for that. Notebooks and notes are either accessible to one user or public. There is no sharing of notes between users.
There is sharing of notes between users, though: I have several shared notebooks, each holding shared notes.
Actually, his idea of a database for each user could still work, even though there is sharing of data between users. Take each database and turn it into an executable object that reads and writes its own data and communicates with other objects for sharing. It's like taking the actor model of computation and orienting it toward database use. I don't know of any working example where this has been done, but I don't see why it wouldn't be feasible.

http://en.wikipedia.org/wiki/Actor_model
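A toy sketch of that idea, just to show the shape of it. The `UserStore` class and the message format are hypothetical, and messages are processed synchronously from a mailbox rather than by a real actor runtime.

```python
from collections import deque


class UserStore:
    """One 'database as actor': owns its notes and only reacts to messages."""

    def __init__(self, user_id):
        self.user_id = user_id
        self._notes = {}        # title -> text, private to this store
        self._shared_with = {}  # title -> set of user ids allowed to read
        self.mailbox = deque()

    def send(self, message):
        # Other stores never touch our data directly; they enqueue a message.
        self.mailbox.append(message)

    def process_all(self):
        """Handle every queued message in order and collect the replies."""
        replies = []
        while self.mailbox:
            replies.append(self._handle(self.mailbox.popleft()))
        return replies

    def _handle(self, msg):
        kind = msg["kind"]
        if kind == "write":
            self._notes[msg["title"]] = msg["text"]
            return True
        if kind == "share":
            self._shared_with.setdefault(msg["title"], set()).add(msg["with"])
            return True
        if kind == "read":
            requester = msg["from"]
            allowed = (requester == self.user_id
                       or requester in self._shared_with.get(msg["title"], set()))
            return self._notes.get(msg["title"]) if allowed else None
        raise ValueError("unknown message kind: %r" % kind)


alice = UserStore("alice")
alice.send({"kind": "write", "title": "trip", "text": "pack bags"})
alice.send({"kind": "share", "title": "trip", "with": "bob"})
alice.send({"kind": "read", "title": "trip", "from": "bob"})
alice.send({"kind": "read", "title": "trip", "from": "carol"})
print(alice.process_all())
```

The point is only that sharing becomes a message to the owner's store rather than a join across users.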

Their SQL is executed per user, so any given request only touches around 1/20,000,000 of the database.
Not likely, but they could very easily have a database holding all users whose ID starts with 'a'.
That's a really bad method of sharding. Names do not distribute equally over the alphabet.
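A quick way to see the skew, with a made-up list of names (not real data). The sketch buckets the same names by first letter and by a stable hash. On a sample this small the hash buckets will not be perfectly even, but over many users a hash tends to spread load far more evenly than first letters do.

```python
import hashlib
from collections import Counter

# Hypothetical user names, invented purely for illustration.
NAMES = ["alice", "adam", "anna", "andrew", "amy", "bob", "carol", "zed"]


def first_letter_shard(name):
    """Shard key is the first letter, so popular letters pile up."""
    return name[0].lower()


def hash_shard(name, num_shards=4):
    """Shard key is a stable hash of the name, modulo the shard count."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


by_letter = Counter(first_letter_shard(n) for n in NAMES)
by_hash = Counter(hash_shard(n) for n in NAMES)

print("by first letter:", dict(by_letter))
print("by hash:", dict(by_hash))
```

With this sample, five of the eight names land in the 'a' bucket, which is exactly the hot-shard problem.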