| HN Mirror

Hacker News


	by papaf 5951 days ago
	Two things surprise me about this article, probably because I've misunderstood it and don't see the big picture. One is that there are master and slave databases and searches are done off the master; I've always seen them done off the slaves in other systems. The other is that they state that using MD5 doesn't allow for horizontal scaling. One of the qualities of MD5 is that every output bit is equally likely to be 0 or 1. Surely the last one or two bits could be used to indicate which server holds the data?
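A minimal sketch of the routing the poster suggests, assuming a fixed pool of four servers (so two bits are enough). The names `server_for_key`, `NUM_BITS` and `NUM_SERVERS` are hypothetical and only illustrate the idea; they are not taken from the article.

```python
import hashlib

NUM_BITS = 2                   # hypothetical: 2 bits -> 4 servers, fixed up front
NUM_SERVERS = 1 << NUM_BITS    # 4


def server_for_key(key: str) -> int:
    """Pick a server index from the lowest NUM_BITS bits of the key's MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # The last byte of the digest holds its lowest 8 bits; mask off the bottom NUM_BITS.
    return digest[-1] & (NUM_SERVERS - 1)


if __name__ == "__main__":
    for key in ["user:1", "user:2", "link:42"]:
        print(key, "->", server_for_key(key))
```

Because MD5 output bits are close to uniform, keys spread evenly over the four servers. The catch is that the mapping is tied to the server count chosen up front.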

2 comments

smanek 5951 days ago

Searches are likely done off the slaves; I suspect the diagram just doesn't show it because it's oversimplified.

You can just use a few bits of an MD5 hash to choose the server, as long as you know up front how many servers you're going to have. The problem is that if you later want to add or remove a server, you need a new mapping scheme, and you have to rehash every key and move all the data that now belongs on a different server (which could take days or weeks).
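To put a number on the problem, here is a small sketch that measures how many keys change servers when a simple hash-mod-N scheme grows from 4 to 5 servers. It uses `MD5(key) mod N` in place of the raw-bits scheme so that non-power-of-two counts work; `shard_mod` and `fraction_moved` are hypothetical helper names.

```python
import hashlib


def shard_mod(key: str, num_servers: int) -> int:
    """Hypothetical helper: map a key to a server with MD5(key) mod num_servers."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return h % num_servers


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose assigned server changes when the server count changes."""
    moved = sum(1 for k in keys if shard_mod(k, old_n) != shard_mod(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"item:{i}" for i in range(10_000)]
    print(f"4 -> 5 servers: {fraction_moved(keys, 4, 5):.1%} of keys move")
```

A key stays put only when `h mod 4 == h mod 5`, which holds for 4 of every 20 hash values. That leaves roughly 80% of the data to migrate after adding a single server.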

The more scalable and flexible solution is to use consistent hashing (check out some of the papers on Chord), so that adding or removing a server only moves the keys in that server's portion of the hash space instead of reshuffling most of the data.
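A minimal consistent-hashing sketch, for comparison with the mod-N scheme. It is the basic hash-ring technique with virtual nodes, not Chord's finger-table routing. `HashRing`, the `replicas=100` default and the `db1`…`db5` server names are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Use the 128-bit MD5 digest as an integer position on the ring.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch, not Chord itself)."""

    def __init__(self, servers, replicas: int = 100):
        self.replicas = replicas   # virtual nodes per server; smooths the key distribution
        self._positions = []       # sorted ring positions
        self._owners = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            pos = _hash(f"{server}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            pos = _hash(f"{server}#{i}")
            self._positions.remove(pos)
            del self._owners[pos]

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring position clockwise from the key's hash."""
        idx = bisect.bisect_right(self._positions, _hash(key))
        if idx == len(self._positions):
            idx = 0  # wrap around the ring
        return self._owners[self._positions[idx]]


if __name__ == "__main__":
    ring = HashRing(["db1", "db2", "db3", "db4"])
    keys = [f"item:{i}" for i in range(10_000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("db5")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"adding db5 moved {moved / len(keys):.1%} of keys")
```

Adding a fifth server only claims the keys that now fall just before its ring positions, so roughly a fifth of the data moves, all of it onto the new server. Removing that server again sends those keys back to their previous owners.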

jedberg 5951 days ago

The search machine is its own database. It is fed from the masters for consistency, but the searches themselves run against the search database.

I think the MD5 thing was covered well below.