

by bburshteyn 3909 days ago

Currently, no fail-over of any kind is implemented in Cryptomove.

If one or more servers go down, all parts that currently reside on those servers remain there. When a down server comes back up, it resumes moving all the parts that resided on it before the failure.

This may hamper delivery of data parts on a restore request if some parts reside on a down server. However, the parts of a saved file are always duplicated on the client before they are sent to the servers. Thus, if enough servers are still up, the restore request may still fetch copies that reside on up servers and whose path back to the base also goes only through up servers.
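To make the condition above concrete, here is a minimal sketch of the check, not Cryptomove's actual code. The `Copy` record, its fields, and the idea that each copy carries an explicit list of servers on its way back to the base are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """Hypothetical record for one copy of a data part."""
    part_id: str
    location: str        # server currently holding this copy
    return_path: tuple   # servers the copy must cross to reach the base


def is_fetchable(copy, up):
    """A copy can be fetched only if its holder and its whole return path are up."""
    return copy.location in up and all(s in up for s in copy.return_path)


def restorable_parts(copies, up):
    """Return the ids of parts that have at least one fetchable copy."""
    return {c.part_id for c in copies if is_fetchable(c, up)}
```

A restore succeeds in this model only if `restorable_parts` covers every part id of the file; one fetchable copy per part is enough.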

Again, copies of the same data part currently travel independently and randomly. In the worst case, all of them may end up on a down server, or the path back to the base server may include a down server for every one of them. This, however, seems unlikely if there are enough copies and enough up servers.
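A rough back-of-the-envelope estimate of "unlikely", under assumptions that are mine and not measured from Cryptomove: each copy's current server and each of the `path_len` servers on its return path are chosen uniformly and independently, and copies are independent of each other.

```python
def prob_part_unrecoverable(n_servers, n_down, n_copies, path_len):
    """Probability that every copy of one part is unreachable.

    Simplified model (an assumption, not the real placement logic):
    a copy is reachable only if its holder and all path_len servers
    on the way back to the base are up, each up with probability
    (n_servers - n_down) / n_servers, independently.
    """
    if n_servers <= 0 or not 0 <= n_down <= n_servers:
        raise ValueError("need n_servers > 0 and 0 <= n_down <= n_servers")
    p_up = (n_servers - n_down) / n_servers
    p_copy_ok = p_up ** (path_len + 1)       # holder plus every hop is up
    return (1.0 - p_copy_ok) ** n_copies     # all copies fail independently
```

For example, with 10 servers, 1 of them down, 3 copies, and a 2-hop return path, this model gives roughly a 2% chance that a given part cannot be fetched. Adding copies shrinks that number geometrically, while longer return paths push it up.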

Also, when a server decides to push a data part onto another server, it only pushes to a server that is up. All servers maintain keep-alive heartbeats with the members of their clusters, so they know which cluster servers are up and which are down. Of course, a server may go down in the middle of transmitting a data part. If it is the source server, it will restart the transmission when it restarts itself. If it is the target server, the source server will receive a timeout or an exception and will re-transmit the same part later to an online server (possibly the same target server, if it has come back up by then).
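The source-side behavior described above could be sketched like this. It is an illustration only: `ClusterView`, `push_part`, the `send` callback, the retry queue, and the 5-second heartbeat timeout are all hypothetical names and values, not Cryptomove's API.

```python
import random
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds; hypothetical value


class ClusterView:
    """Tracks which cluster members sent a heartbeat recently."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # server name -> time of last heartbeat

    def record_heartbeat(self, server, now=None):
        self.last_seen[server] = time.monotonic() if now is None else now

    def up_servers(self, now=None):
        """Servers whose last heartbeat is within the timeout window."""
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items()
                      if now - t <= self.timeout)


def push_part(part, view, send, retry_queue, rng=random, now=None):
    """Push a part to a random up server; queue it for later on failure.

    send(target, part) is a caller-supplied transport function that is
    assumed to raise TimeoutError or ConnectionError if the target dies.
    Returns the chosen target, or None if the part was queued for retry.
    """
    candidates = view.up_servers(now)
    if not candidates:
        retry_queue.append(part)   # nobody is up; try again later
        return None
    target = rng.choice(candidates)
    try:
        send(target, part)
    except (TimeoutError, ConnectionError):
        retry_queue.append(part)   # target went down mid-transfer
        return None
    return target
```

A later pass over `retry_queue` would simply call `push_part` again; since the target is re-drawn from the servers that are up at that moment, the part may land on the same server that failed earlier if it has recovered.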