by ketralnis 3295 days ago
I'm not too offended by letting the administrator figure out the best way to do an uncommon operation. As a point of comparison, Cassandra does have a proper way to bootstrap new nodes, but I've found that in many cases it's better to short-circuit it and rsync the initial data myself (and use its repair functionality to clean up the mess).

Some reasons include throttling load on the "old" servers, better feedback on progress, the ability to pause/resume, or even being able to do it faster than the DBMS can e.g. by snapshotting the disk on the source machine and making a CoW clone of it. Heck, if you're running your own hardware and feeling a little reckless, pull out one of the drives from the source machine's RAID mirror and you've already got a full clone right there.
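
To make the throttling and pause/resume points concrete, here is a minimal sketch that only builds (and prints) an rsync command line. The host name `old-node-1`, the data path `/var/lib/cassandra/data`, and the bandwidth cap are hypothetical placeholders, not values from any real cluster, and the helper `build_rsync_command` is made up for illustration.

```python
import shlex


def build_rsync_command(source_host, data_dir, bwlimit_kbps):
    """Build an rsync argv that copies data_dir from source_host to the same local path.

    Nothing is executed here; the caller decides when and how to run it.
    """
    return [
        "rsync",
        "--archive",                  # preserve permissions, times, and directory layout
        "--partial",                  # keep partially copied files so a rerun can pick up again
        "--progress",                 # per-file progress feedback
        f"--bwlimit={bwlimit_kbps}",  # cap the transfer rate to limit load on the source node
        f"{source_host}:{data_dir}/",
        f"{data_dir}/",
    ]


# Hypothetical host, path, and limit, chosen only for illustration.
cmd = build_rsync_command("old-node-1", "/var/lib/cassandra/data", 20000)
print(shlex.join(cmd))
```

Interrupting the command and running it again is the "pause/resume": rsync skips files that are already complete. Once the copy is done, the new node would still need `nodetool repair` to reconcile whatever changed on the source while the copy was running.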

I guess you could build all of that into the DBMS, but it's a rather specialised manual operation that doesn't happen all that often, and it's one of the cases where the administrator almost certainly does know better.


How does that work? As near as I can figure, you need to have all the sstable files from all nodes in a rack on disk. Most will be discarded on "nodetool cleanup", but I would expect it to have to rewrite all the files due to the new token range.

> you need to have all the sstable files from all nodes in a rack on disk

If you're not using vnodes, then you need all of the sstables from the previous $RF nodes in the ring. So with RF==3, it will briefly have about treble the amount that it will finally carry.

It's a lot of temporarily wasted disk space for sure, but now you're in full control of how you get the data there.
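
As a back-of-the-envelope sketch of that arithmetic, assuming no vnodes and that each of the $RF source nodes carries roughly what the new node will end up with (the 500 GB figure and the function name are made up for illustration):

```python
def peak_disk_before_cleanup(final_load_gb, replication_factor):
    """Rough peak disk usage on the new node before cleanup runs.

    Assumes the sstables of `replication_factor` source nodes are copied in full
    and that each source node holds about `final_load_gb` of data.
    """
    return final_load_gb * replication_factor


final_load_gb = 500  # hypothetical amount the node carries once cleanup is done
peak_gb = peak_disk_before_cleanup(final_load_gb, replication_factor=3)
wasted_gb = peak_gb - final_load_gb  # space reclaimed once the extra data is discarded
print(f"peak: {peak_gb} GB, temporarily wasted: {wasted_gb} GB")
```

So the new node needs headroom for about RF times its eventual load until "nodetool cleanup" has discarded the data it doesn't own.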

This sounds like a terrible idea.