by lucasch 3635 days ago
We considered rsync but were wondering if there were more specialized tools available. We figured that those who work in the scientific community would have a way to transfer their large data sets between institutions.
2 comments

We transfer large files containing raw radar data, and moderately sized files containing databases of target movements and track information.

We use rsync.

When I worked for the local university we had to transfer data between machines to run experimental parallel programs on so-called "big data."

We used rsync.

Ahh ok, that's good to hear. Have you considered using a multipath transport protocol with something like rsync? I am curious whether it could help in this situation. MPTCP sounds like an interesting protocol if you control both hosts. https://www.multipath-tcp.org/
We were/are always restricted by intermediate limits on throughput, so it's never been useful or interesting to consider alternatives.

YMMV, but if you want to improve throughput, consider carefully which links your data has to pass through. But rsync is rock-solid, well-understood, mature, and does exactly what it is intended to do.
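
To put rough numbers on the "intermediate limits" point, here is a minimal sketch of the arithmetic, assuming the end-to-end rate is capped by the slowest hop and ignoring protocol overhead, disk speed, and congestion. The function name and the link speeds are hypothetical, chosen only for illustration.

```python
def transfer_hours(size_gb: float, hop_gbps: list[float]) -> float:
    """Estimate hours to move size_gb gigabytes across a path of hops.

    Assumption: throughput equals the slowest hop's rate (no overhead).
    """
    bottleneck_gbps = min(hop_gbps)   # the slowest link caps the whole path
    size_gbit = size_gb * 8           # gigabytes -> gigabits
    seconds = size_gbit / bottleneck_gbps
    return seconds / 3600


# Hypothetical path: fast links at both ends, one slow hop in the middle.
hours = transfer_hours(1000, [10.0, 1.0, 10.0])
print(f"{hours:.2f} hours")
```

With a path like that, upgrading either endpoint or switching transfer tools changes nothing until the middle hop is faster, which is why the route matters more than the tool.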

If you have a high-bandwidth link and are in a hurry, use GridFTP (http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp...), otherwise just use rsync.

Scientific institutions that need to transfer large data sets have fast connections. :) How does 340 Gbps sound? Check out ESnet. http://newscenter.lbl.gov/2014/10/20/does-high-speed-network...

I heard about ESnet while consulting at Lawrence Berkeley National Laboratory.