by einpoklum 1733 days ago
The performance claim seems fishy to me.

Within a single physical system ("multi-core multi-CPU" or with NVMe), you rarely use something like rsync, Zsync or KySync: your files are already local, on your system. If you want a second copy of a file on the same system, you would symlink or hard-link it, or use other filesystem mechanisms. It's not even clear that a clean copy would be slower than doing a bunch of comparisons.

On the other hand, between remote systems, the "modern day architecture features" mostly don't apply. I suppose a more clever use of modern kernels could help performance somewhat. Maybe.


We should make a clarification... KySync (like Zsync) is intended for use across systems, not on a single system. KySync supports HTTP (like Zsync) as well as HTTPS (which Zsync does not).

The primary reason for doing the performance comparison on a single system is that the results are easy to replicate with as little setup as possible. Because we do this for both KySync and Zsync, it is an apples-to-apples comparison.

HTTP bandwidth and storage cost money, and this is a self-funded project, so I can't afford to host test files that are publicly visible to the world.

One thing we can look into is using AWS S3 to upload some data and run a performance experiment against it, but that would require anyone replicating it to set up their own AWS account properly. Will look into it.

Of course, the more similar the files are, the closer the remote results will be to this first set we published.

So, are you claiming a 3x-10x performance improvement...

* Within the same system?

* Over close-by systems on a LAN segment?

* To typical far machines over the Internet?

I can see your point. We really need to run some experiments in S3 to satisfy this point rigorously.

Intuitively, the more similar the files are, the more representative 3x-10x+ will be in the real world. As the files become more and more different, one is of course at the mercy of the bandwidth between the computers. If you need to transfer 1 GiB of differences over a 1 MiB/s connection, it will take ~1000 seconds; there is no magical way around it.
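To put rough numbers on that intuition, here is a minimal sketch in Python. It is not a measurement of KySync or Zsync: the 100 s and 10 s compute times are made-up values standing in for a 10x computational speedup, and the helper names are hypothetical. It only shows how a fixed compute-side speedup shrinks end to end as the amount of data to transfer grows.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_seconds(diff_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move the differing bytes over the link, ignoring protocol overhead."""
    return diff_bytes / bandwidth_bytes_per_s


def end_to_end_speedup(slow_compute_s: float, fast_compute_s: float,
                       transfer_s: float) -> float:
    """Overall speedup when both tools pay the same transfer time."""
    return (slow_compute_s + transfer_s) / (fast_compute_s + transfer_s)


# The example from the comment: 1 GiB of differences over a 1 MiB/s link.
print(transfer_seconds(1 * GIB, 1 * MIB))  # 1024.0 seconds

# Hypothetical compute costs (assumed, not measured): 100 s vs 10 s.
SLOW_S, FAST_S = 100.0, 10.0
for diff_mib in (0, 10, 100, 1024):
    t = transfer_seconds(diff_mib * MIB, 1 * MIB)
    print(f"{diff_mib:5d} MiB to transfer -> "
          f"{end_to_end_speedup(SLOW_S, FAST_S, t):.2f}x overall")
```

With nothing to transfer, the full 10x shows up; with 1 GiB to push through a 1 MiB/s link, the same compute advantage is worth less than 1.1x overall under these assumed numbers.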

The comparison so far essentially measures the computational cost of the sync, which can be significant as differences pile up.

Could be useful in backend infrastructure if you have beefy machines processing lots of IO. There could be a myriad of other use cases. Hell, some people have 10 Gbps machines at home. Even modern CPUs, mobile ones especially, can struggle to hit line rate in certain applications like large file transfers unless you've got a beefier machine.