Hacker News new | ask | show | jobs
by Dylan16807 58 days ago
Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.

And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.

1 comments

> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.

The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.

The only way to bypass it is to remount it over rclone or something and use "ls" and "lsd" functions to query filenames. Otherwise it'll download, and it's how it's expected to work.

Why would it use either of those on all the files at once? It should only be opening enough files to fill the upload buffer.
I think you might be confusing Backblaze reading files and how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox or whatever to open the file for it, and to read it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory with all that that entails, so Backblaze thinks that it can just zip through the directory and it'll read data from disk, even though that isn't really what's happening. I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason -- my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s.

And no, my server isn't behind cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffick but let me handle everything else" (assuming that's even possible given that the usual flow is to put your entire domain behind them).

No, I'm not confusing anything.

Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.

If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.

The issue you’re missing is that the abstraction Dropbox/OneDrive/etc provide is not that of an NFS. When an application triggers the download of a file, it hydrates the file to the local file system and keeps it there. So if Backblaze triggers the download of a TB of files, it will consume a TB of local file system space (which may not even exist).
It won't keep it permanently. That would break under normal use.

Keeping recent files will work fine with a program that goes through them as fast as it can upload (which is not super fast).

Maybe it'll, maybe it won't, but it'll cycle all files in the drive and will stress everything from your cloud provider to Backblaze, incl. everything in between; software and hardware-wise.
That sounds very acceptable to get those files backed up.

It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.

I mean, cycling a couple of terabytes of data over a 512GB drive is at least full 4 writes, which is too much for that kind of thing.

> more ISPs need to improve upload.

I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.

Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.

4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.

> the technical limits of the technology that I had at home (PON for the time being)

Isn't that usually symmetrical? Is yours not?

Why would it do that more than once unless you are modifying 4TB of data every day, in which case you are causing the problem.