Hacker News
by crazygringo 1139 days ago
This seems overly pessimistic to me.

Sure, you're not going to use this as a consumer in place of a local disk, nor are you going to use this as part of your web app.

But there are lots of situations in reporting, batch/cron jobs, data processing, and general file administration where it's far easier to use the file system interface than to use an HTTP API via a cloud storage library, and FUSE is a godsend for those. The latency doesn't matter in these cases, whether for one-off things or for scripts that already take seconds/minutes/hours anyways.
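
As a rough illustration of that convenience, here is a minimal sketch of a batch job that uses nothing but ordinary file system calls. The mount point `/mnt/my-bucket` and the function name `summarize_tree` are hypothetical; the sketch assumes the bucket has already been mounted through FUSE, at which point nothing in the script is cloud-specific.

```python
import os
from pathlib import Path


def summarize_tree(root):
    """Return (file_count, total_bytes) for every regular file under root.

    Uses only ordinary file system calls, so the same code runs against a
    local directory or a FUSE-mounted bucket (likely slower on the latter).
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():
                file_count += 1
                total_bytes += path.stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    # Hypothetical mount point; replace with wherever the bucket is mounted.
    mount_point = "/mnt/my-bucket"
    if os.path.isdir(mount_point):
        print(summarize_tree(mount_point))
```

Every `stat` here is a separate metadata lookup, which is fine for a nightly report and painful for anything interactive.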

So no, this isn't niche or a toy. It's a fantastic production tool for a lot of different common uses. It's not for everything, but nothing is. Use the right tool for the job.

5 comments

In the old days, we had a system called NFS (Network File System) where, yes, you may decide to use only remote disks. There were several advantages apart from lowering the cost of disks, mainly that you could centrally manage boot images for a fleet of machines. Then we got the web and everyone seemed to assume you could do the same thing over the internet.

I agree with you: I would prefer a local disk to one with 100+ msec of latency, and local storage prices are at the point where the right answer is probably "just add local storage."

But I watch with some sympathy the small army of sys-admins (something like 15-20 people) responsible for managing the 3000+ Macs our company uses, and remember the 2-person staff which supported the 1500+ diskless workstations from my years at a sadly defunct mini-super-computer manufacturer. It was quite nice... you could go to any machine and log in and your desktop would follow you. I'm told doing the same thing with MSFT requires 10-20 people just to manage the AD hardware (though as a unix-fan, I hang out with other unix-fans who are notoriously rude to MSFT, so maybe it's only 5-10 people needed to manage the AD instance).

Not old days. NFS is still widely used in the industry. In fact, some NFS storage systems cost millions of dollars for high-end computer farms, e.g. Isilon.
I still use NFS in my home.
I do too. It just works. Though I boot off a local drive.
Applications for which filesystem-like access is important (i.e. requiring lots of POSIX file I/O system calls, e.g. read(2)/write(2)/lseek(2)) but latency is unimportant seem pretty niche to me. If you don't need any of the POSIX syscalls, it's not that much more difficult to work with bucket URLs vs. file paths — the general format is the same, i.e. slash-delimited file/directory hierarchies.
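
To make the path/URL similarity concrete, here is a small sketch that splits a bucket URL into a bucket name and an object name using only the Python standard library. The `gs://` URL and the helper name `split_bucket_url` are illustrative assumptions, not part of any cloud SDK.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def split_bucket_url(url):
    """Split 'scheme://bucket/dir/file' into (bucket, object_name)."""
    parts = urlsplit(url)
    bucket = parts.netloc
    object_name = parts.path.lstrip("/")
    return bucket, object_name


# Hypothetical bucket and object, used only for illustration.
bucket, object_name = split_bucket_url("gs://example-bucket/reports/2020/q1.csv")

# The object name has the same slash-delimited shape as a relative file path.
as_path = PurePosixPath(object_name)
print(bucket, as_path.parent, as_path.name)
```

The naming looks alike, but a bucket URL gives you whole-object reads and writes rather than `lseek`-style access, which is the distinction being drawn here.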
Not everything is a webserver. There's a lot of software out there that wouldn't expect files to exist anywhere else besides on disk, and it's not worth fetching them all from cloud storage before you begin working on the data. It's easier just to GCSFuse a bucket to a VM and let the user do what they will. Works great for ad-hoc analysis of poorly structured or unstructured data.
And for your use case, the latency is not a concern? I suppose that would be true if you were mostly dealing with really big files and only cared about reading large contiguous chunks of them, but I would consider this a fairly niche application.

In my use case, taking ~1 second each time to `ls` a directory, `stat` a file, or `lseek` within a file was simply unacceptable. This was on a cloud VM, so the latency would be at its absolute minimum.
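
For anyone wanting to check whether a mount falls into this regime, a rough way to measure per-call metadata latency is sketched below. The helper name `time_call` is made up for illustration, and the target is the current directory purely so the snippet runs anywhere; point it at the mounted directory to get meaningful numbers.

```python
import os
import time


def time_call(func, *args, repeats=5):
    """Return the average wall-clock seconds per call of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    target = "."  # swap in the mounted directory to measure it
    print(f"listdir: {time_call(os.listdir, target) * 1000:.2f} ms")
    print(f"stat:    {time_call(os.stat, target) * 1000:.2f} ms")
```

Repeated calls may be served from a cache, so the average can understate the cost of the first, cold access.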

In VFX, a single texture can be terabytes in size.
The problem is that such systems have a habit of growing in scope until they reach a point where you really do need the more optimal access patterns of using the real HTTP APIs, and the inefficiencies of emulating the full filesystem API will gradually start to bite you. Maybe you’re lucky enough that that won’t happen, but it’s important to understand it for the quick hack job it is, IMO.
In most situations that time is years, decades, or ‘never’. Which is fine.

Not everyone or everything scales faster than bandwidth and/or CPU does.

I agree. For example, if you want to use Google's ASR (Automatic Speech Recognition) and your file is longer than 1 minute in duration, you first need to upload it to a bucket, which is a lot of added complexity compared to a regular HTTP POST.

Just copying the file to a mounted bucket would make this a lot easier.

Then again, how does one get the metadata of the uploaded file?

Calling any software system "niche" is kind of hilarious, as if anything that isn't Postgres is a massive failure. It's not supposed to be a high-performance cache of data.

My company uses GCSFuse for ad-hoc analysis/visualization of large but poorly structured output from our life sciences jobs, and it works just fine for that.