Hacker News
by kgeist 83 days ago
I once had this silly idea to create distributed storage of arbitrary data by exploiting a range of completely unrelated sites. Say, when you want to upload your file to the System, it may store one encrypted chunk as an image on a free image hosting site, another chunk as an encoded blog post on a random forum about farming (or in the user profile?), another chunk as a YouTube video, etc. Imagine having something like hundreds or thousands of such "backends". Every chunk would be stored in 3 places for high durability of course. Free storage, hidden in plain sight :) Although, I didn't think through how to store the index reliably, and, because a moderator on a random farmers' site may delete our record(s), there needs to be a system which continuously validates the integrity of the chunks and reuploads any that have gone missing.

Maybe such a silly project already exists?
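
For what it's worth, the "3 copies plus a validator" part can be sketched in a few lines of Python. Everything here is an assumption for illustration: `MemoryBackend` is a hypothetical in-memory stand-in for a real site, chunks are keyed by their SHA-256 hash, and encryption and the index problem are left out entirely.

```python
import hashlib
import random

REPLICAS = 3  # "every chunk would be stored in 3 places"


class MemoryBackend:
    """Hypothetical stand-in for one site (image host, forum, ...)."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def get(self, key):
        return self.blobs.get(key)  # None if a moderator deleted it


def store_chunk(chunk, backends):
    """Upload one chunk to REPLICAS randomly chosen backends."""
    key = hashlib.sha256(chunk).hexdigest()
    chosen = random.sample(backends, REPLICAS)
    for backend in chosen:
        backend.put(key, chunk)
    return key, [b.name for b in chosen]


def is_intact(backend, key):
    """A replica counts only if it still hashes to its own key."""
    data = backend.get(key)
    return data is not None and hashlib.sha256(data).hexdigest() == key


def repair_chunk(key, backends):
    """Re-upload from a surviving replica until REPLICAS copies exist."""
    holders = [b for b in backends if is_intact(b, key)]
    if not holders:
        raise LookupError("all replicas lost")
    good = holders[0].get(key)
    spares = [b for b in backends if b not in holders]
    while len(holders) < REPLICAS and spares:
        target = spares.pop()
        target.put(key, good)
        holders.append(target)
    return [b.name for b in holders]
```

Running `repair_chunk` on a schedule for every key would be the "continuously validates" loop. Where the list of keys itself lives is still the unsolved index problem.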

5 comments

You might enjoy reading through the original Google FS papers. I forget what they’re called, but they address the durability problems.

Ah, I couldn’t remember the name because it’s literally named Google File System. https://static.googleusercontent.com/media/research.google.c...

I seem to remember Bigtable also being interesting.

More than that, you might enjoy MIT’s distributed systems course. It’s all freely available online. I went through it for fun a decade ago or so, and it’s worthwhile for reasoning through hard problems like this.

People have definitely (ab)used YouTube as a filesystem though. And that’s probably your best bet for durability and performance.

I had the same idea!

Another silly (compression-based) idea I had was to:

- Index, say, Google Images, or something else with a large amount of URL -> data mappings

- Find patterns in the indexed data that match patterns in your data, such that storing the URL and an offset into the data (or something more complex) would be smaller than the data chunk you are trying to compress

- Repeat for all chunks

- After you're done you can run it again and again. Infinite compression!

Yes, the user has to download WAY more data than what they are trying to extract, and you'd need an insanely large index to be able to compress, but hey, it was an idea.
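
A toy version of the steps above, assuming the "index" is just a hypothetical in-memory dict of URL -> bytes and chunks are fixed-size. A chunk found in the corpus becomes a (URL, offset, length) reference, anything else is stored literally. It only saves space when a reference is actually shorter than the chunk it replaces.

```python
def encode(data, corpus, chunk_size=8):
    """Replace chunks found in the corpus with (url, offset, length) refs."""
    tokens = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        ref = None
        for url, blob in corpus.items():
            offset = blob.find(chunk)
            if offset != -1:
                ref = ("ref", url, offset, len(chunk))
                break
        # Fall back to storing the raw bytes when nothing matches.
        tokens.append(ref if ref else ("lit", chunk))
    return tokens


def decode(tokens, corpus):
    """Rebuild the data by slicing the referenced blobs."""
    parts = []
    for token in tokens:
        if token[0] == "ref":
            _, url, offset, length = token
            parts.append(corpus[url][offset:offset + length])
        else:
            parts.append(token[1])
    return b"".join(parts)
```

In a real version `corpus[url]` would be a download, which is where the "WAY more data" cost comes from.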

There is a Range header in the HTTP specification for resuming downloads at a certain part of the file. Since HTTP is stateless, you can download precisely what you need right away.
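
A minimal sketch of that with Python's standard library. The helper names are made up for illustration, and it assumes the server honors Range requests (it answers 206 Partial Content if it does). Servers are allowed to ignore the header and send the whole file with a 200, so the fallback slices locally.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at `offset`."""
    last = offset + length - 1  # the end of a byte range is inclusive
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last}"}
    )


def fetch_range(url, offset, length, timeout=10):
    """Fetch only the requested slice of the remote file."""
    req = build_range_request(url, offset, length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        if resp.status == 206:
            return body  # server sent just the requested range
        # Server ignored Range and sent everything: slice it ourselves.
        return body[offset:offset + length]
```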

I've had this exact idea. It would need error-correction coding to account for chunks disappearing. There would be a rot rate as sites die or change.
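
The simplest possible illustration of the coding part is a single XOR parity chunk, which can rebuild any one lost chunk in a group. This is a stand-in chosen for brevity, not a recommendation: it assumes equal-length chunks and at most one loss per group, and a real system would more likely use something like Reed-Solomon to survive several losses.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks):
    """Parity chunk = XOR of all data chunks (all the same length)."""
    return reduce(xor_bytes, chunks)


def recover_missing(chunks, parity):
    """Rebuild the single chunk marked None from the others plus parity."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost chunk")
    present = [c for c in chunks if c is not None]
    # XOR-ing the parity with every surviving chunk leaves the lost one.
    return reduce(xor_bytes, present, parity)
```

With three data chunks and one parity chunk the overhead is about 33%, versus 200% for keeping three full copies of everything.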

You could write some custom backends for https://irmin.org/ I guess.

> Irmin is an OCaml library for building mergeable, branchable distributed data stores.

lol now I wanna build this. It's like the dark web but without user, or in this case site, consent. This could be a fun few-weekend project.