by olliej 817 days ago
In your proposed scheme of URL->hash, everyone is expected to pay a big service to host their data, so that their page is accessible to that service's users? Or what? It sounds like you're saying "use IPFS for web hosting" but that's famously slow and unreliable for anything not incredibly popular.

How do you propose a person publishes their data? They get a domain, they create their page, they hash their page, they point that domain at that hash, and then what exactly? They update their content, compute the new hash, and then wait for DNS to propagate the new change?
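
To make that publish/update cycle concrete, here is a minimal sketch of what I understand the proposal to be. The `records` table is a hypothetical stand-in for the DNS record, and SHA-256 is my assumption, since no hash function was specified:

```python
import hashlib

# Hypothetical stand-in for the DNS record that maps a domain to a content hash.
records = {}

def content_address(page: bytes) -> str:
    """Return the hex SHA-256 digest of the page (assumed hashing scheme)."""
    return hashlib.sha256(page).hexdigest()

def publish(domain: str, page: bytes) -> str:
    """Hash the page and point the domain at the new hash."""
    address = content_address(page)
    records[domain] = address
    return address

v1 = publish("example.org", b"<h1>Hello</h1>")
v2 = publish("example.org", b"<h1>Hello, world</h1>")

# Any edit, however small, produces a different address, so the record
# has to be rewritten (and re-propagated) on every update.
print(v1 != v2)
```

Nothing in the sketch stores or serves the bytes themselves; it only renames them.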

I'm serious here: I'm trying to work out how what you're suggesting would work for the standard use cases of websites and, if it does work for them, how it solves any of the problems that archive.org has to manage or supports any of the features people use archive.org for.

People don't use archive.org to ask "where did this content get published?" (i.e. "I have a hash of the page content"); they ask "what was the content at this location+time?". Definitionally they do not have the content, so they do not have the hash. They have the location, but you've just said the location is just the hash of the content. If you're saying the location of the document is the full URL, not just the domain (the only part that involves machine addresses), then what is the hash for?
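
A rough sketch of the difference between the two lookups, under my own assumptions (the `Archive` class and its method names are hypothetical, not how archive.org is actually implemented). The hash is only useful as an internal storage key; the query people actually make is keyed by URL and time:

```python
import bisect
import hashlib

class Archive:
    """Toy archive: content stored by hash, but looked up by (url, time)."""

    def __init__(self):
        self.by_hash = {}    # digest -> content bytes
        self.snapshots = {}  # url -> sorted list of (timestamp, digest)

    def capture(self, url: str, timestamp: int, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.by_hash[digest] = content
        bisect.insort(self.snapshots.setdefault(url, []), (timestamp, digest))
        return digest

    def content_at(self, url: str, timestamp: int):
        """Return the latest capture of url at or before timestamp, or None."""
        history = self.snapshots.get(url, [])
        times = [t for t, _ in history]
        i = bisect.bisect_right(times, timestamp)
        if i == 0:
            return None
        return self.by_hash[history[i - 1][1]]

archive = Archive()
archive.capture("http://example.org/", 100, b"old page")
archive.capture("http://example.org/", 200, b"new page")

# The user knows the URL and a date, not the hash.
print(archive.content_at("http://example.org/", 150))
```

The URL-to-snapshot index is the part that needs scraping and long-term storage, and a hash-based naming scheme does not provide it.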

Finally, if the location is based on hash, you don't just break any content that is not 100% static, you also break encrypted content, because definitionally encrypted content is not static.

1 comments

You're still thinking in terms of paying for hosting, ie a commercial web. You cannot solve this problem by everything remaining a commercial entity, since payment for services rendered must cease at some point; you're just trying to band-aid over the problem.
Systems like the one you're describing exist, and the fundamental problem is that they are not even semi-permanent archives.

For something to exist, someone has to host it, and the way you get something hosted is to pay for it (either paying someone to be a host, or paying for hardware and connectivity). Once you stop paying those costs you're reliant on other people choosing to keep your data around, just as archive.org does. If no one chooses to, the fact that you had a pile of random hashes scattered into the resource naming/identification scheme does not matter. Sure, nodes would cache commonly accessed data, but the moment it stops being frequently used it starts getting pushed out of those caches to hold the new popular stuff. Unless you are hosting it yourself, or paying someone else to host it, once it drops off the "being popular" wagon its persistence is limited to whenever the next cache flush occurs.
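
The cache behaviour I'm describing can be sketched like this. It assumes a simple least-recently-used policy and a hypothetical `NodeCache` class; real nodes may use a different eviction policy, but any bounded cache has to drop something:

```python
from collections import OrderedDict

class NodeCache:
    """Toy bounded cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # digest -> content, oldest access first

    def get(self, digest: str):
        if digest not in self.items:
            return None
        self.items.move_to_end(digest)  # mark as recently used
        return self.items[digest]

    def put(self, digest: str, content: bytes) -> None:
        self.items[digest] = content
        self.items.move_to_end(digest)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = NodeCache(capacity=2)
cache.put("old-page", b"...")
cache.put("popular-1", b"...")
cache.get("popular-1")           # still being requested
cache.put("popular-2", b"...")   # new popular content arrives

# The page nobody asked for is gone, and its hash does not bring it back.
print(cache.get("old-page"))
```

Knowing the hash of "old-page" tells you what to ask for, not who still has it.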

So in exchange for content being harder to update, the routing performance being lower, making cryptography impossible, not working for dynamic content, and making censorship much easier, you have not solved the problem that archive.org already attempts to solve. Nothing in your scheme would obviate the need for scraping and separately archiving, and nothing ensures content remains once no one is paying for hosting.

> You're still thinking in terms of paying for hosting, ie a commercial web

Hosting and serving content has a cost though.