by rilita 4031 days ago
There is a way to fix the problem that you are observing:

1. Create a "subweb". It is composed of the following:

1A. A new specification for publishing information in complete form (that is, structured content published together with the templates that present it)

1B. Both server and client systems that implement the new specification over some sort of modified HTTP, but without using current DNS

1C. A free, distributed, unfiltered, unpoliced DNS alternative

1D. A complete hierarchical index of all content existing in the new system

1E. A free distributed search engine allowing submission of content within the new system.

2. Make it clear the new system is for free content only

3. Disallow advertisements on it entirely

4. Tie in accurate hashing and allow people to dedicate as much hosting power as they are willing (either to the content itself or to index data recording which hashes represent what)

5. Make it entirely secure with a public/private key system allowing people to establish meaningful identities within a system where their content is copied eternally.

6. Use the new system yourself to publish meaningful non-shit information and encourage your friends to do the same
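To make point 4 a bit more concrete, here is a minimal sketch of hash-addressed content, assuming SHA-256 as the hash and an in-memory dictionary standing in for whatever distributed storage the real system would use. The `ContentStore` class and its method names are hypothetical, made up for illustration only.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        # Maps hex digest -> raw bytes. A real system would spread this
        # mapping across many volunteer hosts.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return the hash that identifies it."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch data by hash and verify it was not altered by the host."""
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its hash")
        return data


store = ContentStore()
key = store.put(b"hello subweb")
print(key, store.get(key))
```

Because the key is derived from the content, anyone who fetches a copy from an untrusted host can check it against the hash before using it.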

By the way, I really like your article, and I think you have pointed out, spot on, a crucial problem that most people are oblivious to.

The easiest way to think about what must be done is to look at what libraries are. Libraries have been organizing information for thousands of years and have a fairly well-established, sensible process (which is being ignored by the internet).

1 comments

1A: Wikipedia, GitHub (e.g. awesome-*)

1B: named-data networking, Camlistore, ipfs.io, BitTorrent's web -- very easy to censor since all content is uniquely identified by a hash

1C: ?

1D: Usenet, Wikipedia, Wikidata

1E: technically difficult, see the history of YaCy and http://juretriglav.si/an-open-distributed-search-engine-for-...

2: Wikipedia, but content is rejected under arcane citation policy

3: Wikipedia, but content is rejected under arcane citation policy

4: http://ipfs.io & https://crowdprocess.com/

5: blockchain publishing

6: standard challenge of bootstrapping user-generated content in any new online network

The role and funding of libraries have changed over time and across cultures, e.g. being associated with cities or nations, being associated with universities and churches. Current libraries are struggling to retain access to book stacks, e.g. look at the history of the successful community fight to prevent the flagship NYC library from removing many of its books.

I agree completely that many aspects are covered by existing technology. The difficulty is in making all of this exist as a simple set of open source binaries that regular people (not computer experts) can run.

The system has to be brainless to use if it is going to succeed at all, and it has to have enough initial buy-in to be worth bothering with.

Wikipedia, as you have pointed out, has arcane, crazy policies that restrict it from having user-generated content. It claims to be only a replication of printed articles. That is a lie, but it is a pain in the butt to actually get valuable content into it and not have it removed.

Usenet suffers from not having structured data, being just chunks of text.

GitHub suffers from refusing to host binaries except in odd cases (there is some binary build hosting, I think?)

BitTorrent suffers from still being attached to hostnames for the most part (I'm aware it also has a distributed system, but you typically can't get many seeds through it)

Most systems that allow content suffer from having copyrighted data on them. The goal of this would be for it to be publicly known that there is no copyrighted data, such that universities and such would be willing to run the distributed server, and censorship could be stopped by enough people running it globally.

I'm not focusing on the funding of libraries so much as the fact that they have an established set of categories to put information into. There is no such standardized list of categories for websites, and creating one is important to the future of the internet.

Good point about university hosting and a clear boundary for copyright-cleared content. Archive.org has made progress here, although their discovery tools have much room for improvement. You would end up needing something like YouTube's "Content ID", since anyone can tweak a few words and resubmit content to create a new hash. This implies a need for distributed moderation, human+algorithmic. Much can be learned from the SEO industry: http://searchengineland.com/fooled-us-google-no-longer-annou...
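As a rough illustration of the "tweak a few words" problem, here is a minimal sketch that compares two texts by their overlapping word shingles (Jaccard similarity) instead of by exact hash. This is a swapped-in textbook technique, not how Content ID works, and the `shingles` and `jaccard` helpers, the shingle size, and the sample sentences are all assumptions for illustration.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all shingles: 1.0 means identical sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


original = "organizing information has been done by libraries for thousands of years"
tweaked = "organizing information has been done by libraries for hundreds of years"

# One changed word yields a completely different exact hash...
same_hash = (hashlib.sha256(original.encode()).hexdigest()
             == hashlib.sha256(tweaked.encode()).hexdigest())
# ...while the shingle overlap still flags the two texts as closely related.
print(same_hash, jaccard(original, tweaked))
```

A score like this only flags candidates; deciding what counts as a resubmission would still fall to the human side of the moderation.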

On the topic of categories: once upon a time, dmoz was an open competitor to Yahoo, when hierarchical categories were used for discovery. It's now the Open Directory Project: http://rdf.dmoz.org/ . Google acquired the best-in-class open data graph Freebase and later killed it. The data has since moved to Wikidata, which is the closest we have to an open version of the proprietary semantic graphs that have been constructed by Google, Facebook, Bing, Diffbot, etc. Some related ideas at http://openknowledgegraph.org/

Agreed that end-user deployment and usability need to be brainless and that the system must be initially seeded with high-quality content. Usability is expensive and often specific to vertical market segments, hence would be best implemented by competing commercial applications which operate on the public, open, distributed data. But there would be the risk of commercial centralization (like GitHub/Gmail) of the open protocol.

It will be a challenge to orchestrate the boundaries between commercial software and open content, but this dynamic is arguably responsible for the construction of the entire Internet. We need to find balanced and stable boundaries between commerce & commons, because neither is going away.