

by walterbell 4031 days ago

1A: Wikipedia, GitHub (e.g. awesome-*)

1B: named-data networking, camlistore, ipfs.io, bittorrent's web -- very easy to censor since all content is uniquely identified by a hash

1C: ?

1D: Usenet, Wikipedia, Wikidata

1E: technically difficult, see history of YaCy and http://juretriglav.si/an-open-distributed-search-engine-for-...

2: Wikipedia, but content is rejected under arcane citation policy

3: Wikipedia, but content is rejected under arcane citation policy

4: http://ipfs.io & https://crowdprocess.com/

5: blockchain publishing

6: standard challenge of bootstrapping user-generated content in any new online network

The role and funding of libraries have changed over time and across cultures, e.g. being associated with cities or nations, being associated with universities and churches. Current libraries are struggling to retain access to book stacks, e.g. look at the history of the successful community fight to prevent the flagship NYC library from removing many of its books.


rilita 4031 days ago

I agree completely that many aspects are covered by existing technology. The difficulty is in packaging all of this as a simple set of open source binaries that regular people (not computer experts) can run.

The system has to be brainless to use if it is going to succeed at all, and it needs enough initial buy-in to be worth bothering with.

Wikipedia, as you have pointed out, has arcane policies that keep out user-generated content. It claims to be only a replication of printed articles. That is not true, but it is a pain in the butt to actually get valuable content into it and not have it removed.

Usenet suffers from not having structured data, being just chunks of text.

GitHub suffers from refusing to host binaries except in odd cases (there is binary build hosting, I think?).

BitTorrent suffers from still being attached to hostnames for the most part (I'm aware of the distributed system it has too, but you typically can't get many seeds through it).

Most systems that allow content suffer from having copyrighted data on them. The goal of this would be for it to be publicly known that there is no copyrighted data, such that universities and such would be willing to run the distributed server, and censorship could be stopped by enough people running it globally.

I'm not focusing on funding of libraries so much as the fact that they have an established set of categories to put information into. There is no such standardized list of categories for websites to go into, and the creation of such is important to the future of the internet.

walterbell 4031 days ago

Good point about university hosting and a clear boundary for copyright-cleared content. Archive.org has made progress here, although their discovery tools have much room for improvement. You would end up needing something like YouTube's "Content ID", since anyone can tweak a few words and resubmit content to create a new hash. This implies a need for distributed moderation, human+algorithmic. Much can be learned from the SEO industry, http://searchengineland.com/fooled-us-google-no-longer-annou...
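To make the "tweak a few words" problem concrete, here is a minimal sketch in Python. It assumes SHA-256 as the content hash and uses word-shingle Jaccard similarity as a stand-in for near-duplicate detection; neither is what any of the systems named above actually uses, and the function names and sample sentences are purely illustrative.

```python
import hashlib


def content_hash(text: str) -> str:
    """Exact content address: any change to the text changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word windows (illustrative choice of k)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shingle overlap in [0, 1]; higher means the texts are more alike."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical sample submissions.
original = "the quick brown fox jumps over the lazy dog near the quiet river bank"
tweaked = "the quick brown fox leaps over the lazy dog near the quiet river bank"
unrelated = "open directories once competed with search engines for web discovery"

if __name__ == "__main__":
    # One changed word gives a completely different hash...
    print(content_hash(original) == content_hash(tweaked))
    # ...but the shingle overlap stays high, while unrelated text scores low.
    print(jaccard(original, tweaked))
    print(jaccard(original, unrelated))
```

A hash match only catches byte-identical resubmissions. A similarity score like this one could flag likely near-duplicates for human moderators, but the threshold would be a judgment call, which is where the human+algorithmic moderation comes in.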

On the topic of categories: once upon a time, dmoz was an open competitor to Yahoo, when hierarchical categories were used for discovery. It's now the Open Directory Project: http://rdf.dmoz.org/ . Google acquired the best-in-class open data graph Freebase and later killed it. The data has since moved to Wikidata, which is the closest we have to an open version of the proprietary semantic graphs that have been constructed by Google, Facebook, Bing, Diffbot, etc. Some related ideas at http://openknowledgegraph.org/

Agreed that end-user deployment and usability need to be brainless and the system must be initially seeded with high-quality content. Usability is expensive and often specific to vertical market segments, hence would be best implemented by competing commercial applications which operate on the public, open, distributed data. But there would be the risk of commercial centralization (like GitHub/Gmail) of the open protocol.

It will be a challenge to orchestrate the boundaries between commercial software and open content, but this dynamic is arguably responsible for the construction of the entire Internet. We need to find balanced and stable boundaries between commerce & commons, because neither is going away.