Hacker News
by walterbell 4036 days ago
Good point about university hosting and a clear boundary for copyright-cleared content. Archive.org has made progress here, although their discovery tools have much room for improvement. You would end up needing something like YouTube's "Content ID", since anyone can tweak a few words and resubmit content to create a new hash. This implies a need for distributed moderation, both human and algorithmic. Much can be learned from the SEO industry: http://searchengineland.com/fooled-us-google-no-longer-annou...
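
To illustrate the hash problem, here is a minimal sketch (not how Content ID works, which is proprietary). It assumes plain-text content and uses word shingles with Jaccard similarity as a stand-in for fuzzy matching. The function names, the shingle size of 3, and the sample strings are all made up for the example.

```python
import hashlib
import re


def exact_hash(text: str) -> str:
    """SHA-256 of the raw text: changes completely if a single word changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences, lowercased, punctuation ignored."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical sample submissions
original = "the quick brown fox jumps over the lazy dog near the quiet river bank"
tweaked = "the quick brown fox leaps over the lazy dog near the quiet river bank"
unrelated = "distributed moderation needs both human review and algorithmic filters"

if __name__ == "__main__":
    print("hashes match:", exact_hash(original) == exact_hash(tweaked))
    print("similarity (tweaked):", jaccard(original, tweaked))
    print("similarity (unrelated):", jaccard(original, unrelated))
```

The one-word tweak produces a different hash but still scores as highly similar, while the unrelated text does not. A score like this only flags candidates; deciding what counts as a real duplicate is still where the human half of moderation comes in.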

On the topic of categories: once upon a time, when hierarchical categories were used for discovery, dmoz was an open competitor to Yahoo. It's now the Open Directory Project (http://rdf.dmoz.org/). Google acquired the best-in-class open data graph Freebase and later killed it. The data has since moved to Wikidata, which is the closest thing we have to an open version of the proprietary semantic graphs constructed by Google, Facebook, Bing, Diffbot, etc. Some related ideas are at http://openknowledgegraph.org/

Agreed that end-user deployment and usability need to be brainless, and that the system must be initially seeded with high-quality content. Usability is expensive and often specific to vertical market segments, so it would be best implemented by competing commercial applications which operate on the public, open, distributed data. But there would be a risk of commercial centralization of the open protocol (like GitHub/Gmail).

It will be a challenge to orchestrate the boundaries between commercial software and open content, but this dynamic is arguably responsible for the construction of the entire Internet. We need to find balanced and stable boundaries between commerce & commons, because neither is going away.