by halo 5196 days ago
tl;dr: spin off Wikipedia infoboxes into a separate project with an API, and then use that data to bootstrap an open data project with broader goals.

In theory, it's a good idea. It takes an existing useful data source and puts it in a form that encourages reuse, and since it solves the bootstrapping problem, it's not obviously doomed to failure like the Semantic Web.

I see two potential downsides.

My first concern is that, in practice, it will make editing Wikipedia more complex. There's no inherent reason why this should be the case, but there's no inherent reason why Wikimedia Commons should make editing Wikipedia more complex either, yet it undeniably does.

Secondly, it will prevent a similar source of data from appearing with broader terms of use. For example, OpenLibrary is public domain.

2 comments

Is it even possible to have a database of factual content under CC-BY-SA? This is part of the reason OpenStreetMap is moving to ODbL.

Somewhat ironically, since part of the reason is that you can't copyright facts, they didn't just take the existing data under the same theory, but asked everyone to accept the new licence. I wonder what Wikipedia plan to do?

I don't see why you couldn't have a database of facts under CC-BY-SA. You can't copyright individual facts, but you absolutely can copyright a collection of facts as a collection. [1]

I would think the more pressing problem would be the 'viral' nature of the 'share alike' restriction when it comes to API use.

Attribution would also seem to be thorny and difficult to police, but not intractable.

[1] e.g. I can make a phone directory and copyright it. You could take all the data out of my phone directory to make your own directory and that would be fine. But you could not simply make copies of my directory and sell those as your own.

But being able to legally take all the data out and make your own database (or other thing) with it (which you state is fine) is exactly what makes CC-BY-SA pointless/inapplicable to databases of open data.

See this discussion of why CC-BY-SA is unsuitable for OpenStreetMap (which mentions the case law on phone books you refer to):

http://www.osmfoundation.org/wiki/License/Why_CC_BY-SA_is_Un...

Wikipedia says this on Feist v. Rural and collections of facts:

"In regard to collections of facts, O'Connor states that copyright can only apply to the creative aspects of collection: the creative choice of what data to include or exclude, the order and style in which the information is presented, etc., but not on the information itself. If Feist were to take the directory and rearrange them it would destroy the copyright owned in the data.

The court ruled that Rural's directory was nothing more than an alphabetic list of all subscribers to its service, which it was required to compile under law, and that no creative expression was involved. The fact that Rural spent considerable time and money collecting the data was irrelevant to copyright law, and Rural's copyright claim was dismissed."

http://en.wikipedia.org/wiki/Feist_v._Rural

It seems to me the confusion is over what OpenStreetMap wants to control and what copyright allows them to control.

The 'shortcomings' of CC-BY-SA noted in your first link seem to boil down to use cases involving chunks of data that simply do not qualify for copyright. Thus, by definition, no copyright license could behave any differently than any other in determining what can and can't be done with those chunks of data.

A Terms of Use agreement (and enforcement) could do more, but the particular copyright license is simply moot.

The ODbL isn't (just) a copyright licence, for exactly those reasons.

What editing interface could possibly be more complex than the current system of Infobox "markup"? If Wikidata does nothing besides make it easier to edit those infoboxen, it will be a success.