HN Mirror

by gioele 5196 days ago

What does "a very widely-adopted standard in the world of open data" mean? A "standard" for what?

There are meta-format standards: XML, RDF, HTML and, lately, JSON. These four probably cover 80% of the world's published open data; the rest is PDF, MS DOC and MS XLS.

What is missing (and good luck filling this void) is a single format that you can use to describe everything. Personally, I think that such a single format will never exist and that looking for one is pointless. Geographical data requires attention to certain details, music data to others; this means that two different formats must be used (serialized through XML, RDF, HTML, whatever). If you are thinking about "bridging" different formats and data models, then welcome to the world of RDF/S, OWL and Topic Maps ontologies (or ontologY). I'm not sure you want to live there :)

This new Wikidata, just like Freebase, is trying to collect structured or semi-structured data instead of unstructured data such as that found in Wikipedia. I am happy about the aim (completely unstructured data is basically useless for any serious data reuse and data extraction), but my fear is that they will not succeed as well as they did with Wikipedia. Wikipedia founded its success on the fact that anybody could edit it. To edit a Wikipedia page you need only very modest technical skills and basic writing skills (plus knowledge of the topic, obviously). Adding and manipulating structured data requires people to conform to a certain mental grid: a formalized model, a schema designed by someone else and meant to be followed strictly. The vast majority of people are easily demotivated when they are required to learn something substantial beforehand, and most of the edits made by unskilled users end up being removed by watchdogs. This is often seen in high-quality Wikipedia articles, where edits made by new users are quickly reverted on the grounds that they did not follow some of the many guidelines that must be followed.

My view is that many of the problems found in structured-data projects (Freebase, MusicBrainz...) could be alleviated by better interfaces and wider use of automation, neither of which the Wikipedia projects seem to excel at.