by manuletroll 5196 days ago
This might be very interesting if it's implemented in a sane way. Unfortunately there doesn't seem to be a widely adopted standard in the world of open data yet.
4 comments

What do you mean by "a very widely-adopted standard in the world of open data"? A "standard" for what?

There are meta-format standards: XML, RDF, HTML and lately JSON. With these four you are probably covering 80% of the world's published open data; the rest is PDF, MS DOC and MS XLS.

What is missing, and good luck filling this void, is a single format that you can use to describe everything. Personally, I think that such a single format will never exist and looking for one is pointless. Geographical data requires attention to certain details, music data to others; this means two different formats must be used (serialized through XML, RDF, HTML, whatever). If you are thinking about "bridging" different formats and data models, then welcome to the world of RDF/S, OWL and Topic Maps ontologies (or ontologY). I'm not sure you want to live there :)

This new Wikidata, just like Freebase, is trying to collect structured or semi-structured data instead of unstructured data such as that present in Wikipedia. I am happy about the aim (completely unstructured data is basically useless for any serious data reuse and data extraction) but my fear is that they will not succeed as well as they did with Wikipedia. Wikipedia founded its success on the fact that anybody could edit it. In order to edit a Wikipedia page you only need very low technical skills and basic writing skills (plus knowledge of the topic, obviously). Adding and manipulating structured data requires people to conform to a certain mental grid, to a formalized model, to a schema developed by someone else and meant to be respected strictly. The vast majority of people are easily demotivated when they are required to learn something substantial beforehand, and most edits by unskilled users end up removed by watchdogs (something often seen in high-quality Wikipedia articles: edits made by new users are quickly reverted on the grounds that they did not follow some of the many guidelines that must be followed).

My idea is that many problems found in structured-data projects (Freebase, MusicBrainz...) could be alleviated by better interfaces and a wide use of automation, two things that Wikipedia projects do not seem to excel at.

RDF has been adopted by some pretty big data websites, and apparently that's one of the formats they plan to support:

    The data will be exported in different formats, especially RDF, SKOS, and JSON.
http://meta.wikimedia.org/wiki/Wikidata/Technical_proposal
Technically unsound: RDF is a relationship model and a meta-model (think XML Infoset), SKOS is a vocabulary (think XHTML) and JSON is a serialization format (think XML or RDF/N3).

The question is: which schema, ontology or vocabulary will they use to express their data? Who will develop it? Or will they reuse other vocabularies? How do they intend to extend them? If they are RDF-based, how will they project to JSON, given that there are a dozen different conversion methods?
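To make the JSON projection problem concrete, here is a rough sketch of two of the many plausible ways to turn the same RDF-style triples into JSON. The example.org URIs and the function names are made up for illustration, and neither shape is claimed to be what Wikidata will actually export.

```python
import json
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); the URIs are invented.
TRIPLES = [
    ("http://example.org/Berlin", "http://example.org/population", "3500000"),
    ("http://example.org/Berlin", "http://example.org/country",
     "http://example.org/Germany"),
]


def to_triple_list(triples):
    """Projection A: a flat list of {s, p, o} objects, one per triple."""
    return [{"s": s, "p": p, "o": o} for s, p, o in triples]


def to_subject_map(triples):
    """Projection B: subject -> predicate -> list of objects."""
    out = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        out[s][p].append(o)
    # Convert the nested defaultdicts to plain dicts for clean JSON output.
    return {s: dict(preds) for s, preds in out.items()}


if __name__ == "__main__":
    print(json.dumps(to_triple_list(TRIPLES), indent=2))
    print(json.dumps(to_subject_map(TRIPLES), indent=2))
```

Both outputs carry the same information, but a consumer written against one shape will not read the other without knowing which convention was chosen.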

How can that document not cite DBpedia, a project that is extracting structured data from Wikipedia infoboxes and has years of experience in doing that?

The fact that their technical proposal document is quite confused about these foundational technologies makes me fear that there is more wishful thinking than past experience behind it.

I found the mix of different kinds of technologies odd too, but I assumed it's just a draft, not the final spec.
I think whatever they choose to implement it in has a good chance at becoming the next de facto standard.
Does the standard really matter? If it's machine-understandable, it should be possible to translate it automatically into any other format in the future.

The important thing is to jump in and make a start. The right way of doing things will become evident as the project evolves.
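As a small illustration of the automatic translation point, here is a sketch that flattens a nested subject/predicate/object JSON structure back into plain triples. The data and the function name are hypothetical, and it only works this easily because the structure is explicit and regular to begin with.

```python
# Hypothetical nested export: subject -> predicate -> list of objects.
SUBJECT_MAP = {
    "http://example.org/Berlin": {
        "http://example.org/population": ["3500000"],
        "http://example.org/country": ["http://example.org/Germany"],
    }
}


def subject_map_to_triples(subject_map):
    """Flatten subject -> predicate -> [objects] into (s, p, o) tuples."""
    return [
        (s, p, o)
        for s, preds in subject_map.items()
        for p, objects in preds.items()
        for o in objects
    ]


if __name__ == "__main__":
    for triple in subject_map_to_triples(SUBJECT_MAP):
        print(triple)
```

The translation is mechanical here, but only between formats that share the same underlying model; mapping between different vocabularies or schemas is the harder part.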