Hacker News
by femto 5196 days ago
Like others here, it's something I've been thinking about for a number of years.

This is an important project, with the potential to eclipse Wikipedia, and maybe even to grow into the saviour of free software. My reasoning follows.

Currently we program computers by giving them a set of instructions on how to achieve a goal. As computers grow more powerful, we will stop giving detailed instructions. Instead, we will write a general purpose deduction/inference engine, feed in a volume of raw data and let the computer derive the instructions it must follow to achieve the given goal.

There are two parts to such a system: the engine and the data. The engine is something that free software is capable of producing. The missing component is the data. The Wikidata project is this missing component.
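
To make the engine/data split concrete, here is a minimal sketch of a forward-chaining inference engine in Python. It is only an illustration under my own assumptions: the triple format, the `?variable` convention, the `located_in` rule, and the function names are all hypothetical, not anything Wikidata or Alpha actually uses.

```python
# Data: (subject, predicate, object) triples. Toy values for illustration only.
facts = {
    ("Canberra", "capital_of", "Australia"),
    ("Australia", "part_of", "Oceania"),
}

# Rules: (premises, conclusion). Terms starting with "?" are variables.
rules = [
    ([("?c", "capital_of", "?n"), ("?n", "part_of", "?r")],
     ("?c", "located_in", "?r")),
]

def match(pattern, fact, bindings):
    """Return bindings extended so that pattern equals fact, or None."""
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Bind the variable, or reject if it is already bound differently.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings

def forward_chain(facts, rules):
    """Engine: apply every rule repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            candidates = [{}]
            for premise in premises:
                # Keep only the bindings that satisfy this premise too.
                candidates = [b2 for b in candidates for f in known
                              if (b2 := match(premise, f, b)) is not None]
            for b in candidates:
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known

derived_facts = forward_chain(facts, rules)
```

The engine here is a few dozen lines and knows nothing about geography; everything it can conclude depends on the facts and rules fed into it, which is the sense in which the data is the scarce part.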

I'm convinced that Wolfram Alpha is a glimpse of this future: an engine coupled to a growing body of structured data. Wolfram's end game isn't taking over search, but taking over computer programming and ultimately reasoning. It's just that search is currently a tractable problem for Alpha, one that can pay the bills until it becomes more capable. There will come a day when Alpha is powerful enough to automatically translate natural language into structured data, at which point it will spider the Internet and its database and capabilities will grow explosively.

Free software needs Wikidata, to arrive at this endpoint first and avoid being made largely irrelevant by Alpha (or Google?)

4 comments

I think one problem is that it's really hard to do structured data in general. Projects that pick a specific domain tend to do it much better, because they have a more tractable problem, can build a community with domain expertise, etc., in ways that Wikipedia will have trouble matching unless they plan to collaborate with those projects and/or pull data from them. For example, I think a structured-data version of Wikipedia artist/album infoboxes is going to have a long way to go to catch up to http://musicbrainz.org/, which has a carefully thought out ontology and years of iteration on that specific problem. Alternatively you can try to do a carefully thought out, consistent schema for all metadata, but the Cyc project shows how hard that is.

I do think that by virtue of breadth Wikipedia's version may become the best data resource in niches that have no specialized structured-data project for them, and it may give other informal-schema, broad-coverage projects like ConceptNet a competitor.

"Free software needs Wikidata, to [] avoid being made largely irrelevant by Alpha"

Wolfram Alpha is already completely worthless because it doesn't cite the sources for any of its results. It's basically just a fancy search engine built on top of a garbage dump.

That isn't a list of references, that's just a list of suggested reading. In fact it's not even guaranteed that any of the facts on that page come from any of those sources. It's basically just showing a list of books that come up when you Google for the question.
Interesting... so they're making the calculations internally but not telling you how they got there then, right? So you really can't use Wolfram Alpha as a reliable source for anything?
Correct. It's conceivable that you could find a secondary source in their reading list that links to a primary source, but in practice going through their list of sources would be much slower than just doing the search yourself, meaning that the site has zero utility in practice. (That assumes you care about the information you're getting being true. If you're writing a middle school paper about penguins then it probably gives you enough plausible deniability for having done the work, but for anything else there isn't much point.)
This link here lists exactly what their sources for AstronomicalData are: http://reference.wolfram.com/mathematica/note/AstronomicalDa...
But it doesn't tell you which source a specific fact comes from. You can't verify it or check if there are more recent sources with better values--relevant if you need good accuracy for a specific value.
The way I read this, it looks like you consider the data part the more difficult one? The system you describe sounds like it is AI-complete, though. Figuring out how the human mind manages to maneuver through combinatorial explosions in interesting search spaces is a very hard problem.

The field is very exciting but not without grave risks. I am of the opinion that the final key breakthrough(s) in Artificial Intelligence will be raced towards, not collaborated on. The advantages the possessor of such a system would have would be enough to test the purest of saints. Also, Computational Ethics lags far behind even current primitive attempts at AGI. Furthermore, there are incentives to leave off the moral brakes, since the consequences seem ephemeral and burdening your system with ethics would further increase the search space: doing the right thing is computationally harder than doing what is best for just yourself. The future: step carefully.

For what it is worth, you can merge large data sources with automatic program construction today. I recently started a project in this area. ConceptNet has an excellent API. Then look at Genetic Programming, Markov Logic Networks, and inductive logic programming, each with its own strengths and weaknesses. Program Transformation is a related area, in which programs are derived from formal specifications that are unoptimized or non-polynomial in time or space. The most interesting take on this I have seen: http://www.cas.mcmaster.ca/~kahl/HOPS/ANIM/index.html.

I'm with you to an extent, but why do you think Wikidata in particular will be the missing component and not some other service like Freebase or DBpedia?
I don't really. Substitute any free body of structured data for Wikidata, or even view them as one body of data, which happens to be spread across multiple servers (and maybe requiring some translation for unification).
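
As a rough sketch of what "some translation for unification" could look like, the Python below merges triples from two sources by mapping each source's predicate names onto one shared vocabulary. Everything in it is an assumption for illustration: the source names, the predicate names, and the `unify` function are hypothetical and do not reflect the actual schemas of Wikidata, Freebase, or DBpedia.

```python
# Hypothetical per-source predicate names, mapped onto one shared vocabulary.
PREDICATE_MAP = {
    "source_a": {"birthPlace": "born_in"},
    "source_b": {"place_of_birth": "born_in"},
}

def unify(sources):
    """Merge triples from several sources into one body of data.

    Returns a dict from unified (subject, predicate, object) triples
    to the set of source names that assert each triple.
    """
    merged = {}
    for name, triples in sources.items():
        mapping = PREDICATE_MAP.get(name, {})
        for subj, pred, obj in triples:
            # Translate the predicate if a mapping exists, else keep it as-is.
            key = (subj, mapping.get(pred, pred), obj)
            merged.setdefault(key, set()).add(name)
    return merged

# Toy input: the same fact expressed with two different predicate names.
sources = {
    "source_a": [("Ada Lovelace", "birthPlace", "London")],
    "source_b": [("Ada Lovelace", "place_of_birth", "London")],
}
merged = unify(sources)
```

Keeping the set of asserting sources next to each unified triple means a consumer can still see where a fact came from after the merge.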