by Vinnl 2184 days ago
> Many natural languages, like German and Finnish, are so syntactically and morphologically complex that there is no compact ruleset that can describe them. (...)

> Additionally, not every sentence in a typical Wikipedia article can be easily represented in a machine-readable factual format.

It doesn't seem like the goal of this project is to describe those languages, or to represent every sentence in a typical Wikipedia article? The goal doesn't seem to be to have all Wikipedia articles generated from Wikidata, but rather to have a couple of templates along the lines of "if I have this data available about this type of Subject, generate this stub article about it". That would allow the smaller Wikipedia language editions to automatically generate many baseline articles that they might not currently have.

For example, the Dutch Wikipedia is one of the largest editions mainly because a large percentage of its articles were created by bots [1], which produced many articles on small towns ("x is a town in the municipality of y, founded in z. It is near m, n and o.") and obscure species of plants. This just seems like a more structured plan to apply that approach to many of the smaller Wikipedias that may be missing a lot of basic articles and are thus not exposing many basic facts.
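
To make the template idea concrete, here is a rough Python sketch of what I mean. Everything in it is my own assumption for illustration: the field names, the template wording and the `generate_stub` helper are hypothetical and not taken from the project or from any real bot.

```python
from typing import Optional

# Hypothetical template for one type of subject (a town).
# The field names are illustrative, not real Wikidata properties.
TOWN_TEMPLATE = (
    "{name} is a town in the municipality of {municipality}, "
    "founded in {founded}. It is near {neighbours}."
)
REQUIRED_FIELDS = ("name", "municipality", "founded", "neighbours")


def join_names(names: list) -> str:
    """Join names as 'm, n and o' (English-only; other languages need their own rule)."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def generate_stub(record: dict) -> Optional[str]:
    """Return a stub sentence, or None if any required field is missing or empty."""
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return None
    return TOWN_TEMPLATE.format(
        name=record["name"],
        municipality=record["municipality"],
        founded=record["founded"],
        neighbours=join_names(record["neighbours"]),
    )


# Placeholder data, mirroring the x/y/z example above.
print(generate_stub({
    "name": "X",
    "municipality": "Y",
    "founded": "Z",
    "neighbours": ["M", "N", "O"],
}))
```

The point is that a stub is only generated when all the data the template needs is there; anything less gets skipped rather than producing a half-filled article. A real version would need a template per language and per type of subject, which is presumably the hard part.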

[1] https://en.wikipedia.org/wiki/Dutch_Wikipedia#Internet_bots