Hacker News new | ask | show | jobs
by captn3m0 2261 days ago
Maybe use WikiData? The slower rate of updates might work in your favour to avoid vandalism.
3 comments

In my personal experience, Wikidata is often worse at detecting vandalism than Wikipedia. Wikipedia has more editors and so vandalism on Wikipedia tends to be noticed sooner. Wikidata gets less attention so vandalism can endure for much longer.

With the increasing trend to pull data from Wikidata into Wikipedia, this is I think becoming less of an issue – even if nobody is watching the Wikidata item, if some vandalised property is exposed in a Wikipedia infobox, that increases the odds that someone will notice the vandalism. However, there are always going to be more obscure items which lack Wikipedia articles, and more obscure properties which don't get displayed in any infobox, and for them the risk of vandalism is greater. (Plus, it is possible for a Wikipedia article to override the data in Wikidata with its own values; this is done for the English Wikipedia Sci-Hub article, for example – Wikidata is including all the historical web addresses, Wikipedia only wants to display the current ones – I don't think it is technically possible yet to filter out just the current ones, so instead Wikipedia is manually overriding the addresses from Wikidata.)

> I don't think it is technically possible yet to filter out just the current ones, so instead Wikipedia is manually overriding the addresses from Wikidata.

Note that the historical ones are of “normal” rank, whereas the current ones have “preferred” rank. You can filter that when using the API, and when using the SPARQL endpoint, if you go for the “truthy triples” representation `wdt:P856` of the “official website” property, you will only get best-ran statements – in this case the preferred ones. If you want to be absolutely sure, you can go for the “reified triples” representation and query for statements that don't have any “end time” qualifiers.

It also shows all the historical links. Sample: https://www.wikidata.org/wiki/Q21980377
I was going to say "The Wikipedia page uses the data from Wikidata", as I thought I had seen that in the past. Turns out that it's not the case, and after picking a few samples, it looks like Wikidata is barely put in use in Wikipedia (aside from the inter-wiki links).
A lot of Wikipedia templates now pull data from Wikidata – https://en.wikipedia.org/wiki/Category:Templates_using_data_...

In the case of the Sci-Hub article, it actually would pull the website address from Wikidata, except that the article has been configured to override the Wikidata website address data with its own. Heaps of articles do this – https://en.wikipedia.org/wiki/Category:Official_website_diff... – but I understand the aim is to try to reduce the number of those cases over time.

Yeah, via the overriding is how I came to the conclusion that it's not being used. I looked at the pages (specifically the infoboxes) and basically all values shown in the article were spelled out in the source. At the same time no Wikidata ID was present in the source (though I now realize that the template can automatically query that based on the page it's being used on).

> A lot of Wikipedia templates now pull data from Wikidata

It looks okay for the English wiki, but the others seem to trail behind quite a lot (though obviously number of best templates isn't a perfect metric).

English: 539 German: 128 Chinese: 93 French: 52 Swedish: 37 Spanish: 19

Given that only ~2200 English Wikipedia articles have their infobox completely from Wikidata (https://en.wikipedia.org/wiki/Category:Articles_with_infobox...), it looks like Wikidata integration still has a long long way to go.