by skissane 2262 days ago
In my personal experience, Wikidata is often worse at detecting vandalism than Wikipedia. Wikipedia has more editors and so vandalism on Wikipedia tends to be noticed sooner. Wikidata gets less attention so vandalism can endure for much longer.

With the increasing trend of pulling data from Wikidata into Wikipedia, I think this is becoming less of an issue: even if nobody is watching the Wikidata item, a vandalised property that is exposed in a Wikipedia infobox is more likely to be noticed. However, there will always be more obscure items which lack Wikipedia articles, and more obscure properties which don't get displayed in any infobox, and for them the risk of vandalism going unnoticed is greater. (Plus, it is possible for a Wikipedia article to override the data in Wikidata with its own values; this is done for the English Wikipedia Sci-Hub article, for example. Wikidata includes all the historical web addresses, while Wikipedia only wants to display the current ones. I don't think it is technically possible yet to filter out just the current ones, so instead Wikipedia is manually overriding the addresses from Wikidata.)

1 comment

> I don't think it is technically possible yet to filter out just the current ones, so instead Wikipedia is manually overriding the addresses from Wikidata.

Note that the historical ones have “normal” rank, whereas the current ones have “preferred” rank. You can filter on rank when using the API. When using the SPARQL endpoint, if you go for the “truthy triples” representation `wdt:P856` of the “official website” property, you will only get the best-rank statements, in this case the preferred ones. If you want to be absolutely sure, you can go for the “reified triples” representation and query for statements that don't have any “end time” qualifiers.
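
For what it's worth, here is a rough Python sketch of the two SPARQL approaches. It is only an illustration under a few assumptions: the function names are made up, the item ID is passed in as a parameter (the example uses `Q42` purely as a well-known placeholder, not the Sci-Hub item), `P582` is assumed to be the “end time” qualifier, and the queries rely on the `wd:`, `wdt:`, `p:`, `ps:` and `pq:` prefixes being predefined by the public query service at `https://query.wikidata.org/sparql`.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def truthy_query(item_id: str) -> str:
    """Build a query using the "truthy" wdt: predicate.

    Truthy triples only expose best-rank statements, so preferred-rank
    official websites hide the normal-rank (historical) ones.
    """
    return f"SELECT ?url WHERE {{ wd:{item_id} wdt:P856 ?url . }}"


def no_end_time_query(item_id: str) -> str:
    """Build a query over the reified statements (p:/ps:/pq:).

    Keeps only "official website" statements that have no "end time"
    qualifier (assumed here to be P582), regardless of rank.
    """
    return (
        "SELECT ?url WHERE {\n"
        f"  wd:{item_id} p:P856 ?statement .\n"
        "  ?statement ps:P856 ?url .\n"
        "  FILTER NOT EXISTS { ?statement pq:P582 ?end . }\n"
        "}"
    )


def run_query(query: str) -> list[str]:
    """Send a query to the endpoint and return the ?url values (needs network)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        # Hypothetical User-Agent string; replace with something identifying you.
        headers={"User-Agent": "official-website-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["url"]["value"] for row in data["results"]["bindings"]]


if __name__ == "__main__":
    # Q42 is only a placeholder item ID for illustration.
    print(truthy_query("Q42"))
    print(no_end_time_query("Q42"))
```

Calling `run_query(truthy_query(...))` with the item you care about should then return only the best-rank addresses, while the second query ignores rank and relies purely on the absence of an end-time qualifier.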