Hacker News
by madez 2597 days ago
When I see articles about the same thing in different languages stating different, incompatible numbers, I always think this is an obvious and easily solvable problem: separate the data from the language and reference it in the text. That way each language uses the same data. This shouldn't be limited to numbers; it should also cover dates and other easily convertible information.

Yes, this will create conflicts as to which information is correct, but Wikipedia has had this problem since forever, and deals with it.
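To make the proposal concrete, here is a minimal sketch of what "separate the data from the language" could look like. Everything in it is an assumption for illustration: the fact key `town_a_population`, the made-up value, the two sample sentences, and the `render_article` helper are hypothetical and are not how Wikipedia stores articles.

```python
# Shared, language-independent facts (hypothetical key, made-up value).
FACTS = {"town_a_population": 10000}

# Each language edition keeps only its prose and refers to facts by key.
ARTICLES = {
    "en": "Town A has {town_a_population} inhabitants.",
    "de": "Stadt A hat {town_a_population} Einwohner.",
}

# Only the presentation of a number differs per language.
THOUSANDS_SEP = {"en": ",", "de": "."}


def render_article(lang, facts=FACTS):
    """Fill one language's text with the shared facts, formatted for that language."""
    sep = THOUSANDS_SEP[lang]
    formatted = {
        key: f"{value:,}".replace(",", sep) if isinstance(value, int) else str(value)
        for key, value in facts.items()
    }
    return ARTICLES[lang].format_map(formatted)


print(render_article("en"))  # Town A has 10,000 inhabitants.
print(render_article("de"))  # Stadt A hat 10.000 Einwohner.
```

Changing the value in `FACTS` updates both language versions at once, which is the point of the proposal; which value is the correct one still has to be settled somewhere.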

3 comments

This is currently being done on Wikidata for information that is more easily represented in a database.

https://www.wikidata.org

https://en.wikipedia.org/wiki/Wikidata

Some infoboxes on Wikipedia have values that are automatically synced to the related Wikidata entry.

https://en.wikipedia.org/wiki/Infobox
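As a rough sketch of how such a value can be read from a Wikidata entry: entity data is available as JSON through `Special:EntityData`, and statements are grouped by property ID under `claims`. The helper names below are hypothetical, the sample document is hand-written with an illustrative value rather than fetched, and the code assumes the quantity is a whole number. Q64 (Berlin) and P1082 (population) are used only as example identifiers.

```python
def entity_data_url(qid):
    """URL of the JSON export for one Wikidata item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def first_quantity(entity_json, qid, prop):
    """Return the first quantity value stored for `prop`, or None if there is none.

    Assumes the amount is an integer; ranks, qualifiers and units are ignored.
    """
    claims = entity_json["entities"][qid]["claims"].get(prop, [])
    for claim in claims:
        snak = claim["mainsnak"]
        if snak.get("snaktype") == "value":
            # Amounts are stored as signed strings such as "+10000".
            return int(snak["datavalue"]["value"]["amount"])
    return None


# Hand-written sample in the shape of the JSON export (illustrative value only).
SAMPLE = {
    "entities": {
        "Q64": {
            "claims": {
                "P1082": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "datavalue": {
                                "type": "quantity",
                                "value": {"amount": "+10000", "unit": "1"},
                            },
                        }
                    }
                ]
            }
        }
    }
}

print(entity_data_url("Q64"))
print(first_quantity(SAMPLE, "Q64", "P1082"))
```

A real infobox would also have to choose between several statements for the same property (different dates, ranks and sources), which this sketch skips.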

The problem is that most data can never be verified. A source may never be fully accurate; in the worst case it could be a bunch of BS. Even government data and media-based data frequently contradict each other.

For recent events, we have already seen large press groups spreading misinformation. So how do you know when they are reporting facts and when they are producing biased BS? At the end of the day somebody makes a call, and we know nothing of their affiliations.

Wikipedia also has a notorious problem with "sources" that are written by the same person that's editing the article.

That's a horrible idea.

1. The surrounding text depends on the number it contains. By blindly replacing numbers in every language version, you get garbage like "Town A is the most populated place in the region at 10,000 inhabitants, followed by Town B at 11,000."

2. If you look at multiple language versions and they disagree, you know one of them must be incorrect and you should watch out for bias and outdated information. Forcing them to all have the same data takes away that feature.

3. Bias is rarely a problem with objectively checkable data. When the PRC publishes an encyclopedia, the issue with it is not that they'd get the date of Tiananmen Square wrong.

4. Requiring people to use and read some sort of placeholders instead of ordinary text greatly increases the barrier to entry.
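Point 1 is easy to reproduce. The sketch below is a hypothetical illustration: the template, the population figures, and the function names are all made up. It shows a sentence whose wording encodes a ranking, and what happens when only the numbers are swapped.

```python
# Prose that silently assumes Town A is larger than Town B.
TEMPLATE = (
    "Town A is the most populated place in the region at {a:,} inhabitants, "
    "followed by Town B at {b:,}."
)


def render_sentence(populations):
    """Blindly substitute the current numbers into the fixed sentence."""
    return TEMPLATE.format(a=populations["Town A"], b=populations["Town B"])


def claim_still_holds(populations):
    """Check the ranking that the wording of the sentence takes for granted."""
    return populations["Town A"] > populations["Town B"]


old = {"Town A": 10000, "Town B": 9000}
new = {"Town A": 10000, "Town B": 11000}  # Town B has overtaken Town A

print(render_sentence(old), claim_still_holds(old))
print(render_sentence(new), claim_still_holds(new))
```

With the `new` figures the sentence still renders without any error, but the check returns `False`: the text now contradicts its own numbers, and nothing in a plain substitution step would notice.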