by mellosouls 1301 days ago
One of the weaker points of Wikipedia is the way editors lazily refer to journalistic "reliable" sources as authoritative when a significant amount of the time the journalist is lazily using WP as their own source.

Just a couple of these circular references build up a circumstantial base of "evidence" which is then itself used to bolster the original weak claims which often reflect nothing more than an editorial or journalistic assumption or bias.

11 comments

When I was bored in school I ran down the original source for the claim on the Honey Badger wikipedia page that they can take several bullets and keep coming.

It turns out that the source was a book, which was citing a news story, which was about a letter a farmer had written in. I recall tagging this in the backend with 'weak source' (or something).

Checking back now, the claim has been changed to: "The only sure way of killing them quickly is through a blow to the skull with a club or a shot to the head with a gun, as their skin is almost impervious to arrows and spears."

Which is more plausible a claim than the original, and I'm absolutely prepared to believe killing a Honey Badger with arrows is hard, but the source still isn't actually proving that.

Anyway it was a useful lesson for younger me about Wikipedia.

Experienced wikipedia editors are often pretty good at catching this circular sourcing on important articles, but, yeah, it's definitely much more of a problem for more obscure things like the history of toasters.

Years ago I noticed that the article for the mimic octopus said that it mimicked various things, including the "venomous sole" (!) with a citation that looked suspiciously similar to the wikipedia article. Of course, there's no such thing, but if you search the web you can still find articles based on the wikipedia article claiming it (the wikipedia article itself was eventually corrected to the (non-venomous) zebra sole).

For over a decade, wikipedia credited some guy with inventing a special type of blimp, based on some ill-researched news article. About a year ago the article was checked and removed, and then that guy actually invented a blimp, but now wikipedia is refusing to reinstate the article.
Do you have a link to the Wikipedia discussion of the refusal, or even the name of the inventor?
I mean, I think imaginary special blimps fit into the same space as non-existent fish; it's exactly the sort of thing that Wikipedia's defense mechanisms don't work for, largely due to lack of interest.

(I do find the concept of a venomous sole fascinating, though. How would that work? Would it _bite_ people? They're seabed-dwelling flatfish!)

Could have venomous spines on the pectoral fins. If I were going to be a venomous flatfish, that's how I'd roll.
A few years ago, for a book, I ended up researching the history of steel/tin cans among other things. There were a bunch of online resources, including Wikipedia, that all parroted essentially the same storyline. However, a couple of older books I found suggested the history was actually a fair bit older, which struck me, at any rate, as more likely.

It was a minor point and I wasn't going to dig deeper but it does show how at least (possibly) simplified/incomplete narratives drive out more complicated histories.

To be fair, nobody has a better venomous sole impression than the mimic octopus.
Not only WP: An increasing amount of articles just cite "sources say" or "an unnamed U.S. intelligence official says".

Then the entire world press copies the article within hours. Even Reuters and AP have adopted the practice.

Then Wikipedia cites the narrative as the truth.

> An increasing amount of articles just cite "sources say" or "an unnamed U.S. intelligence official says"

Hold on, when is it you think that newspapers did _not_ do this?

In general, if a paper is writing about anything vaguely contentious, it will use unnamed sources; if it names its sources then its sources won't stay sources for long, and the media will become little more than a system for regurgitating press releases. It has always worked this way; this isn't new.

(There was a fun bit in "Yes Minister" where the minister, while leaking something, was offended that the journalist wanted to use "sources" instead of "sources close to the Prime Minister" to attribute his leak...)

I remember learning about firsthand and secondary sources. Wikipedia is not a primary encyclopedia, it's a secondary source.

Just like journalism is full of individuals paid off or paid to say (or not) something (sometimes blatantly untrue), wikipedia has plenty of inaccurate or false information.

So, it used to be I would use wikipedia as a type of third hand source - find information on a topic, then dig through its references, then list Wikipedia as a source that I had used, but never quote Wikipedia.

The problem being there are few encyclopedic sources on the internet to begin with, so when the articles start sourcing the article aggregator, then the quality is bound to go down the tubes as well.

> Wikipedia is not a primary encyclopedia, it's a secondary source.

Technically, Wikipedia is a tertiary source. A "third hand source", as you say.

A secondary source is something like a book written by a historian, based on research they did translating and putting together some primary-source ancient writings. The secondary source can be considered semi-authoritative on a topic, but is still interpreted through a lens. You can quote a secondary source, though always with attribution. What you cannot do is state claims from a secondary source as [cited] fact (like you can with a primary source).

Because of the "anyone can edit" part, though, Wikipedia has no authoritative-ness to it — nobody is standing behind and vouching for the validity of any given text that exists on Wikipedia; there's nobody to take responsibility for the inaccuracy of a statement, nobody's professional scholarly or journalistic or critical reputation is on the line. So it's not valid to quote Wikipedia even as a secondary-source "attributed fact." So it's not a valid secondary source. Thus, tertiary.

Mind you, it's also bad even for a tertiary source. A dictionary is a tertiary source, but writers on grammar like https://www.grammarphobia.com/ might still cite historical editions of dictionaries to prove extant historical understanding of a meaning of a term. Not to take the dictionary as authoritative, but just to, effectively, "do cultural anthropology to it" — seeing the dictionary as a well-known work of writing at the time, that can be analyzed for its word choices regardless of who wrote it. In theory, you could cite [a particular point-in-time snapshot of a page from] Wikipedia for similar reasons; but it's one of those rare exceptions where you have to really know what you're doing. You might say that it's challenging to cite Wikipedia, both in a technical sense, and in the sense of doing so being the best thing to do.

You just invented this tertiary source concept; it does not exist.
It's a shame that stuff like that is (at least in my experience) not taught in school. Years of lurking on Twitter have taught me about background-talks, "sources close to XYZ", how leaks, PR releases, and whistleblowing work [0] - and it really adds to correctly judging the stories you read. Of course, it requires quite a bit of trust in the media you consume.

[0] and submarines, of course ;)

> Not only WP: An increasing amount of articles just cite "sources say" or "an unnamed U.S. intelligence official says".

Giving indications about the source is important for the reader to evaluate its seriousness. Not giving too much information is important to keep your source long term, and to get others (nobody is going to talk to a journalist known to expose their sources). Of course this depends on journalists being reasonably truthful, otherwise the whole thing has no value. This is why reputation is critical and serious journals sack their journalists when they find out they lied.

What is the alternative? It is also very easy to make up a quote, attributing it to someone who’s never said anything like it or just make up a name.

The problem people run into when contesting such information is that wikipedia doesn't consider Primary Sources as legitimate, but cites secondary sources all day, whether libellous or not.
I just fixed a bug like that in Wikipedia: atmospheric methane lifetime is widely quoted as being "a half life of 9.1 years". That appears in the wikipedia, with a reference to a massive document (part of IPCC's AR5) which in fact makes no such claim (about a half life I mean). But once it's entered the Zeitgeist, can the false assumption be eradicated?

(I only noticed this because the 9.1 y estimate may no longer be valid, and we are putting together a paper on the subject.)
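To illustrate why the wording matters, here is a minimal sketch of the arithmetic. It assumes (the comment does not say so) that the 9.1 y figure is an e-folding lifetime under simple exponential decay, in which case the corresponding half-life is shorter by a factor of ln 2. The function names are hypothetical, and real atmospheric methane removal is more complicated than a single exponential.

```python
import math

def half_life_from_lifetime(lifetime_years: float) -> float:
    """Half-life implied by an e-folding lifetime, assuming pure exponential decay."""
    return lifetime_years * math.log(2)

def fraction_remaining(elapsed_years: float, lifetime_years: float) -> float:
    """Fraction of an initial pulse left after elapsed_years (same assumption)."""
    return math.exp(-elapsed_years / lifetime_years)

# Assumption: 9.1 years is an e-folding lifetime, not a half-life.
LIFETIME_YEARS = 9.1

print(round(half_life_from_lifetime(LIFETIME_YEARS), 1))             # 6.3
print(round(fraction_remaining(LIFETIME_YEARS, LIFETIME_YEARS), 2))  # 0.37
```

Under that assumption, about 37% of a pulse would remain after 9.1 years, whereas "a half life of 9.1 years" would imply 50% remaining, so the two phrasings are not interchangeable.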

Especially when the hoaxer is adding those circular references themselves on purpose.

> "These [claims] would get picked up in different types of media, I would cite them, and they would become fact," Alex says.

Wikipedia implicitly (or maybe explicitly if you read the policies) relies on sources that have gatekeepers, however unknowledgeable/biased/cursory those gatekeepers are in practice. And it probably also favors print publication even though that means readers (and editors) often can't practically verify the information for themselves.
I think ultimately the problem is that Wikipedia assumes journalists should be treated by default with a degree of respect above “professional gossip monkeys”.
Fifteen years ago, someone poorly read the 1999 Guinness book of records and wrongly concluded that Guinness called the Game Boy Camera the world's smallest digital camera. Aside from the fact that Guinness World Records are a pile of crap and untrustworthy, it made no such claim. To the contrary, it claimed other digital cameras as the smallest.

Now think, how many times have you heard that fact when reading retrospectives on Game Boy, or watching Youtube videos?

IOW, both Wikipedia and Journalism can be a lot like Santa Claus — there is a lot of "evidence" of his existence, it is just that exactly zero of it is good, i.e., grounded in objective reality. But it makes a nice story, towards which most people seem to gravitate.
There is no realistic better alternative which does not install a de facto Ministry of Truth.
Possibly true, but it's important that users are fully aware that the "Ministry of Truth" operates at times on Wikipedia; not everybody is.
> the "Ministry of Truth" operates at times on Wikipedia

Please elaborate. Or, even better, link to examples.

Well, this whole discussion and article is about misinformation and "truth", and I'm not sure how I could add to my initial comment at the top of this thread. But assuming you allow, as I intend, that the "Ministry of Truth" is not necessarily a malevolent, intentional global entity, but rather an occasional artefact of the factors previously described by myself and others (bias, circular reasoning, gatekeeping, etc.), then there will always be a significant chance that it will naturally arise here and there where those influences are scaled by, say, demographic imbalances (e.g. white, male, liberal, etc.) in the sector in question (in this case some Wikipedia subjects; but it applies equally to real-world contexts where you might see other demographic imbalances, e.g. academia, the police, army, public sector, startup culture, etc.).

For Wikipedia's own various discussions on the subject if you want to delve further, try:

https://en.wikipedia.org/wiki/Criticism_of_Wikipedia

https://en.wikipedia.org/wiki/Wikipedia:Systemic_bias

https://en.wikipedia.org/wiki/Ideological_bias_on_Wikipedia

Remember, none of these problems are necessarily intentional, it's just not as simple as implying Wikipedia is a special, neutral case free of those issues.

An “occasional artefact of factors [like] bias, circular reasoning, gatekeeping, etc” is not even close to anything which anyone could reasonably call a “Ministry of Truth”, and it is certainly not what I was referring to.
If that artefact presents as keeping certain subjects or articles within the bias of the dominant group, it is pretty much controlling "truth" in that way. But, yes, I don't think Wikipedia has a shadowy board of Cigarette Smoking Men or anything.

But there are other entities that seem to do a reasonably good job of being impartial (e.g. BBC); I just don't buy the idea that Wikipedia is very special (or especially effective) in regard to neutrality.

In another front page discussion is what looks like an emerging consensus that grand larceny like the FTX theft is a function of the nonexistent fact checking of the media.

What possible motivation could exist for establishing an online reference using the media as the canonical and sole source of truth?

> What possible motivation could exist for establishing an online reference using the media as the canonical and sole source of truth?

Because "the media" is... the best source we have?

To be clear, Wikipedia doesn't require that you cite mainstream media sources only. You can cite anything that's a primary source as fact (whether that be a work of investigative journalism, a book, a letter, a blog post by someone involved, a study described in a journal paper, etc); and anything that's a secondary source as an attributed quote (whether that be a work of editorial journalism, a magazine article, a blog post by someone who isn't involved, a meta-analysis described in a journal paper, etc.)

That's actually a very low bar. For example, people who are discouraged from "original research" on Wikipedia, can simply stick said original research onto a website they own, and then edit Wikipedia to cite that, and that's 100% allowed. (It's disincentivized to promote your own investigative reporting or quote your own words on Wikipedia, but if you did it all "by the book", nobody's going to revert the edit.)

In all cases, the only real requirement is that everything Wikipedia says has to be attributable via citation to something, somewhere, that exists in the public sphere of semi-permanent accessibility, such that a reader could reasonably be expected to be able to fact-check the citation qua citation by "chasing the pointer" to its referent. So you can't cite a person (as a person won't necessarily give you the same answer twice); but you can cite an interview with said person recorded at a specific time and put into some form of public record (e.g. a court proceeding.)

Actually, to correct this: it's highly recommended by Wikipedia to use secondary sources where possible, not primary, precisely for this reason.

"Original research" is exactly the opposite of what you mean. Original research in the context of Wikipedia generally means citing primary sources. The "research" that is original is the interpretation of the raw data (the primary sources). The preferred approach is to cite an expert's interpretation of the data or event.

You can disagree whether this makes sense, or whether most articles follow this, but this is Wikipedia's policy:

> Wikipedia articles should be based on reliable, published secondary sources, and to a lesser extent, on tertiary sources and primary sources.

> Secondary or tertiary sources are needed to establish the topic's notability and avoid novel interpretations of primary sources. All analyses and interpretive or synthetic claims about primary sources must be referenced to a secondary or tertiary source and must not be an original analysis of the primary-source material by Wikipedia editors. [1]

1. https://en.wikipedia.org/wiki/Wikipedia:No_original_research...

This applies to editorial statements (opinions, interpretations); but if you're just quoting blunt facts, a primary source is preferable, no? E.g. if you're citing a statistic, it's better to cite the study that supplied the statistic, than to cite a science-journalistic review of said study. You can then cite some statement the review made to put the statistic in context; but that should be a separate, second citation, so that the statistic itself can be grounded in a primary source — especially since science-journalism does not often cite the primary sources itself, making it hard to chase the citation for the statistic otherwise.

Or, maybe to couch this in language more friendly to how a Wikipedia editor might think of the process: if you are already providing an attributed quote of a secondary source; and in the secondary source, a fact is quoted from some primary source without a true citation being provided (only a weak, implicit-in-context mention of the source); is it not better to do the "original research" of figuring out what primary source the secondary source got the data for the claim from, and then putting in a citation for the fact inside the quote, yourself, in exactly the way the secondary source's author likely would have if the format they were publishing in allowed true citation? "Repairing" the citation graph, so to speak, where the secondary source has left a gap in it.

(And this is important to the topic at hand: citing only secondary sources for "blunt facts", where those secondary sources are not themselves expected/required to provide citations to primary sources, is exactly how hoax citation graphs arise.)
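As a rough illustration of "chasing the pointer" through a citation graph, here is a minimal sketch that follows citations from a claim and reports whether they loop back on themselves. The graph, the source names, and the `find_citation_cycle` helper are all hypothetical and made up for this example; nothing here reflects how Wikipedia actually checks sources.

```python
from typing import Dict, List, Optional, Set

# Hypothetical citation graph: each source maps to the sources it cites.
# The names are illustrative only, not real citations.
CITES: Dict[str, List[str]] = {
    "wiki_article": ["news_story"],
    "news_story": ["wiki_article"],       # the journalist used the wiki as a source
    "wiki_stat": ["science_review"],
    "science_review": ["journal_study"],
    "journal_study": [],                  # cites nothing further: a primary source
}

def find_citation_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Depth-first search from start; return one citation loop if found, else None."""
    path: List[str] = []      # sources on the current chain, in order
    on_path: Set[str] = set() # same sources, for fast membership checks
    done: Set[str] = set()    # sources already fully explored without a loop

    def visit(node: str) -> Optional[List[str]]:
        if node in on_path:
            # We came back to a source already on the chain: circular sourcing.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for cited in graph.get(node, []):
            cycle = visit(cited)
            if cycle is not None:
                return cycle
        on_path.discard(node)
        path.pop()
        done.add(node)
        return None

    return visit(start)

print(find_citation_cycle(CITES, "wiki_article"))  # ['wiki_article', 'news_story', 'wiki_article']
print(find_citation_cycle(CITES, "wiki_stat"))     # None
```

A chain that ends at a source citing nothing further is grounded in the sense discussed above; a chain that returns to its starting point is the circular case. Whether the terminal source is actually reliable is a separate question that a graph walk cannot answer.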

>but if you're just quoting blunt facts, a primary source is preferable, no?

At least in theory, no it's not, per Wikipedia policy. Now, in practice, if you reference some government dataset for a non-controversial fact like the area of a state, only the most procedural whackjob admin is going to flag it. But, in general, you're not supposed to use primary sources.