by stevenjgarner 1421 days ago
It seems the greatest value of Wikipedia is consistently as a repository of citations. The reliability of the moderation or review of those citations is the question.

EDIT: I hear complaints from students all the time that they are not allowed to cite Wikipedia. I tell them no, you should instead cite the Wikipedia citations. They invariably tell me how much better they do academically because of that.

6 comments

It’s really unfortunate that people copy claims made in Wikipedia (often without double-checking any other source) but then don’t cite Wikipedia. Often the claims made by Wikipedia are wrong, misleading, sloppy, one-sided, etc., and this (widespread) practice helps to perpetuate those problematic claims by making it seem that other authors are independently claiming the same thing. Then when future Wikipedia editors or others look for evidence of something, they find a number of sources that seem to corroborate the claim, but under close inspection turn out to be a circular chain built on flimflam; unfortunately that close inspection often never happens.

Students should be encouraged to cite Wikipedia when they found information in Wikipedia, so that when they grow up and start writing real research papers they will continue citing Wikipedia when they find information there.

Finding information somewhere and then not citing it (or citing some random other source that actually says something different) erodes the whole academic project. Any teacher who tells their students not to cite Wikipedia should be ashamed.

Wikipedia is not a primary source. Students can cite the original source but not Wikipedia itself. The university departments I'm aware of still don't allow citing Wikipedia and explicitly cover it as bad style. And I have to agree with the professors there. The quality is plain bad: last time I checked a medical article on Wikipedia, it was full of claims without citations, and those claims contradicted official medical guidelines. If I received an academic home assignment without citations, the student would fail the course, so why should it be OK on Wikipedia?
> The university departments i'm aware of still don't allow citing wikipedia and explicitly cover it as bad style.

It's worth noting that Wikipedia is not special in this regard — those same departments probably also consider it bad style to cite Britannica (and, if they don't, they should).

Encyclopaedias are meant to be starting points for research, not the ultimate destination. Editors, both of Wikipedia and otherwise, are not expected to be subject matter experts, which is why the guidance on Wikipedia is that you're not even supposed to use primary sources as reference, but rather secondary sources[0].

0. https://en.wikipedia.org/wiki/Wikipedia:No_original_research...

If students would always look carefully and critically at the relevant sources Wikipedia cites, assess their authors’ biases, compare multiple sources, etc., before ultimately picking what to cite, that would be one thing.

But what happens instead (in people’s published journal papers! not to mention news articles, etc.) is authors lazily crib material from Wikipedia and then either cite nothing or randomly pick works from among Wikipedia’s sources to cite without ever looking at them.

If you are writing a paper you should cite where you got the information. If the only place you looked was Wikipedia, that’s not great research practice but you should still cite Wikipedia. Honesty is an even more important part of scholarship than diligence.

> so that when they grow up and start writing real research papers they will continue citing Wikipedia when they find information there.

I guess it's fine to be idealistic here but most reviewers would look down upon your work if you do this. And that impression can be the difference between acceptance and rejection. I'm sure ideally this shouldn't happen, but it is what it is for now.

I think it's honesty rather than idealism. If someone took a shortcut by reading Wikipedia instead of research papers, it would be dishonest to try to hide that.

Of course, dishonesty often works, but it undermines the whole endeavor.

(Although, the original author still deserves credit for their work. Perhaps the citation should be to the original work "via Wikipedia".)

Well, a citation is usually not about where you found something. It's more about trying to trace the origin of a particular fact or scientific contribution.
Why are the sources that people cite in Wikipedia not vulnerable to the same issues?
Some of them are, which is why it's important to look at the sources and verify them for yourself. Citing a news article provides a very different level of evidence for an assertion than a peer reviewed paper.
Of course they are. Which is why every scholar should cite where they found their information, so that readers can chase down the reference and critically examine it. Authors should cite the source whether it is a blog post, a newspaper article, a popular textbook, a journal paper, a historical accounting document, or private correspondence with a colleague.

When people don’t cite their actual sources it becomes orders of magnitude more difficult to figure out how they came up with their claims and trace the origin and transmission of those claims through the literature.

It's difficult to impossible to actually fix those mistakes on Wikipedia. It's a good theory only on paper, without any change in real life.
It is entirely possible to fix those mistakes (one at a time) on Wikipedia.... if you do the research work to find out what happened.

But this takes significant effort (like, a half-day of research to sort out one claim), and then sometimes back and forth with other Wikipedians to convince people that you actually chased down the real story.

The problem is that for every mistake someone is willing to put effort into fixing, there are another 100 that nobody ever notices.

Not directing this comment at you, but to the other children of this thread:

So much criticism of wikipedia seems to come down to: wikipedia did X. I think X is wrong. Other people don't see it that way. I don't want to spend the time proving my point. How dare wikipedia not just take me, a random internet stranger, at my word.

All I want to know is how do y'all think it could possibly work differently? Everybody thinks they are right. Nobody is intentionally wrong. Obviously if you just show up, unwilling to explain why you are right or unwilling to accept compelling counter arguments to your point, it's not going to go your way. Why would anybody think it would?

Some people are activists; most people are not.

Activists are willing to invest orders of magnitude more time, energy, and discomfort into winning. They are willing to break most social norms to have their narrative become the default. They're willing to suppress facts that would support alternate narratives. They're willing to put their thumb on the scale when inconvenient facts are unavoidable. Et cetera.

Non-activists are not willing to do any of those things.

It's not about right or wrong, it's about activism: who engages in it and how much.

But sure, let's let the activists win—or force everyone to become activists to "compete". I'm sure that'll make the world a better place.

----

Or we could ban activism since it is fundamentally anti-social bullying behavior. Maybe make a "code of conduct" that prohibits it. Just spitballing here…

It's pretty simple to identify activists mechanically (and at scale): they are in the fat part of the power law for contributions. Simply limit people's ability to contribute and, et voilà, the activism problem has been vastly reduced, if not eliminated. Non-activists now have a chance.
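To make the "fat part of the power law" heuristic concrete, here is a rough sketch. Everything in it is an assumption for illustration: the function name `heavy_contributors`, the `share` threshold, and the toy edit counts are made up, and this is not how Wikipedia actually tracks or limits anyone. It only measures volume; it says nothing about anybody's intent.

```python
def heavy_contributors(edit_counts, share=0.5):
    """Return the smallest group of top editors that together
    account for at least `share` of all edits.

    `edit_counts` maps a (hypothetical) user name to an edit count.
    """
    total = sum(edit_counts.values())
    # Rank by edit count, highest first; break ties by name for stable output.
    ranked = sorted(edit_counts.items(), key=lambda kv: (-kv[1], kv[0]))

    picked, running = [], 0
    for user, count in ranked:
        if running >= share * total:
            break  # the users picked so far already cover the requested share
        picked.append(user)
        running += count
    return picked


# Toy numbers, invented for this example.
EDITS = {"a": 500, "b": 300, "c": 100, "d": 50, "e": 30, "f": 20}
print(heavy_contributors(EDITS, share=0.5))  # ['a']
print(heavy_contributors(EDITS, share=0.8))  # ['a', 'b']
```

With skewed counts like these, a tiny number of accounts covers most of the edits, which is the pattern the heuristic keys on.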

There are already various rules against engaging in various types of bad faith behaviours. Like all disciplinary systems it is far from perfect of course, but it exists, and people get banned for behaving inappropriately every day.

Most people at the top of the power law are not evil people. It's difficult to be both prominent and evil. The real POV pushers tend to keep a lower profile to avoid discovery. That doesn't mean prominent people don't have beliefs, everyone does, but most people can have beliefs and behave appropriately.

I think your real objection is that it's more difficult to argue with an experienced person who is willing to devote more time. Which is true. It is after all why in the real world expensive lawyers are worth the money.

But why is that a bad thing? If another person simply has researched the topic more than you, they should win the argument. That is life: the more effort you put in, the more likely you get a positive outcome.

If you really believe power users are more likely to behave in bad faith or maliciously, I'd like to see some proof, because I highly doubt it's true.

I think that's an interesting point on activism. It's more than just individual activists though, there are also corporations/businesses - I can't think of the number of times I've heard people talk about hiring a team/contractor to revamp/clean up a corporate Wikipedia page, or an individual's Wikipedia page.
Not always. Any particular two or more wikipedians who decide they don't want something in can sometimes stop it.

For example, in Australia there is a body that does sport participation statistics, Ausplay. They do this every year. It's a great source for sport statistics on Australia.

Two wikipedians decided that these statistics were not permissible in the sport in Australia article. They won:

https://en.wikipedia.org/wiki/Talk:Sport_in_Australia#RFC_on...

This is sport in Australia, which is not that controversial. Now things that are controversial like IQ or the role of heritability in ability are surely going to be problematic.

If I’m reading this right, I’m seeing a vote taking place in the “Survey” section, with 5 votes invoked — 2 no, 3 yes.

And an end result of No Consensus.

Framing this as 2 Wikipedians exercising outsized power to produce this ruling seems disingenuous at best. And their basic objection (I only bothered skimming) of bias and ambiguity in the source/data/methodology seems fairly reasonable on its face; whether it’s correct I have no idea, but it’d be a reasonable objection.

As a policy, this whole thing seems like good behavior; the only gap is in the lack of voting participants. I suppose it is a real problem if the vote can’t be recast when more people are willing, but otherwise

sien: If you think this decision was wrong you can try to bring it up again, and point out the discussion in Wikiprojects or other community pages where a larger number of Wikipedia editors will see it. If you could only get 3–4 people interested enough to discuss and nobody could come up with an acceptable compromise, that’s not really the fault of the process. If you think these particular editors did something improper, you can escalate to further community processes designed for dealing with various abuses.

The nature of any large project is that people will disagree and not everyone will get precisely what they want.

Some of the stats in the current article come from the predecessor of the AusPlay Survey.

Also, there are stats in the article that are completely biased that are self-reported stats from sports organisations.

The people objecting to new stats had no problem with these ones.

>Now things that are controversial like IQ or the role of heritability in ability are surely going to be problematic.

FYI this kind of "wrong think" is already being removed in many articles. The way it's removed is applying the existing deep and numerous rules more strictly to information which cuts against the current dominant cultural narratives. For one of the best examples I can provide, have a look at how the "Feminism" and "Men's Rights" pages are written. Completely different standards for evidence, commentary, style, and even sections. Criticism of men's rights is evident in the heading, while of course, there is no criticism of feminism in Feminism's heading.

Wikipedia co-founder Larry Sanger has described Wikipedia as "badly biased." He's 100% correct.

> Wikipedia co-founder Larry Sanger has described Wikipedia as "badly biased." He's 100% correct.

Larry Sanger is involved with several competitors, including some for-profit examples, so he has a financial incentive to bash Wikipedia.

Not to say that he is necessarily wrong. I wouldn't say "badly biased", but nobody is going to claim Wikipedia is perfect.

> Wikipedia co-founder Larry Sanger has described Wikipedia as "badly biased." He's 100% correct.

I was curious and I looked it up:

https://larrysanger.org/2020/05/wikipedia-is-badly-biased/

I find obvious errors all the time, but the pages are locked. This is enough of a barrier to stop me from trying to fix it.

Sometimes the discussion will have the same correction listed but overruled by partisan Wikipedians.

If you can’t be bothered to sign up for a free account, it’s unlikely you’ll do the (sometimes nontrivial) amount of research required to prove your case if you get in an edit war with another author.

You could equally well say “I find obvious errors in textbooks / lecture videos / journal articles / paper encyclopedias / ... all the time but it’s too hard to contact the author so I don’t do anything about it”.

The main difference is that in Wikipedia you can do something about it with some extra effort. So it’s actually a much better situation than most kinds of resources.

The pages that are “locked” are usually locked because they are spam magnets. Not allowing IP edits is unfortunate (and does discourage simple corrections to articles), but in the highest traffic parts of the site the work saved from not having to revert dozens of low-effort vandal posts is (at least arguably) worth the downside.

> overruled by partisan

You wouldn’t believe the amount of abject nonsense and spam that gets cleaned up by those “partisans”. But Wikipedia is an open project, the “partisans” here are just other (slightly more experienced) volunteers not in any way fundamentally different from yourself, and if you can convincingly prove your case via polite conversation you will win the argument (if there is a local dispute it’s generally possible to get more eyeballs on it by escalating to a broader group of volunteers).

* * *

P.S. someone named Slartibartfast turning down a chance to work on the real-life Hitchhiker’s Guide?

I love the idea, I just get shit canned every time I try. I have an interest in legal cases and have had my sources rejected when they are the SCOTUS official proceedings. Not for subjective claims, but obvious factual ones like who were the named defendants and who their lawyers were on a case. There are groups that like a non-factual spin and I don't have the time in the day to go through Wikipedia's adjudication system against someone and their posse.
I wrote about some interesting discrepancies within Wikipedia for my sporadic newsletter last year: https://dahosek.substack.com/p/something-interesting-eda
Interesting. None of the versions here are wrong exactly (i.e. in none of these cases were Wikipedians making things up or clearly misconstruing what they read), but sources out in the world that Wikipedia authors pulled their information from have incomplete stories, and sometimes one version contradicts the other.
This was such a fun article. Please write more!
This isn't generally true in my experience. Wikipedia values secondary sources, even outright wrong secondary sources, over primary sources. It often happens that all secondary sources are wrong (for example reporting on popular personalities like Elon Musk) and the very clear and obvious interpretation of the primary source is ignored, for all of the secondary sources that twist the words of what was said.
The point is that the relevant feedback is that Wikipedia is not a reliable source, not that you should not cite Wikipedia. They sound similar but they are fundamentally very different.

The former encourages the behaviour of finding other sources that are reliable. The latter encourages quoting Wikipedia without citing it.

Relevant xkcd https://xkcd.com/978/
It's so silly for teachers to say "you are not allowed to cite Wikipedia". It's a cart-before-horse approach, mindless rule-following. The principle in academic writing is that you cite whatever you used for your research. IF you read the Wikipedia article and base some of your conclusions on its synthesis, then you MUST cite Wikipedia; citing the sources of Wikipedia would be misleading IF you don't read those cited sources yourself. It's all about traceability and credit assignment. You can of course also cite what Wikipedia cites, but you have to cite Wikipedia if you use claims from there that you didn't actually get from looking at the source cited by Wikipedia.

However! It's also important to teach students how sources differ in quality or "authoritativeness". The problem with citing Wikipedia is not the citing per se but relying on that source. A peer reviewed academic journal is considered more reliable, although no source should be taken as gospel and definitive truth, especially on controversial topics.

You can even cite blog posts, personal letters, and even personal oral communication! The point is to let the reader know where your info comes from. Making students memorize rules like "don't cite Wikipedia" just results in a cargo cult, not actual understanding of critical thinking related to sources.

If I understand the article correctly, it's not about incorrect information -- i.e. it's not claiming Wikipedia misleads judges because the articles are wrong -- but about "influence" of Wikipedia on decisions; an influence that could be gamed by an adversarial actor.

References themselves are gameable. They argue references mentioned in Wikipedia are more likely to be cited by a judge!

This is not about the quality of Wikipedia, but about its undue influence and how easy it is to game it, references included!

Judges would consult encyclopedias and similar resources 40 years ago.

The difference is that a crowdsourced resource like Wikipedia is easier to manipulate by people who understand the system. There are plenty of PR specialists who get client articles pushed into Wikipedia or updated to their liking.

Wikipedia is a treasure, but it’s also vulnerable to a bunch of different attacks.

Agreed. But that's not about the quality of the articles themselves, and certainly not about their references!
You're close to an important issue, I think.

False ideas can be spread simply by overemphasizing biased true statements and disregarding true statements that don't fit the narrative.

The effect can be multiplied by controlling the discussion through selecting the right 'questions' that are discussed.

Snopes is the exemplar.

What would be the best alternative to Wikipedia at getting a list of citations on a topic quickly? For academic research I consider the content of the page as "Yeah, maybe" but the citations to be more useful in terms of digging to deeper sources faster. The argument could be made that the ability of almost anyone to edit Wikipedia is a form of peer review but for edge topics it's tough to tell who has the chops to be editing a page and who doesn't.
> What would be the best alternative to Wikipedia at getting a list of citations on a topic quickly?

If you're looking for major/important sources to read on a topic, not just a quick way to halfway-fake a works-cited section, I've found it valuable to locate some representative, recent academic book in the field and read the author's introduction and other pre-chapter-1 material. These will often include a lot of name-dropping of what are considered major works in the field. There may also be a list of abbreviations the book will use, and those often include several major works in the field that'll come up often in the body text.

That's your list of books and papers to find and read. Repeat that technique with each of those books and papers, too, if you want to keep going deeper.

Often you can get enough off an Amazon or Google preview of a book for this to work. Plus, libraries exist, and you pull that kind of information out of several books (which can be handy—anything that appears more than once deserves special attention) in less than an hour, without checking anything out. And there's always Library Genesis, which may not have every book but probably has at least one in your interest area that can be mined in this way.

Wikipedia's sort of useful for this, at least for tracking down a first work to attack with this approach, but the problem is that many articles don't cite highly-regarded or authoritative or landmark works on the topic, so much as whatever the author(s) happened to have handy or what was easiest to find online (a whole hell of a lot of great information is still not available on the Web, even in 2022, including material in very recent books, not just pre-Web ones, or is available on the Web but only in poorly- or not-indexed-by-web-search-engines under-copyright ebooks).

The most useful tool is the academic citation graph, e.g. via Google Scholar.

Start with a couple keywords. Click through the "cited by n" links on the top few papers. For papers that don’t have PDFs freely available, find DOIs and put them into Sci-hub. Books can often be found at the Internet Archive, Google Books, or libgen. At the start, skim skim skim.

Look at what links forward and backward from the papers you see. Hunt for new keywords to try. Go a few hops all around the graph. It often doesn’t take too long to get a rough lay of the land.
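The "few hops forward and backward" idea is basically a breadth-first walk over the citation graph. Here is a minimal sketch of that, assuming you already have the graph as a plain dict; the `explore` function and the paper names in `CITATIONS` are invented for illustration and nothing here talks to Google Scholar or any real API.

```python
from collections import deque


def explore(citations, seeds, max_hops=2):
    """Breadth-first walk over a citation graph in both directions.

    `citations` maps a paper to the list of papers it cites.
    Returns a dict of paper -> number of hops from the nearest seed.
    """
    # Reverse index: paper -> set of papers citing it (the "cited by" links).
    cited_by = {}
    for paper, refs in citations.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        paper = queue.popleft()
        if hops[paper] == max_hops:
            continue  # do not expand past the hop limit
        # Backward links (what this paper cites) plus forward links (who cites it).
        neighbours = set(citations.get(paper, ())) | cited_by.get(paper, set())
        for other in sorted(neighbours):
            if other not in hops:
                hops[other] = hops[paper] + 1
                queue.append(other)
    return hops


# Toy graph with made-up paper names.
CITATIONS = {
    "survey-2020": ["method-2015", "theory-1998"],
    "method-2015": ["theory-1998"],
    "followup-2021": ["survey-2020"],
    "unrelated-2019": ["other-2001"],
}
print(explore(CITATIONS, ["method-2015"], max_hops=2))
```

Starting from one seed, two hops already reach everything connected to it here, while the unrelated cluster never shows up. In practice the papers that keep reappearing across different seeds are the ones worth reading first.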

Academic citation graphs are invaluable WHERE the topic is academic, but this post (like most citations I would venture to say) is an example of citing Wikipedia that would generally not be considered academic or in the results of the articles and case law of scholar.google.com.

There is a huge body of knowledge that lies (dare I say) in a Google search. You just need to know how to evaluate the search results with reasonable criteria of notability, relevance, accuracy, etc.

Thanks for the reminder—I often forget about Google's various less-prominent tools and services.
Very well said. I'm writing a book (popular narrative non-fiction) and the research process has led me to the exact same conclusion. Find a couple "pinnacle" books related to the domain/subject/question you're interested in, and read the Preface, Introduction, Ch. 1, etc. This is often the only place in a book authors are candid enough to directly answer the question of "why am I writing this and how does it relate to what others have done?"

No shade intended, if anything I need to work on this style. I'm too nervous of making it sound like other people's ideas are my own, but then I end up writing a block of defensive-sounding citation & qualification, and nobody wants to read that...

Anyways, well said.

Academic journal articles may be more convenient, because they usually focus on the key citations for a field in the brief literature review. Use Google Scholar to find some, then download from Sci-Hub.
Don't forget to search on https://news.ycombinator.com/ !
Usually you can find survey/review articles, or textbooks and look at what they cite. You can also try to find websites of research groups in the area with good reputations and check out their recent works and the citations therein. Especially works that you see multiple times using this approach will be important core works, from which you can go further. General Google searches and Google Scholar keyword searches and exploring the citation graph are also useful.
Wikipedia may be a good start if you're completely clueless about a topic. But, academically speaking, it doesn't last long. You're not going to be able to produce anything more insightful than a fresher's last-minute Sunday night essay using Wikipedia.
It depends on your field, but most fields publish review articles. These are written by a single author or two in the field, and perform no new analysis beyond just listing all the current info about the field. Here is an example for oncology, this one is periodically republished every few years as more info is learned by the field:

https://aacrjournals.org/cancerdiscovery/article/12/1/31/675...

Sadly I think that the value of Wikipedia in this lamentable example of judicial laziness and professional underfunding is that the web just happens to be more easily searched than LEXIS. Easily != competently.

Ed. competently replaced usefully

The citations on Wikipedia articles are often either broken, in a foreign language, or not up to an academic standard. However, following this practice is what eventually led me to stop using Wikipedia for anything. When I tried to trace the citations to make my research easier, I found it actually made it a lot slower because I kept running into junk. Before I actually paid attention to the citations on a broad selection of articles, I was under the mistaken belief that Wikipedia was "good enough."
They have weird requirements for citations. Want to cite a link to a blog where a guy has spent 20,000 hours researching a topic but doesn’t have a degree? Not allowed. Want to post a link to a NYT article written by a journalist with no relevant experience in that same field? Go for it. It’s really strange. It’s like they’re aiming for narrative over fact or something.
To be fair, usually when the NYT article is wrong they issue a retraction, and when the blogger is wrong they triple down.
Citation needed.