by simonw 243 days ago
I'm surprised this story didn't mention the scandal with Scots Wikipedia: https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...

> an American teenager – who does not speak Scots, the language of Robert Burns – has been revealed as responsible for almost half of the entries on the Scots language version of Wikipedia

It wasn't malicious either, it was someone who started editing Wikipedia at 12 and naively failed to recognise the damage they were doing.


The Cebuano wiki is a similar case. The language is not spoken often, but the wiki was the personal project of an editor who was mad at political articles and started making animal articles there instead.

The solution is to differentiate and tag inputs and outputs, such that outputs can't be fed back in as inputs recursively. Funnily enough, Wikipedia's sourcing policy does this perfectly. Not only are sources the input and page content just an output, but page content is a tertiary source, and sources by policy should be secondary (and sometimes primary) sources, so the system is even protected against cross-tertiary-source pollution (say, an encyclopedia feeding off Wikipedia and vice versa).

It is only when articles posing as secondary sources fail to cite wikipedia that a recursive quality loss can occur, see [[citogenesis]]
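
To make the tagging idea concrete, here is a minimal sketch in Python. It assumes every source carries an explicit tier label; the names `Tier`, `Source` and `filter_inputs` are made up for illustration and are not part of any Wikipedia tooling or policy text.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PRIMARY = 1
    SECONDARY = 2
    TERTIARY = 3  # encyclopedias, including the wiki's own page content


@dataclass(frozen=True)
class Source:
    name: str  # hypothetical label, for illustration only
    tier: Tier


def usable_as_input(source: Source) -> bool:
    # Tertiary material is an output of the process, so it is never fed back in.
    return source.tier is not Tier.TERTIARY


def filter_inputs(sources):
    # Keep only sources that are allowed to act as inputs, preserving order.
    return [s for s in sources if usable_as_input(s)]
```

The filter only works if the tag is honest. A secondary source that silently copies from a tertiary one still passes, which is the citogenesis gap described above.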

Many sources for Wikipedia articles refer to Wikipedia without citing it. Many journalists will work from Wikipedia, and most of Wikipedia's sources are journalistic articles. Often this goes unnoticed because the information obtained this way is true and uncontroversial. Citogenesis only documents examples where, by bad luck, the result is untrue information.
No, citogenesis is present regardless of whether the information is "true". The concept of "truth" on Wikipedia doesn't carry much weight, mainly because determining truthfulness would be impractical, out of scope, and original research.
> It is only when articles posing as secondary sources fail to cite wikipedia that a recursive quality loss can occur

I've seen a college professor cite wikipedia in support of a false claim. On investigation, the text in wikipedia was cited to an earlier blog post by that same professor.

I wasn't convinced.

I don't think it's entirely illegitimate.

1- citing wikipedia (or any tertiary source) is valid, the problem is just when the source is not cited. And also it's against wikipedia policy, but you are free to cite it elsewhere.

2- citing the tertiary source and citing the secondary source are distinct and valid. There is no "rule", in wikipedia or otherwise, that says you need to cite the underlying source. In fact citation chains can become quite deep, it would be very impractical. An example would be, you could cite the gospels when jesus talks with the devil. If we had it your way then you wouldn't be able to cite an apostle, you would have to attribute the quote to jesus, and furthermore if jesus quoted the old testament you would have to cite that? If you think the bible is an exception, consider case law, if you were to cite an attorney's defense and the attorney cited some cases, would you have to cite the original cases? If so, then which? There might be multiple, it's not just a citation chain but a graph.
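
The "graph, not a chain" point can be sketched in a few lines of Python. This is only an illustration: it assumes citations are stored as a dict mapping each work to the works it cites, and the node names in the usage note are invented.

```python
def reaches(graph, start, target):
    """Return True if `target` can be reached from `start` by following citations."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for cited in graph.get(node, ()):
            if cited == target:
                return True
            if cited not in seen:
                seen.add(cited)
                stack.append(cited)
    return False


def is_circular(graph, node):
    # A work is circularly sourced if its own citations eventually lead back to it.
    return reaches(graph, node, node)
```

For example, with `graph = {"wiki_article": ["news_story"], "news_story": ["wiki_article"]}`, `is_circular(graph, "wiki_article")` is true. Each node may cite several others, so a traversal like this is needed instead of walking a single chain.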

In this specific example your professor was not just quoting himself: his work is now part of Wikipedia and, importantly, was not contested or was not successfully contested. It's similar to how a trademark works. You claim you own the trademark, and if after a year or so no one contests it, you have a stronger case that it's yours.

The background here is that Scots is not really a language. Try asking a Glasgow taxi driver who addresses you in 'Scots' whether he knows any English. Robert Burns wrote in English, with some of his spelling reflecting pronunciation in the Scottish English dialect.

The people who want it to be considered as a language for political reasons cannot be bothered to translate Wikipedia themselves. They read and edit English Wikipedia and understand it perfectly.

Sort of?

The Glaswegian taxi driver may not consider themself to be speaking a different language but, if speaking to another local and leaving aside pronunciation, they’d use words, phrases and even grammar that’s incomprehensible to someone with no experience with Scots.

I’m a “posh Scot”, raised middle class in Edinburgh, so my accent is minimal and thickens up or softens depending on who I’m speaking to. Even for me, there’s a lot of words, phrases and ways of speaking I’ve had to adjust to be consistently understood by American coworkers over the last 10+ years.

Brits do the same. At best it is a dialect, at worst an accent. A lot of (most of) Scots is still English, but spoken with different grammar, unfamiliar phrases and unfamiliar pronunciation.

Sort of like extreme cockney rhyming slang or for a more modern example thick BME* full of slang.

* = British Multicultural English, think fam n blud, lots of Jamaican english influence plus south east asian influence.

> The background here is that Scots is not really a language.

This is supremely ignorant. Scots is its own language. It's a 'brother' or 'sister' of English, with both English and Scots being descendants of West Germanic languages.

The fact that many (all?) Scots speakers also speak English doesn't mean Scots is not a language on its own.

You could make your exact same arguments that Irish isn't a language because you could ask a Cork taxi driver whether he knows any English.

Scots = a language with some of the same ancestors as English.

Scottish English = a dialect (and accent) of English

Scots Gaelic = another language, with the same ancestors as Irish and Manx.

Australians, Jamaicans, African Americans and English-speaking South Africans do not have their own Wikipedia, despite all these dialects having more legitimate demographic and linguistic claims to being languages than 'Scots'.

James Joyce wrote in English; no Irish person pretends that he wrote in a third language distinct from English and Irish. The fact that they do not do so does not compromise the political basis for independence, republicanism or reunification.

If a Cork taxi driver addressed you in Irish (very unlikely) and you asked him to speak English, the request would be both coherent and reasonable. The point you missed is that the Glasgow taxi driver would look at you with consternation and say "But, I am speaking English! What's wrong with my English?" (insert dialect spelling if you like)

Rabbie Burns wrote in the same language as his compatriots Louis Stevenson and Scott.

It would be ignorant if I did not know about the meretricious claim of a minority of Scottish people to have their own language, but it is not ignorant to reject that claim. I am Scottish fwiw.

Scots is partially intelligible in written form to English speakers, but that does not make it the same language as English. You might as well say that Spanish and Portuguese are the same language.
You might as well say that US English and Canadian English are different languages.

Geordie English is closer to Edinburgh 'Scots' than to RP English or US English or Indian English. Is it a dialect of Scots?

There's also a smooth language continuum between Spanish and Portuguese, with varieties like Galego. This doesn't make them the same language. Historically the language continuum encompassed most of Europe, but people at the extremes would've had no expectation of understanding each other's language.
What counts as a language is almost always determined by "political reasons" - as the witticism goes: "A language is a dialect with an army and navy."

There exist dialects that are less mutually intelligible than apparently distinct languages, and the designation of each as "dialect" or "language" is political. Language is often a proxy for culture, and political actors may wish to suppress or boost the legitimacy of such cultural expression depending on their aims.