This makes me sad. Wikipedia is surely one of humanity's greatest accomplishments to date. Having such a resource be off limits to a large section of our population is truly depressing.
Of course, there was always the language barrier, but zh.wikipedia.org could definitely have been as high quality as the English version given the chance.
Is zh.wikipedia.org not as high-quality as the English version in your experience? I don't use it that frequently, but for topics like famous Chinese people or linguistics of Sinitic languages, the English article is often just a stub, while the Chinese version has all the information I'm looking for.
In my experience zh.wikipedia.org is largely authored by Taiwanese residents. The quality is high but local narratives are often written with some bias not entirely unlike what one would expect from a displaced population resulting from a fairly recent civil war.
When I see that articles about the same thing in different languages state different and incompatible numbers, I always think that this is an obvious and easily solvable problem: separate the data from the language and reference it in the text. That way each language uses the same data. This shouldn't just be done with numbers, but also with dates and other easily convertible information.
Yes, this will create conflicts as to which information is correct, but Wikipedia has had this problem since forever, and deals with it.
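To sketch what "separate the data from the language" could look like, here is a minimal Python example. The `FACTS` store, the `TEMPLATES` table and the `render` helper are all made up for illustration (they are not how Wikipedia works), and the population figure is just a placeholder value.

```python
# Hypothetical shared data store: one value, referenced by every language.
FACTS = {"town_a_population": 10_000}

# Hypothetical per-language article text that references the data by key
# instead of hard-coding the number.
TEMPLATES = {
    "en": "Town A has {town_a_population:,} inhabitants.",
    "de": "Stadt A hat {town_a_population} Einwohner.",
}


def render(lang: str, facts: dict = FACTS) -> str:
    """Fill the language-specific template with the shared facts."""
    return TEMPLATES[lang].format(**facts)


if __name__ == "__main__":
    for lang in TEMPLATES:
        print(lang, "->", render(lang))
```

Updating the value in `FACTS` changes every language at once, which is the whole point, but it only updates the number, not any sentence built around it.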
The problem is that most data can never be verified. A source may never be fully accurate. A source could be a bunch of BS in the worst case. Even government data and media-based data frequently contradict each other.
For recent events we have already seen large press groups spreading misinformation. So when do you know when they actually produce facts or produce biased BS? At the end of the day somebody makes a call and we know nothing of their affiliations.
1. The surrounding text depends on the number it contains. By blindly replacing numbers in every language version, you get garbage like "Town A is the most populated place in the region at 10,000 inhabitants, followed by Town B at 11,000."
2. If you look at multiple language versions and they disagree, you know one of them must be incorrect and you should watch out for bias and outdated information. Forcing them to all have the same data takes away that feature.
3. Bias is rarely a problem with objectively checkable data. When the PRC publishes an encyclopedia, the issue with it is not that they'd get the date of Tiananmen Square wrong.
4. Requiring people to use and read some sort of placeholders instead of ordinary text greatly increases the barrier to entry.
I only use Wikipedia as a reference guideline for specific subjects I am not knowledgeable in. It does a fairly good job at providing that kind of information.
Very few Taiwanese today associate themselves with the mainland. Less than 10 percent of the population were refugees two decades ago, and it is even less today, with second-generation mainlanders being more or less assimilated.
It's not just a zh / en problem. It's probably a problem for every language.
For fun, I sometimes compare Russian and English articles on controversial topics (e.g. Stalin, Nicholas II). It's really interesting to see how they differ. In most cases each version matches the traditional point of view of its native-speaker community.
On the whole I would say the English version is better. In general articles about China tend to be pretty good (modern and ancient history, people as you pointed out) and can be richer than their English equivalents. I actually haven't found the linguistics articles about Sinitic languages to be all that helpful. My impression is that they usually have less information than their English equivalents and often have sections translated from their English counterparts (I'm fairly sure the translation is in that direction because the English is more detailed than the Chinese and the original sources are usually in English), but I suppose YMMV.
However as you get into the various sciences, the quality and coverage of articles drops off very quickly.
I also find that the tone of the articles can veer off a little from Wikipedia's usual detached, encyclopedic tone (at least somewhat more so than in the English version).
I still prefer it to Baidu's online encyclopedia, but it's not as excellent as the English version.
As the article mentions, the Chinese version has been blocked since 2016, shortly after Wikipedia made the decision to move to mandatory HTTPS via HSTS. I'm not sure how recent this is, but it seems that the site is also on the HSTS preload list used by most browsers, even within China (it seems Chrome is very popular there). It seems they have just gotten around to blocking all of the other languages this year.
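For anyone unfamiliar with HSTS: the site sends a `Strict-Transport-Security` response header, and a browser that has seen it (or ships the site in its preload list) refuses to fall back to plain HTTP. A small Python sketch of reading that header is below. The `parse_hsts` helper is my own, and the header value in the usage line is only an illustrative example, not necessarily what Wikipedia sends.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.lower()  # directive names are case-insensitive
        if name == "max-age":
            # Seconds the browser should insist on HTTPS for this host.
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            # Signals consent to be included in browser preload lists.
            policy["preload"] = True
    return policy


if __name__ == "__main__":
    # Illustrative header value only.
    print(parse_hsts("max-age=31536000; includeSubDomains; preload"))
```

With a policy like this in place a censor cannot quietly downgrade visitors to HTTP and filter individual pages, which leaves blocking the whole host as the remaining option.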
ESNI seems like it would work pretty well. Although the PRC firewall could likely be used to block ESNI and/or TLS 1.3 and force plain-text SNI.
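To make the plain-text SNI point concrete, here is a rough Python sketch of what any middlebox can read off the wire. It builds a ClientHello in memory with the standard `ssl` module (no network involved) and pulls the hostname out of the `server_name` extension. The function names are mine, the parser assumes the whole ClientHello sits in a single TLS record, and it illustrates the record layout only; it says nothing about how the GFW is actually implemented.

```python
import ssl
from typing import Optional


def client_hello_bytes(sni: str) -> bytes:
    """Produce the first bytes a TLS client would send for `sni`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=sni)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client now waits for a ServerHello
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Read the unencrypted server_name from a ClientHello record."""
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        return None  # not a handshake record carrying a ClientHello
    pos = 5 + 4                      # record header + handshake header
    pos += 2 + 32                    # client version + random
    pos += 1 + record[pos]           # session id
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher suites
    pos += 1 + record[pos]           # compression methods
    ext_end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= ext_end:
        ext_type = int.from_bytes(record[pos:pos + 2], "big")
        ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
        body = record[pos + 4:pos + 4 + ext_len]
        if ext_type == 0:            # server_name extension
            # list length (2), name type (1), name length (2), then the name
            name_len = int.from_bytes(body[3:5], "big")
            return body[5:5 + name_len].decode("ascii")
        pos += 4 + ext_len
    return None


if __name__ == "__main__":
    print(extract_sni(client_hello_bytes("zh.wikipedia.org")))
```

Nothing here needs a key: the hostname is readable before any encryption starts, which is the field ESNI is meant to hide.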
Since Wikipedia can be dumped onto a memory card I am not sure how effective the ban is supposed to be. It will constrain its spread but it is very hard to block all means of transmission of information.
"it is very hard to block all means of transmission of information."
I think this is something to understand about modern political information manipulation/restriction.
The 1.0 version of totalitarian censorship aimed for full information control. This is what the Soviets and the pre-1990s CCP tried to do. It's difficult because (as you say) information is hard to control.
The 2.0 version is about having enough control. You can use a VPN (or memory stick), but most won't. This gives one version of events a major advantage over the others: easy-to-find praise for the government, difficult-to-find dissent. It's about dominant influence, not absolute control. When needed, regimes can temporarily increase control, like Erdogan did during the last Turkish coup attempt.
A soft parallel is social media "bubbles." They don't "control" the information you can access, but they are enough to influence your opinions in a direction. There are lots of exceptions, but on average, these have a big influence on who we think the good and bad guys are.
How many were already using Wikipedia in the first place? I assume that before the ban happened it was already pretty much hidden from results in Baidu and the like?
You are right, but there's no need for a memory card. Hundreds of millions of Chinese tourists go abroad[1] every year. Most of them travel within Asia, so most HN readers don't notice it. Not to mention the VPNs that most educated people can use if they really want to.
I still remember that during my internship in mainland China last year, cloning a git repository from GitHub or GitLab was a pain, and even my professor had to use a VPN. China is basically hell for developers. I am afraid the new generations in China will no longer know about the real Internet. *sigh*
I am quite curious about your experience. In my own experience, download speed from GitHub is always acceptable. Anyone in China who cares about the English-language Internet will find a way across the GFW. You can search for shadowsocks on GitHub and see how well developed the technology has become. BTW, internet service at most top universities in China can cross the GFW over IPv6.
Interesting comment. You make it sound like the real internet happens to be US-centric. Yes, many important projects are hosted on Gitlab and Github, but there are also other ways to share code. And these alternatives might not be that well known to us developers in the western world.
I hope you are not a wumao. The salient thing about github and other similar resources is not that they are developed by people in the US and owned by a US company. People from all over the world use it. There is nothing intrinsically 'US' about the content itself.
On the other hand, the GFW excluding much content from outside China makes the China Internet experience particularly Chinese and not the "real Internet".
For a software developer (or student thereof), the useful parts of the internet are indeed quite US-centric.
Off the top of my head, losing access to any of the following platforms will hamper your professional development: Github, Stack Overflow, cloud services leaders (AWS, GCP, MS Azure, etc.), Coursera, Udacity, Youtube, Khan Academy, Google Search, Google groups, Slack. At best, you'll have to use a VPN, or maybe resort to a domestically developed and probably inferior alternative. At worst, no such alternative exists.
Udacity has a local Chinese version. Slack works, kind of. Google groups is a shuffling zombie abandonware project outside China so I fail to see how the lack of access within China will impact anything there.
I listed those platforms (some of which still work in China) just to illustrate that a lot of developer resources are indeed based in, or at least developed in, the USA.
The PRC has proven repeatedly that it can and will arbitrarily block foreign services without notice. Slack works today, but how long until they decide that Slack facilitates too much discussion about Falun Gong, Tibet, Xinjiang, Taiwanese independence, or Tiananmen?
That could be the right interpretation. It still struck me as odd to think that no access to Github and Gitlab would prevent anyone from developing software. Other HNers have listed other resources that are difficult to access from China, and I have to agree that without documentation it becomes really difficult to solve many issues that one might encounter.
It's not that difficult to set up your own git remote. But if you don't have access to documentation for your libraries it really becomes tough.
There are probably other ways but they lack the network effect of people already being there, and the fact that many major companies use Github to distribute and administer their open source projects.
In the West we have the choice to use the service that gives us all these benefits. Mmm, choices.
I assume that this is in preparation for the upcoming 30th anniversary of the Tiananmen massacre; they always make changes to the censorship when a big event is coming up.
For instance, for the 2008 Olympics many blocked sites were opened up, and conversely they tighten everything up before every National People's Congress election.
A favorite quote of mine: “The best way to build an authoritarian regime is not to indoctrinate someone but to convince them that there is no such thing as truth.”
Of course they are, it's almost June 4th, which is the 30th anniversary of the Tiananmen Square massacre. You know that event is going to be the featured article on one of the Wikipedia editions, or at least be mentioned in the "On this day" section.
The 2018 movie Christopher Robin didn't premiere in China. Moreover, there is no way to legally download or view the movie on any Chinese online movie provider.
I'd like to be proven wrong. So wise throwtt78oo65, tell me about your facts.
If SNI poisoning does not work for censoring, the whole IP range of the service would be blocked instead. You cannot always count on changing IP addresses, which would be a cat-and-mouse game.
How is SNI helping there? You still end up with the entire website blocked. Plain HTTP would help, as they could block a few selected pages and the rest would stay available. I don't understand this whole movement to HTTPS. Do some people think that governments won't dare to block Wikipedia, Amazon or Google? Well, they dared, and now you have millions of useful articles blocked because of few offending ones. If I lived in China, I would prefer censored HTTP access over unavailable HTTPS any day.
> millions of useful articles blocked because of few offending ones
Yes, freedom requires sacrifices. Freedom is not for the feeble-hearted.
This is an important point. I'd say keeping it in mind is more important for citizens of free countries today than for the Chinese.
The mere fact that an argument like this is being made more and more in the West, where it wouldn't have flown even 10 years ago, says just how much closer to China the West has become.
Over plain http, how would you solve the problem of a government modifying pages in transit or replacing them entirely with a new version? I suppose you could use the https public key infrastructure with digital signatures so that visitors know whether they're seeing the original...
Well, if you want that property, technically there was a NULL encryption algorithm in early HTTPS versions (it's probably not supported now, but there's nothing unusual about it). So you'll have the page in cleartext, including all headers (so censors can drop the connection if they don't like it), but you'll have the associated checksums and certificates, so changes should be detectable.
This is likely the major reason why China has not yet blocked the major cloud providers. As soon as they allow ESNI/domain fronting, all bets are off as to what China will block.
They explicitly started doing this after Telegram used domain fronting to work around Russian censorship, which caused large chunks of AWS and GCP addresses to be blocked in Russia.
I used the same method to see what the blocking mechanism is in Iran. I tried to connect to www.bbc.com, which is blocked in Iran.
The DNS injection is obviously in place. But something strange happened when I checked the SNI filtering. The curl command stopped at "TLSv1.2 (OUT), TLS alert, Client hello (1)" and never exited when I tried to connect to www.bbc.com but with a --connect-to that is not blocked. Nothing strange until now. If SNI blocking is in place, they probably drop all the remaining packets of the connection. The strange thing is that when I try the opposite test and I connect to www.kernel.org (not blocked in Iran, too!) but with www.bbc.com SNI it still stops at TLS client hello.
At first I thought they blocked the IP address, but I was able to connect to 212.58.244.210 (the IP address of www.bbc.com) on port 443 with the telnet command. So, is Iran's regime using some other blocking mechanism that I'm not aware of? Or am I making some kind of mistake?
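If anyone wants to repeat this kind of test without curl, here is a rough Python equivalent of the `--connect-to` trick: open the TCP connection to one host but put a different name in the SNI field. `tls_probe` and its result strings are my own invention, and certificate checks are deliberately turned off because the point is only to see whether the handshake gets through, not whether the peer is genuine.

```python
import socket
import ssl


def tls_probe(connect_host: str, sni: str, port: int = 443,
              timeout: float = 5.0) -> str:
    """Connect to `connect_host` but announce `sni` in the ClientHello."""
    try:
        sock = socket.create_connection((connect_host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp-failed: {type(exc).__name__}"

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # the certificate will not match the SNI
    ctx.verify_mode = ssl.CERT_NONE  # reachability test only, no authentication
    try:
        with sock, ctx.wrap_socket(sock, server_hostname=sni) as tls:
            return f"tls-ok: {tls.version()}"
    except socket.timeout:
        return "tls-timeout"         # handshake stalled, packets likely dropped
    except ConnectionResetError:
        return "tls-reset"           # connection reset during the handshake
    except (ssl.SSLError, OSError) as exc:
        return f"tls-error: {type(exc).__name__}"


if __name__ == "__main__":
    # Same two experiments as described above, hostnames taken from the comment.
    print(tls_probe("www.kernel.org", "www.bbc.com"))
    print(tls_probe("www.bbc.com", "www.kernel.org"))
```

A `tls-timeout` on one combination and `tls-ok` on the other would point at SNI filtering; a stall on both, with plain TCP still connecting, suggests something else is going on.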
This article was accurate at the time but AFAIK the situation has changed. For context: https://phabricator.wikimedia.org/T208263#5170123 tl;dr There were unintended consequences from refactoring DNS configuration and the situation should be back to “normal” now.
I think they were talking about wikimedia.org also being blocked due to CNAME to wikipedia.org which is now blocked, and they fixed it by CNAME to dyna.wikimedia.org instead, but wikipedia.org is still blocked: https://en.greatfire.org/https/en.wikipedia.org
There's an IPFS mirror of English Wikipedia[0], as well as several others[1], though not the Chinese one[2]. It's currently not a particularly up-to-date mirror, though (from late 2016).
There's a list of gateways[0][1] (hopefully github isn't blocked...) in case these two are also blocked. You could also try just using IPFS directly[2], rather than using an HTTP-to-IPFS gateway, though that's rather more involved and potentially might make you stand out more / seem more suspicious.
In brief, in case you hadn't read about it, the idea behind IPFS is that it's a decentralised, torrent-like content storage system, where content and nodes are "addressed" via their hash (e.g. QmXoyp... above), similarly to git. In order to allow people without the IPFS daemon to access the "IPFS web" there are gateways, like the ones mentioned above.
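A toy Python sketch of the "addressed via their hash" idea, in case it helps. `ContentStore` is a made-up in-memory class and the address is a bare SHA-256 hex digest; real IPFS identifiers (the Qm... strings) use their own encoding and chunking, so treat this as the concept only.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not chosen by the uploader.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Whoever serves the block, the reader can verify it against the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


if __name__ == "__main__":
    store = ContentStore()
    addr = store.put(b"An article snapshot")
    print(addr, store.get(addr))
```

Because the address commits to the content, any node or gateway can serve a page and the reader can still check that it was not altered along the way.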
Yes but it's very very slow (as in on the order of minutes for a page to load). I think this has to do with how Stack Overflow's front-end code is written. There's some non-essential Javascript that is blocked by the Great Firewall which blocks the rest of the page from loading for a long time before the page gives up and displays a banner that some Javascript failed to load.
All programmers I know have software to circumvent the Great Firewall and view it as a necessary condition of their profession.
Many Chinese developers are not fluent English users though. There's a "Chinese StackOverflow" called segmentfault.com, and I often see people asking fellow developers in WeChat groups.
Sadly, the main commemorative events only take place in Hong Kong and Taiwan (Macau is not included), but you know the consequences of going against a totalitarian government, right?
And you miss the whole point of censorship. It's about making people reluctant to remember it.
Any attempt at any kind of protest in China is slammed on very swiftly. There have been attempts in the past, e.g. by Falun Gong supporters at Tiananmen Square, but it’s usually all over in minutes as the area is heavily monitored. There is no real independent media; they all have embedded censors, so even when there is a protest it doesn’t get covered. What protests do occur are usually over local issues, and foreign media occasionally manage to piece together the story after the fact.
Yes, but another way to look at it: does China's micromanagement of the domestic flow of information result in more or less prosperity for their people? To misquote Hayek: communism is ultimately like telling every car where it should go rather than just putting up street signs.
* Please add to the debate rather than down-voting. Also, Hayek wasn't musing about future autonomous cars; it's a metaphor about how deciding what everyone in society should do/think doesn't scale.
> does China's micromanagement of the domestic flow of information result in more or less prosperity for their people?
The assumption you're making is that the Chinese government, or any government actually, optimizes for the prosperity of the people. Governments optimize for the prosperity of some people. In China that appears to be the ruling elite. In America it's the wealthy. Here in the UK it's the establishment (which is the existing upper class and the newly wealthy).
I don't think any government genuinely has the interests of all the people at heart, but I am massively pessimistic and cynical so maybe it's me.
It's you. This kind of lazy pessimism strikes me as a peculiar privilege of people who've grown up accustomed to living under a government that, while unavoidably imperfect, does in fact give consideration to the rights and desires of ordinary citizens. They don't take notice of all the ways their government chooses not to oppress them, because they wrongly imagine this is some natural state of things.
On the other hand, immigrants I've spoken to from countries with autocratic regimes, while not starry eyed about the nature of western governments, have no problem explaining why the situation in their countries of origin is much, much worse, and why the government of their new country is much fairer and functional.
I think most governments do want to improve the life of the people, but it simply gets lost in the power struggle. You can't gain power without agreement from the king-makers and that agreement comes with strings attached. You also have to deal with the opposition and at least in American politics, they seem to be quite unreasonable at times. (Regardless which party is the opposition.)
I agree, that's the central question: is a government for promoting the prosperity of its people or for perpetuating itself? It would be nice if they could do both, but there are so many examples of governments choosing themselves over the greater good.
China's economy isn't by any stretch of the imagination "communist", but telling every car where to go is perhaps the right solution for the future, with computer science making it feasible and overpopulation making it desirable.
However, I'm not sure that's in any way relevant to freedom of thought and freedom of information.
And Google already does it: take a look at the stories of small towns wishing Google would stop routing people through a certain surface road and not knowing how to make that request. People trust their navigation apps because in general those apps have earned their trust.
In general I am skeptical of "Communism is bad because the government will X" arguments where private industry is capable of doing X in as thorough a way for the average citizen's practical liberty and especially where private industry is already trying X.
Central control is efficient, if you can get the right signals to the controller, and do it fast enough to have a tight feedback loop. Humans alone can do none of this at scale, but modern computing can do the "feedback" part easily. It's now the "signals" and "controller" parts that need to be figured out.
Variable, but generally worse. It's much more tightly censored than Wikipedia: for example, edits are manually reviewed by admins before going live. Also, most community features (talk pages etc) have been removed, so it's constantly gamed to push spam, copy pastes etc and the mods don't really care unless it touches a red line topic like politics.
Considering how politicized Wikipedia has gotten in recent years (like most of the tech industry), this doesn't shock me. Jimmy Wales has pretty much come out and said Wikipedia will no longer be user driven but ideology driven, which is one of the reasons the other co-founder of Wikipedia has criticized Wales and Wikipedia.
Wikipedia is great for most generic topics, but for "sensitive" topics, it's pretty much propaganda. Considering how heavily Wikipedia is censored by Wikipedia itself, maybe a taste of their own medicine will make them change their position, but I doubt it.
Sadly, as more and more people use the internet, it'll be censored more and more by the elites in China, the US, Russia, the EU, etc. What we are seeing is the internet becoming an overt tool of propaganda rather than a platform for discussion or the exchange of ideas. Even worse, it seems like there is a ton of support for censorship, especially among the young "educated" demographics.
If you are making claims like this, you might want to source them, e.g. with a link to a Jimmy Wales interview, or a Wikipedia link to a "sensitive" topic that has been politicized (that does not prove anything, but it still shows that you've done some research).
Also, generalization is bad. You might want to visit a skeptic association and take a course on human cognitive biases (it is helpful, but it won't "cure" you of them, just make you more self-aware).