by tway223 962 days ago
This collection has been on the internet for quite a while, likely starting around 2015. It is highly duplicated, and I suspect the total number is around 4 million books. Still a lot.

The source is a company named DuXiu, previously SuperStar. They collaborated with libraries around China and have been scanning their collections since the early 2000s. Before that, judging by the quality of the early samples, I think they just bought junk books from recycling stations.

Many of the books are translated versions of textbooks from the West (most likely the US), and many are pure political propaganda junk. There is also some literature and history that was published when censorship wasn't so extreme.

Many of the Chinese tech companies should have access to this collection (Baidu for sure), but the books were not censored by today's standards, so I doubt any of them would openly use it, not only because of the copyright issue but also because of the political risks.