Hacker News
by wongarsu 1161 days ago
Libgen is 57% English (17% Russian, 8% German) [1]. By comparison, 10% of Wikipedia is in English [2]. These figures go by number of files and number of articles respectively, and both are flawed metrics.

Though I feel that's answering a slightly different question. Data used to train currently popular models is mostly English, and the majority of data in sources popular in the anglosphere is English. Neither of these shows whether the majority of available media is English.

[1] https://www.reddit.com/r/libgen/comments/r3lzg2/top_15_langu...

[2] https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia#Co...