by ellisd
238 days ago
The paper makes no mention of Anna’s Archive. I wouldn’t be surprised if DeepSeek took advantage of Anna’s Archive’s offer granting OCR researchers access to its 7.5 million-book (350 TB) Chinese non-fiction collection, which is bigger than Library Genesis. https://annas-archive.org/blog/duxiu-exclusive.html

> We cleaned 860K English and 180K Chinese e-books from Anna’s Archive (Anna’s Archive, 2024) alongside millions of K-12 education exam questions.
(From the DeepSeek-VL paper: https://arxiv.org/abs/2403.05525)