by wfhpw 1935 days ago
The "bookcorpus" is from this paper by Zhu, Kiros, et al. [0]. The project web page [1] indicates that you can crawl books from this site [2] to create your own copy. This repo seeks to replicate the original dataset [3].

[0] https://www.cv-foundation.org/openaccess/content_iccv_2015/p...

[1] https://yknzhu.wixsite.com/mbweb

[2] https://www.smashwords.com/

[3] https://github.com/sgraaf/Replicate-Toronto-BookCorpus