Hacker News
by ccgreg, 320 days ago
See https://digitalcorpora.org/corpora/file-corpora/cc-main-2021... for a set of 8 million PDF files from the web, as seen by a single crawl of Common Crawl.