Hacker News
by jjwiseman 4695 days ago
There are at least 2.5M English Wikipedia pages indexed in the crawl:
$ cci_lookup org.wikipedia.en | wc -l
2516956
(See https://github.com/wiseman/common_crawl_index, but note that the index is incomplete.)
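The lookup key in the command above looks like the hostname with its labels reversed (en.wikipedia.org written as org.wikipedia.en). Here is a minimal sketch of that conversion, assuming that reading is right. `reversed_host_key` is a hypothetical helper for illustration and is not part of common_crawl_index.

```python
from urllib.parse import urlsplit


def reversed_host_key(url: str) -> str:
    """Return the hostname of `url` with its dot-separated labels reversed.

    Hypothetical helper: it only illustrates the assumed key format and
    does not query the index.
    """
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    return ".".join(reversed(host.split(".")))


print(reversed_host_key("http://en.wikipedia.org/wiki/Common_Crawl"))
# prints: org.wikipedia.en
```

If the assumption holds, reversing the labels puts every page of a site (and of its subdomains) under one shared string prefix, which would explain why a single prefix lookup can be piped to `wc -l` to count them.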