Hacker News
by jjwiseman 4695 days ago
There are at least 2.5M English Wikipedia pages indexed in Common Crawl:

  $ cci_lookup org.wikipedia.en | wc -l
  2516956
(See https://github.com/wiseman/common_crawl_index, but note that the index is incomplete.)
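Here's a rough Python sketch of what that lookup does. It assumes the index keys URLs by reversed hostname, which is why the query is org.wikipedia.en rather than en.wikipedia.org, and that cci_lookup returns every key starting with that prefix. The function names and sample keys are made up for illustration, and the real key format may differ. Note that a raw string prefix would also match any longer hostname label that happens to begin with "en".

```python
def reverse_host(host: str) -> str:
    # "en.wikipedia.org" -> "org.wikipedia.en" (hypothetical helper)
    return ".".join(reversed(host.split(".")))


def count_prefix_matches(keys, prefix: str) -> int:
    # Roughly what `cci_lookup PREFIX | wc -l` does: count the index keys
    # that begin with the reversed-domain prefix. (Hypothetical helper.)
    return sum(1 for key in keys if key.startswith(prefix))


# Hypothetical keys; the real index's key format may differ.
sample_keys = [
    "org.wikipedia.en/wiki/Common_Crawl:http",
    "org.wikipedia.en/wiki/Hacker_News:http",
    "org.wikipedia.de/wiki/Common_Crawl:http",
]

prefix = reverse_host("en.wikipedia.org")
print(prefix)  # org.wikipedia.en
print(count_prefix_matches(sample_keys, prefix))  # 2
```

Reversing the hostname puts all of a site's subdomains next to each other in sort order. That lets a sorted index answer a lookup like "everything under wikipedia.org" as a single contiguous range scan.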