by ccgreg 190 days ago
commoncrawl.org

Our public web dataset goes back to 2008, and is widely used by academia and startups.

1 comment

I always wanted to ask:

- How often is that updated?

- How current is it at any point in time?

- Does it offer historical / temporal access, i.e. can you check the history of a page, a la the Internet Archive?

- monthly

- it's a historical archive, so the concept of "current" is hard to turn into a metric

- not only is our archive historical, it is also included in the Internet Archive's Wayback Machine.