HN Mirror

	by 1vuio0pswjnm7 1458 days ago
	"This is why it's important a service like the Wayback Machine or, even better, Ethereum blockchain exists, to timestamp webpages and media for future observers." The Wayback Machine (Internet Archive) uses a lot of data from Common Crawl. GPT-3 was trained on Common Crawl and Wikipedia dumps. Arguably, under this prediction, the "live" web after 2022 will be an automated regurgitation of the web before 2022, going back only as far as 2009, the year of Common Crawl's first public archive. (Strangely, there is no archive for 2011.)