Ask HN: Why aren’t the Wayback Machine archived pages indexed by search engines?


	4 points by hosa 1936 days ago
	I have had this question for some time now. Not indexing archived pages contributes to a shittier, more hassle-filled web.

5 comments

DamonHD 1936 days ago

If the archive competed with the originals for clicks, that would (a) make a lot of site owners cross and (b) serve stale content to users when the original page is still up and being updated.


elliottinvent 1935 days ago

Search engines are designed to give you the best result on the web today.

The Wayback Machine / archive.org is a snapshot in time of the web.

If search engines combined the current web and the old web, it would be an interesting experiment, but possibly a diff nightmare.

Maybe it’s something that could be a point of differentiation for a new search engine compared to Google.

For anyone wanting to take this on, maybe start with Common Crawl [0]

[0] https://commoncrawl.org/the-data/


cyberlab 1936 days ago

I once used a Firefox extension that lets you view the archived version of any URL (provided the site allowed the WM crawler in its robots.txt). It worked both for live URLs and for URLs that 404'd or no longer existed (bitrotted pages).
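
For reference, the kind of lookup such an extension performs can be sketched against the Wayback Machine's public availability endpoint (https://archive.org/wayback/available). This is only a rough sketch, not the extension's actual code: the helper names are made up for illustration, and the response shape is assumed to be the usual `archived_snapshots` / `closest` JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the availability query URL for a page (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDD[hhmmss] hint: ask for the snapshot closest to it.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the closest snapshot URL from a parsed JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urlopen(build_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com")` would return a snapshot URL when one exists and `None` otherwise. Note that this only resolves a URL you already know; it does nothing for full-text search across archived pages.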


hosa 1936 days ago

Yes, I know that extension. But my purpose is different: for research and education, there are pages that exist only in the Internet Archive, so I do not want to search for the same thing twice in two different places.


cyberlab 1936 days ago

Yeah, but if the site issues a 404, you simply right-click and the extension will show you the archived copy. Sadly, many sites fail to provide a decent 404 page when a resource doesn't exist; they just do redirect spamming or, in the worst case, return a 503 error [0].

(But I understand your need for text search on these services)

[0] https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503
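
To make the status-code point concrete, here is a small sketch of how a tool might decide when to offer an archived copy. The function name and the policy are assumptions for illustration: 404 and 410 are treated as "gone", and 503 is included only because of the misuse described above. The `https://web.archive.org/web/<url>` form is assumed to redirect to the most recent capture.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"

# Statuses that suggest the resource itself is missing.
GONE_STATUSES = {404, 410}
# 503 normally means "temporarily unavailable"; it is included here only
# because some sites return it where a 404 would be appropriate.
MISUSED_STATUSES = {503}


def archive_fallback(url, status):
    """Return a Wayback URL to try for this response, or None if not needed."""
    if status in GONE_STATUSES or status in MISUSED_STATUSES:
        return WAYBACK_PREFIX + url
    return None
```

Redirect spamming is harder to catch this way, since the final response is usually a 200; detecting it would need some heuristic on the redirect target.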


MrCoffee7 1936 days ago

Perhaps this article can help you somewhat: https://www.netforlawyers.com/content/archive-wayback-machin...


uberman 1936 days ago

Are you asking technically why or philosophically why?


hosa 1936 days ago

Both. Shoot me
