HN Mirror

by exikyut 3238 days ago

EDIT: I've noticed all the replies and I'd like to acknowledge them. Unfortunately, I feel very stupid for not screenshotting what I saw when I searched an hour ago. I now see 62,900 results, and I can load up to page 6. I can't prove that page 2 failed to load before, but it did.

My original comment remains unedited below.

For a concrete demonstration of pathological de-ranking, search for "site:web.archive.org".

I get "59,000 results" on page 1, but page 2 will never load!

The few results that do appear prove that a) web.archive.org is not blocking crawlers with robots.txt or other techniques, and b) Google's infrastructure is inhaling the content. But it's invisible.

Think about how sad this is: once a site goes dead, it's offline, even though its content is still publicly accessible in the archive. If only that content were indexed by a decent search engine.

Practically speaking, I totally acknowledge that archived content is complex to surface; sites can be pulled offline because content needs to be removed for any number of reasons, etc. I recognize the general difficulty of getting this right. So I'm not _really_ arguing "if only this were surfaced", because that would be unfair; I'm more saying "hey look, this is what it looks like when something has been completely killed," as a demonstrable and extreme datapoint.

4 comments

leereeves 3238 days ago

Page 2 loads for me, though there's a slight delay.

adventured 3238 days ago

Page two for "site:web.archive.org" loads immediately on Google, no delay.

jasonlotito 3238 days ago

Page 2 and on loaded just fine for me without any issue.

cjhopman 3238 days ago

Your comment says more about you than about Google.

That search works just fine for me.