Hacker News
by modeless 748 days ago
Has anyone else noticed that Google doesn't index source code from GitHub anymore? I could have sworn that you used to be able to search for source code that is on GitHub (e.g. error message strings), but today the index doesn't seem to include the source code at all. The pages with the source code are in the index but you can only find them by file name. Big loss for Google if so.

I also mourn the Google cache. I bet site owners were lobbying to get rid of it, but it's really lame that Google caved after all these years...

3 comments

Not an answer to your question, but Github code search is great: https://github.com/features/code-search
Yeah I'm really glad GitHub finally has functioning code search (how many years did it take?), but I don't like being forced to use it for simple stuff. It's slow and doesn't include source code or documentation that's not on GitHub. I don't always know in advance whether the answer to my query lives in GitHub or not...
It also requires you to log in, AFAIK.
"I also mourn the Google cache."

Google cache still works. Too early to mourn.

For example,

https://webcache.googleusercontent.com/search?q=cache:https:...

I do all searches from the command line. This allows me to control the SERP, e.g., make my own basic HTML metasearch SERP with no JS or CSS. I can put cache links in if I want. I can mix results from different search engines. I can do temporally separated continuation searches to avoid rate limits. And so on.
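
To make the idea concrete, here is a minimal sketch of such a metasearch SERP in Python. It is not the commenter's actual setup: the function names `merge_results` and `render_serp` are hypothetical, results are assumed to arrive as `(title, url)` pairs that have already been fetched from each engine, and the cache prefix is the one visible in the example link above.

```python
from html import escape
from itertools import zip_longest
from urllib.parse import quote

# Assumption: cache URL prefix taken from the example link in this thread.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def merge_results(*result_lists):
    """Interleave (title, url) lists round-robin, dropping duplicate URLs."""
    seen = set()
    merged = []
    for group in zip_longest(*result_lists):
        for item in group:
            if item is None:  # this engine returned fewer results
                continue
            title, url = item
            if url not in seen:
                seen.add(url)
                merged.append((title, url))
    return merged


def render_serp(results, with_cache_links=True):
    """Render results as a bare HTML list with no JS or CSS."""
    rows = []
    for title, url in results:
        row = f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a>'
        if with_cache_links:
            cache_url = CACHE_PREFIX + quote(url, safe=":/")
            row += f' [<a href="{escape(cache_url, quote=True)}">cache</a>]'
        rows.append(row + "</li>")
    return "<html><body><ol>\n" + "\n".join(rows) + "\n</ol></body></html>"
```

Fetching the results and spacing out requests to stay under rate limits are left out; the sketch only covers mixing engines and emitting plain HTML.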

I can also rewrite URLs with the local forward proxy to use any cache I want. I can also rewrite response bodies.
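
The URL rewriting step could look roughly like the sketch below. It shows only the mapping a proxy would apply to each request URL, not the proxy itself. The names `CACHES` and `rewrite_to_cache` are hypothetical; the Google prefix is the one from the example link above, and the Wayback Machine form (`web.archive.org/web/` followed by the original URL) is added here as one example of "any cache".

```python
from urllib.parse import quote, urlsplit

# Hypothetical table mapping a cache name to a URL builder.
CACHES = {
    "google": lambda url: (
        "https://webcache.googleusercontent.com/search?q=cache:"
        + quote(url, safe=":/")
    ),
    "wayback": lambda url: "https://web.archive.org/web/" + url,
}

# Hosts that already serve cached copies; requests to them pass through.
CACHE_HOSTS = {"webcache.googleusercontent.com", "web.archive.org"}


def rewrite_to_cache(url, cache="wayback"):
    """Return the cache URL for `url`, leaving cache URLs untouched."""
    host = urlsplit(url).hostname or ""
    if host in CACHE_HOSTS:
        return url
    return CACHES[cache](url)
```

Switching caches is then a one-word change, which is the point of not relying on any single one.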

If Google disables public access to their cache, there are many other options. I have never relied on it.

Yeah it works, but a button leading to it has been removed. It operates more like a hidden feature now.
It's been my experience as well: about 80% of websites whose content was previously displayed in Google results are just not in Google's index any more. Google left only the page titles in the index, so those pages won't rank well in search results and you can't find them by their content.