by knute 1228 days ago
/*/tree is only for directory listings. File contents will be under a /blob/ path, e.g. https://github.com/facebook/react/blob/main/AUTHORS, and should be, AFAIK, indexable.

(mandatory disclaimer: I'm a GitHub employee, not speaking on behalf of the company)
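
To make the tree-vs-blob distinction concrete, here's a rough sketch of how a wildcard Disallow rule would be matched against those two kinds of paths. It assumes the rule is literally `/*/tree` as quoted in this thread, and that `*` matches any run of characters while a trailing `$` anchors the end, which is how I understand the Robots Exclusion Protocol. The helper names are made up for illustration, and it ignores Allow rules and longest-match precedence, so it is not a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Everything else is treated as a literal prefix of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# Assumption: the only rule of interest is the one quoted in this thread.
DISALLOW = ["/*/tree"]

print(is_disallowed("/facebook/react/tree/main", DISALLOW))          # True
print(is_disallowed("/facebook/react/blob/main/AUTHORS", DISALLOW))  # False
```

So under that rule a directory listing is blocked while the file page itself stays crawlable, provided the crawler has some other way to discover the /blob/ URL.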


I asked about this on the support forum a while ago and never got a satisfactory response: https://github.com/community/community/discussions/20958
If they can't hit `/*/tree`, is there a way to know the URLs of the files?
Direct links from crawlable pages
Sure, clone the git repo.
GitHub would not be happy with Google cloning every repo, many of them at high frequency, just to circumvent a robots.txt restriction.
They're clever people. They could just do incremental updates (pull instead of a fresh clone). I doubt it'd be that much of a strain.
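
For what it's worth, the clone-once-then-update idea could look roughly like this. It's only a sketch: `sync_command` is a hypothetical helper, the local mirror layout is my assumption, and it only builds the git command rather than running it. In a bare mirror there is no working tree, so the incremental step is `git fetch` rather than `git pull`.

```python
import os
import tempfile


def sync_command(repo_url: str, mirror_dir: str) -> list[str]:
    """Pick the git command needed to bring a local mirror up to date.

    First visit: a full mirror clone. Later visits: an incremental fetch,
    which only transfers objects the mirror does not have yet.
    """
    if os.path.isdir(mirror_dir):
        return ["git", "-C", mirror_dir, "fetch", "--prune"]
    return ["git", "clone", "--mirror", repo_url, mirror_dir]


url = "https://github.com/facebook/react.git"
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "react.git")
    print(sync_command(url, target))  # no mirror yet, so this is a clone
    os.mkdir(target)                  # stand-in for a previous clone
    print(sync_command(url, target))  # mirror exists, so this is a fetch
```

To actually run it you'd hand the list to `subprocess.run(cmd, check=True)`. Whether that load would be acceptable at crawler scale is exactly the open question above.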