Hacker News
by staplung 1232 days ago
If they can't hit `/*/tree`, is there a way to know the URLs of the files?
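For context on what a rule like `Disallow: /*/tree` actually blocks, here is a minimal sketch of robots.txt wildcard matching as described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match. The function names are hypothetical. It ignores `Allow` lines and the longest-match precedence rule, so treat it as illustrative only. Python's own `urllib.robotparser` does plain prefix matching and does not expand `*`, which is why it is not used here.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the path. Otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, and turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of path."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return any(robots_pattern_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/*/tree"]
for path in ["/user/repo/tree/main", "/user/repo/blob/main/README.md"]:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

Under this reading, directory listings under `/tree/` are off limits, but individual file pages under `/blob/` are not, so the question becomes how a crawler finds those file URLs in the first place.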

Direct links from crawlable pages
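A rough sketch of the "direct links" approach: parse a crawlable page, resolve every `<a href>`, and keep the ones that look like file pages. The `/blob/` filter assumes GitHub's usual `/<owner>/<repo>/blob/<ref>/<path>` layout, and `LinkCollector` and `file_links` are hypothetical names; a real crawler would also apply the robots.txt rules and deduplicate.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def file_links(html: str, base_url: str) -> list[str]:
    """Return links whose path contains '/blob/' (assumed GitHub file pages)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if "/blob/" in urlparse(url).path]


page = """
<a href="/user/repo/blob/main/README.md">README</a>
<a href="/user/repo/tree/main/src">src</a>
<a href="https://example.com/docs">docs</a>
"""
print(file_links(page, "https://github.com/user/repo"))
```

The `/tree/` link is dropped, so a crawler following this approach never needs the blocked directory listings, only pages (READMEs, issues, external sites) that link straight to files.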
Sure, clone the git repo.
GitHub would not be happy with Google cloning every repo, many of them at high frequency, just to get around a robots.txt restriction.
They're clever people; they could just do partial updates (pull instead of clone). I doubt it'd be that much of a strain.