Hacker News new | ask | show | jobs
by mewpmewp2 1132 days ago
I would've said you should download only archives, but really I think commits are also very important data since that shows the actual changes in the code which would be very useful to train AI to suggest changes to the code.
1 comments

There are valid non-evil reasons for git hosts to want to throttle and put up obstacles toward scraping as well, both via crawlers or 'git clone' or whatever. These are very expensive operations.