by ngshiheng 716 days ago
Interesting! Perhaps cleaning up the older data might help a bit here.

> since ultimately you’re duplicating a bunch of data and will eventually catch the eye of some GitHub compliance script

I suppose this could also be a concern with git scraping, since we are basically duplicating data through git commits (not trying to imply that one is better or worse). Having said that, I'm not sure GitHub would be fine with either approach if more people were to do the same at a larger scale.

1 comment

What would be interesting is if you could find a way to scrape only the deltas and then somehow reconcile them into the full scrape.
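
For what it's worth, here is a rough sketch of what that reconciliation could look like, assuming each scrape can be reduced to a dict of records keyed by a stable ID. The delta format (`upsert` / `delete`) and the function names are made up for illustration, not taken from any existing tool.

```python
from typing import Any

# Hypothetical shape: one scrape = {record_id: record_fields}
Snapshot = dict[str, dict[str, Any]]


def compute_delta(old: Snapshot, new: Snapshot) -> dict[str, Any]:
    """Return only what changed between two scrapes."""
    # Records that are new or whose fields differ from the previous scrape.
    upsert = {key: value for key, value in new.items() if old.get(key) != value}
    # Records that disappeared since the previous scrape.
    delete = sorted(key for key in old if key not in new)
    return {"upsert": upsert, "delete": delete}


def apply_delta(base: Snapshot, delta: dict[str, Any]) -> Snapshot:
    """Rebuild the next full scrape from a base scrape plus one delta."""
    result = dict(base)  # shallow copy so the base snapshot is left untouched
    result.update(delta["upsert"])
    for key in delta["delete"]:
        result.pop(key, None)
    return result


def reconcile(base: Snapshot, deltas: list[dict[str, Any]]) -> Snapshot:
    """Replay deltas in order to recover the latest full scrape."""
    current = base
    for delta in deltas:
        current = apply_delta(current, delta)
    return current
```

With this layout you would store one full scrape plus a list of small deltas, and replay them with `reconcile` whenever the full data is needed. The catch is that computing a delta this way still needs the previous full scrape on hand, so it only saves storage, not scraping work.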