Hacker News
Amazon has a secret workaround to scrape GitHub for model training (businessinsider.com)
6 points by tardismechanic 730 days ago
3 comments

I would appreciate a formal mechanism to opt out of data collection by Amazon (as well as by OpenAI and Microsoft).
[Non-paywalled version]: https://dataconomy.com/2024/06/14/amazon-has-a-secret-way-to...

According to an internal memo obtained by Business Insider, Amazon’s AGI Group worked around GitHub's limit of 5,000 requests per hour per account by 'encouraging' its employees to create multiple GitHub accounts and share their access credentials. By using many accounts simultaneously, Amazon aims to condense what would have been a multi-year endeavor into a matter of weeks.

Dataconomy: The ethical implications are significant. By soliciting employees to share personal GitHub accounts, Amazon is potentially accessing data without explicit consent from GitHub or the repository owners.

That’s a straight-up violation of GitHub's terms of service, which seems legally perilous given how easy it would be to prove:

https://docs.github.com/en/site-policy/github-terms/github-t...