by roryoldershaw 4634 days ago
So now you could scrape the name of every GitHub user and their email if it is public.
3 comments

Don't need to scrape it, just use their API:

https://api.github.com/users?since=0

And iterate from there. They even give you an HTTP header with the next URI:

Link: <https://api.github.com/users?since=135>; rel="next"
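For anyone who wants to follow that header programmatically, here is a minimal sketch of pulling the rel="next" URI out of a Link header value. The function name parse_next_link is made up for illustration, and the regex assumes the simple `<url>; rel="name"` shape shown above rather than every form the Link header syntax allows.

```python
import re

# Matches one `<url>; rel="name"` entry inside a Link header value.
# Assumption: rel is the first (or only) parameter, as in the header above.
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_next_link(link_header):
    """Return the URL marked rel="next", or None if there is no next page."""
    for match in LINK_ENTRY.finditer(link_header):
        url, rel = match.group(1), match.group(2)
        if rel == "next":
            return url
    return None


header = '<https://api.github.com/users?since=135>; rel="next"'
print(parse_next_link(header))
```

You would call this on the Link header of each response and stop when it returns None.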

There is a rate limit of 60 requests per hour, but if you use OAuth then it's 5000. See http://developer.github.com/v3/#rate-limiting
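A rough sketch of how a client could stay inside that limit, assuming the response carries GitHub's X-RateLimit-Remaining and X-RateLimit-Reset headers (the latter as a Unix timestamp in seconds). The helper name seconds_to_wait is hypothetical, and the headers are assumed to arrive as a plain dict with exactly this capitalization.

```python
def seconds_to_wait(headers, now):
    """Return how long to sleep before the next request, in seconds.

    headers: dict of response headers (assumed exact-case keys).
    now: current Unix time in seconds.
    """
    # If the header is missing, assume we still have requests left.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0
    # Quota exhausted: wait until the reset time, never a negative duration.
    reset_at = int(headers.get("X-RateLimit-Reset", str(int(now))))
    return max(0, reset_at - int(now))


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}
print(seconds_to_wait(exhausted, 900))
```

Passing the result to time.sleep() between requests would be the obvious use.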

Oh, they rate limit and only allow access to the first 100 pages of results. Well, that is that idea out of the window :P You could probably get around 7000 of them max, barely a scratch.
It'd take you about 48 days :) When I checked, the ID of the most recent user was 5663608. The GitHub API will give you 5000 hits an hour if you authenticate your requests. You'd get a load of "not founds", but working up from 1 to 5663608 would get you every public user (no private, banned, or deleted accounts).

5663608 / 5000 ≈ 1133 hours, which is just under 48 days' worth of non-stop API harassment... maybe that explains why GitHub keeps going down, someone is already trying this out :P
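The arithmetic is easy to check. A quick sketch, assuming one request per user ID and a crawler that uses its full hourly quota around the clock (crawl_days is just an illustrative name):

```python
def crawl_days(max_id, requests_per_hour):
    """Days needed to request every ID from 1 to max_id at a fixed hourly rate."""
    hours = max_id / requests_per_hour
    return hours / 24


# Authenticated limit quoted above: 5000 requests per hour.
print(round(crawl_days(5663608, 5000), 1))  # 47.2
```

Swapping in the unauthenticated limit of 60 per hour shows why nobody would bother without OAuth.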

Even if it's not :)