Hacker News
Find out how many users/repos Github has by a simple search (github.com)
41 points by vashishthajogi 4634 days ago
12 comments

It seems to work with @{any letter}. Can anybody explain to me how GitHub is interpreting this search term?
It does not work with @{any letter}. Try @b or @x - they will search for content by user 'b' (http://github.com/b) and user 'x' (http://github.com/x).
Sorting by "Least recently joined" is interesting. The GitHub team are all there, naturally, but I'm intrigued what "tater" is about. Yehuda Katz is famously user ID 4 but tater shows up as user 611789 through the API.

Other than that, the first 10 pages or so are a real who's who of the Ruby scene in 2007-2008 :-)

Woop. 140th person to join GitHub.

Back then you pretty much had to be invited by a Ruby/Rails person. I got in by submitting a patch to Dr Nic. :)

See, I'm not entirely sure about this date ordering stuff.. because I'm around #80 by that method, but if I use the API, I'm #118. Try http://caius.github.io/github_id/ to see what your actual user ID is, as I think it's more truthful.
So how does this work? From the search cheat sheet:

"@defunkt Get all repositories from the user defunkt."

Seeing as there _is_ a user called "a" (http://github.com/a), but he doesn't have a single line of code, any repositories, or any active issues, the search breaks somehow and returns the repository, issue and code (LOC?) counts of ALL people.

For example, trying it with "@b" (or any existing user after the at-sign) does yield the correct results (the respective counts for user b: http://github.com/b). Trying it with someone with no repositories, code and issues OR a user that does not exist (@thisuserdoesnotexist) results in the same behavior as "@a".

@a is probably an alias for @all (https://github.com/search?q=%40all&type=Users&ref=searchresu...).

[edit 1] @everybody seems to return every user

[edit 2] @everything seems to return ... everything.

[edit 3] @qdfhsdfjsdqjrekle seems to return everything. Ok. It's a bug. :>

No, they are specifically searching for content by @{username} - refer to the cheat sheet on the search page. @{non-existent-username-or-user-with-no-activity} triggers this behavior.
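The search links in this thread encode the at-sign as %40. A minimal sketch of building such a URL, assuming the `q` and `type` parameters seen in the links above (the `search_url` helper is hypothetical, not anything GitHub provides):

```python
from urllib.parse import urlencode

def search_url(username, result_type="Users"):
    """Build a GitHub search URL for the "@username" query (hypothetical helper)."""
    # urlencode percent-encodes "@" as %40, matching the links in this thread.
    query = urlencode({"q": f"@{username}", "type": result_type})
    return f"https://github.com/search?{query}"

print(search_url("all"))  # https://github.com/search?q=%40all&type=Users
```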
Wow. It's good to see Linus takes the occasional day off: https://github.com/torvalds
What is this?
In case this gets broken anytime soon, I see:

Repositories: 4,236,263

Code: 962,748,183

Issues: 5,952,195

Users: 4,472,663

At this time (4:48 AM EST, 2013-10-11), minutes after Hacker News changed the parent comment's timestamp from "1 hour ago" to "2 hours ago", these are the updated values:

Repositories: 4,236,957

Code: 962,598,538

Issues: 5,953,084

Users: 4,473,412

So in a space of time probably equal to either 1.5 hours or 2 hours, the numbers changed by these amounts:

Repositories: +694

Code: -149,645

Issues: +889

Users: +749
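The differences can be double-checked with a few lines of Python. This is only a sketch; the dictionary names and the `deltas` helper are made up for illustration, and the counts are copied from the two snapshots above:

```python
# Counts copied from the two snapshots quoted in this thread.
first = {"Repositories": 4_236_263, "Code": 962_748_183,
         "Issues": 5_952_195, "Users": 4_472_663}
second = {"Repositories": 4_236_957, "Code": 962_598_538,
          "Issues": 5_953_084, "Users": 4_473_412}

def deltas(before, after):
    """Return the per-category change between two snapshots."""
    return {name: after[name] - before[name] for name in before}

for name, change in deltas(first, second).items():
    # "+," prints an explicit sign and thousands separators.
    print(f"{name}: {change:+,}")
```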

Repositories: 4,237,352

Code: 962,617,133

Issues: 5,953,624

Users: 4,473,778

Repositories: 4,237,870

Code: 962,651,473

Issues: 5,954,482

Users: 4,474,320

This seems to be valid compared to numbers released in April 2013 [1]. GitHub is growing fast.

[1] https://github.com/blog/1470-five-years

This query is incorrect, because it doesn't include private repos and forks. The headlines on this page https://github.com/search are accurate as of this comment:

Search more than 4.3M Users

Search more than 8.8M Repositories

Search more than 18.7M Issues

BTW https://github.com/search?q=%40y&type=Users&ref=searchresult... i.e. '@y' also gives similar results :)
Assume this is going to disappear once they wake up in the US...
If I search "@z" I get 13 repos. So why exactly do you think that searching "@a" is yielding all repos?
You could also get away with a search for created:>2007. Not really sure what the "@" is doing.
So now you could scrape the name of every GitHub user and their email if it is public.
Don't need to scrape it, just use their API:

https://api.github.com/users?since=0

And iterate on. They even give you an HTTP header for the next URI:

Link: <https://api.github.com/users?since=135>; rel="next"

There is a rate limit of 60 requests per hour, but if you use OAuth then it's 5000. See http://developer.github.com/v3/#rate-limiting
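A rough sketch of pulling the next URI out of that Link header, using only the standard library. The `next_url` helper is hypothetical, and it assumes the header is a comma-separated list of `<url>; rel="..."` entries with no commas inside the URLs. Fetching the pages and respecting the rate limit are left out:

```python
import re

def next_url(link_header):
    """Return the rel="next" URL from a Link header, or None if there is none."""
    # Assumes entries are separated by commas and URLs contain no commas.
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None

header = '<https://api.github.com/users?since=135>; rel="next"'
print(next_url(header))  # https://api.github.com/users?since=135
```

You would keep requesting whatever `next_url` returns until it comes back as None.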

Oh they rate limit and only allow access to the first 100 pages of results. Well that is that idea out of the window :P You could probably get around 7000 of them max, barely a scratch.
It'd take you about 48 days :) When I checked, the ID of the most recent user was 5663608. The GitHub API will give you 5000 hits an hour if you authenticate your requests. You'd have a load of "not founds" but working up from 1 to 5663608 would get you every public user (no private accounts or banned / deleted).

5663608 / 5000 = about 1,133 hours, which is just under 48 days' worth of non-stop API harassment... maybe that explains why GitHub keeps going down, someone is already trying this out :P
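The estimate spelled out, as a back-of-the-envelope sketch. It assumes one request per user ID and that the 5000 per hour limit is used in full around the clock; the variable names are just for illustration:

```python
latest_user_id = 5_663_608   # most recent user ID quoted above
requests_per_hour = 5_000    # authenticated rate limit quoted above

# One request per ID, so the ID count is also the request count.
hours = latest_user_id / requests_per_hour
days = hours / 24

print(f"{hours:.1f} hours, about {days:.1f} days")  # 1132.7 hours, about 47.2 days
```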

Even if it's not :)