by KenanSulayman 2843 days ago
Isn't 200 requests in an hour ... just three requests per minute? What is this service doing?
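For reference, a quick sketch of the arithmetic behind that figure. The only input is the 200-requests-per-hour number quoted above; the function names are just illustrative.

```python
def per_minute(requests_per_hour: float) -> float:
    """Convert an hourly request count to an average per-minute rate."""
    return requests_per_hour / 60


def seconds_between(requests_per_hour: float) -> float:
    """Average spacing between requests, in seconds."""
    return 3600 / requests_per_hour


rate = per_minute(200)
gap = seconds_between(200)
print(f"{rate:.2f} requests/minute, one every {gap:.0f} s")
# prints "3.33 requests/minute, one every 18 s"
```

So "three per minute" is a slight round-down: it is closer to one request every 18 seconds.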
3 comments

GitHub's API doesn't provide an exhaustive list of all your contributions. Instead we crawl GitHub's website, which is slow.

Details at https://github.com/AurelienLourot/github-contribs#how-does-i...

We are in talks with GitHub and they know that we are crawling GitHub.

> Instead we crawl GitHub's website which is computationally expensive

I'm surprised to hear that you're bottlenecking on CPU time. Could you verify that my understanding is correct? I would've thought your bottleneck would be networking and connectivity as you have to wait for GitHub to process all of the requests.

You're right. I edited my answer.

The unofficial GitHub API we are using is slow and rate-limited per IP. We spin up several servers to get several IPs and circumvent the rate limit.

(GitHub knows that we do that and we are in contact with them.)

If GitHub are okay with you using multiple IPs to get that data then it's not inherently expensive on their side for you to be using this.

Surely a rate-limit exception could be in order, then?

And perhaps you could help them alpha-test a new API endpoint that just so happens to include all the info rolled up as one URL :D

(Hmmmmm.... GraphQL....)

Looks like they're hitting a GitHub rate limit, rather than bottlenecking on CPU.
This is the bottleneck: https://github.com/AurelienLourot/github-contribs

It uses a very slow unofficial GitHub API, so the initial crawl of a single profile takes several hours. It is also rate-limited per IP, so you need several machines/instances in order to parallelize. We plan to use AWS Fargate in the future. (We thought this future would be much further away.)
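To illustrate why more IPs help, here is a rough sketch of splitting a crawl across workers that each sit behind their own IP. This is not how github-contribs is actually implemented; the page names, the 1000-request workload, and the 200-requests-per-hour per-IP limit are all made-up numbers for the example.

```python
import math


def shard(items, n_workers):
    """Round-robin items across n_workers; each worker stands for one IP."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    return [items[i::n_workers] for i in range(n_workers)]


def crawl_hours(n_requests, n_workers, per_ip_per_hour):
    """Estimated wall-clock hours if every worker runs at its per-IP limit."""
    busiest = math.ceil(n_requests / n_workers)  # load on the busiest worker
    return busiest / per_ip_per_hour


# Hypothetical workload: 1000 pages to fetch for one profile.
pages = [f"page-{i}" for i in range(1000)]
shards = shard(pages, 4)

print([len(s) for s in shards])          # [250, 250, 250, 250]
print(crawl_hours(len(pages), 1, 200))   # 5.0
print(crawl_hours(len(pages), 4, 200))   # 1.25
```

Under these assumptions the crawl time drops roughly linearly with the number of IPs, which is why a per-IP limit pushes you toward many small instances rather than one big one.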

Maybe it's cloning all GitHub repos of a user to compute activity?
We don't clone repos. Instead we use GitHub's API to get activity info.

E.g. https://api.github.com/repos/aurelienlourot/ghuser.io/contri...