> Instead we crawl GitHub's website which is computationally expensive
I'm surprised to hear that you're bottlenecking on CPU time. Could you verify that my understanding is correct? I would've thought your bottleneck would be networking and connectivity as you have to wait for GitHub to process all of the requests.
It uses a very slow unofficial GitHub API, so the initial crawl of a single profile takes several hours. The limit is per IP, so you need several machines/instances to parallelize. We plan to use AWS Fargate in the future. (We thought that future was much farther away.)
Details at https://github.com/AurelienLourot/github-contribs#how-does-i...
We are in talks with GitHub and they know that we are crawling GitHub.
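To make the per-IP point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the crawl is bound only by a fixed request rate per IP and that work splits evenly across IPs; the function name, parameters, and numbers are hypothetical placeholders, not measurements from our crawler.

```python
import math


def crawl_hours(total_requests: int, requests_per_hour_per_ip: float, num_ips: int) -> float:
    """Estimate wall-clock hours for a crawl limited only by a per-IP request rate.

    All inputs are hypothetical; plug in your own observed values.
    """
    if total_requests < 0 or requests_per_hour_per_ip <= 0 or num_ips <= 0:
        raise ValueError("total_requests must be >= 0; rate and num_ips must be > 0")
    # Split the requests evenly; the IP with the largest share finishes last.
    largest_share = math.ceil(total_requests / num_ips)
    return largest_share / requests_per_hour_per_ip


if __name__ == "__main__":
    # Placeholder numbers purely for illustration.
    for ips in (1, 4, 10):
        print(ips, "IP(s):", crawl_hours(1000, 100, ips), "hours")
```

Under these assumptions the time drops roughly linearly with the number of IPs, which is why adding instances helps even though each one spends most of its time waiting on the network.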