Code-wise it is fully automated, but it's slow and it doesn't scale: we have to spin up new servers manually. As OP said, we'll need to make changes for ghuser.io to be scalable. Ideally GitHub would add an API that lists all your contributions, which is (obviously) not in our hands, but we are talking with GitHub.
It's cheaply automated to handle 10 profile requests per day, which is more than what we got in the past few months. So now we're giving it some human help and we'll have to rethink the system.
What is your username? I'll cancel your request, thanks!
Seems like the perfect match for AWS Lambda. I'd consider putting the crawl tasks into SQS and having the queue trigger Lambdas that each run a single crawl function.
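A rough sketch of the SQS-to-Lambda idea, not a drop-in implementation. It assumes each queue message body is JSON like {"username": "..."} and that `crawl_profile` is a hypothetical stand-in for whatever ghuser.io does to crawl one profile. The return value uses Lambda's partial batch response shape, which only takes effect if ReportBatchItemFailures is enabled on the SQS event source mapping.

```python
import json


def crawl_profile(username):
    """Hypothetical stand-in for the real crawl of one GitHub profile."""
    return {"username": username, "status": "crawled"}


def handler(event, context, crawl=crawl_profile):
    """Lambda entry point for an SQS trigger: one crawl per queued message."""
    failures = []
    for record in event.get("Records", []):
        try:
            # Assumed message body: {"username": "..."}
            task = json.loads(record["body"])
            crawl(task["username"])
        except Exception:
            # Report only this message as failed so SQS redelivers it alone.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Keeping each invocation to a single crawl means a bad profile only retries its own message instead of the whole batch.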
For better control over throttling and concurrency you can leverage DynamoDB... I love it for controlling Lambdas but not for storage.
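One way to read the DynamoDB suggestion is a counter that acts as a semaphore: each Lambda claims a slot with a conditional update before crawling and gives it back afterwards. This is only a sketch under assumptions: `table` is expected to behave like a boto3 DynamoDB Table resource, and the key name `id`, the item key `crawler`, and the attribute `in_flight` are all made up for illustration.

```python
def try_acquire_slot(table, limit, key="crawler"):
    """Atomically claim one crawl slot; return False if `limit` are taken.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    """
    try:
        table.update_item(
            Key={"id": key},
            UpdateExpression="ADD #n :one",
            # The write only succeeds while the counter is below the limit.
            ConditionExpression="attribute_not_exists(#n) OR #n < :limit",
            ExpressionAttributeNames={"#n": "in_flight"},
            ExpressionAttributeValues={":one": 1, ":limit": limit},
        )
        return True
    except Exception as exc:
        # boto3 errors carry the service error code in exc.response.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "ConditionalCheckFailedException":
            return False
        raise


def release_slot(table, key="crawler"):
    """Give a slot back once the crawl has finished."""
    table.update_item(
        Key={"id": key},
        UpdateExpression="ADD #n :minus_one",
        ExpressionAttributeNames={"#n": "in_flight"},
        ExpressionAttributeValues={":minus_one": -1},
    )
```

A Lambda that fails to get a slot can simply raise, which leaves the message on the queue for a later retry. A crashed worker would leak its slot, so a real setup would likely want an expiry on top of this.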
If you need more power than a Lambda, you can do a similar process with EC2: populate the SQS queue, then trigger a Lambda that turns on an EC2 machine. Consider spot instances to save a ton of money.
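Roughly, the EC2 variant could look like the sketch below: a small Lambda checks the queue backlog and starts a worker instance when there is something to crawl. It assumes `sqs` and `ec2` are boto3 clients (for example `boto3.client("sqs")` and `boto3.client("ec2")`) and that `queue_url` and `instance_id` point at resources you already created. It starts an existing stopped instance; requesting spot capacity is a different call and is not shown.

```python
def make_handler(sqs, ec2, queue_url, instance_id):
    """Build a Lambda handler that wakes an EC2 worker when work is queued.

    `sqs` and `ec2` are assumed to be boto3 clients; the queue URL and
    instance id are placeholders supplied by the caller.
    """

    def handler(event, context):
        attrs = sqs.get_queue_attributes(
            QueueUrl=queue_url,
            AttributeNames=["ApproximateNumberOfMessages"],
        )["Attributes"]
        # SQS returns attribute values as strings.
        backlog = int(attrs["ApproximateNumberOfMessages"])
        if backlog > 0:
            ec2.start_instances(InstanceIds=[instance_id])
        return {"backlog": backlog, "started": backlog > 0}

    return handler
```

The worker on the instance would then drain the queue itself and shut down when it is empty, so you only pay while there are crawls to run.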
If you need ideas, I'm sure HN readers would be glad to help with a solution from afar.