
Hacker News


	by delinka 2843 days ago
	Am I understanding correctly that this is a manual process and not currently automated? (And can I cancel my request so you can move on to others for now?)

2 comments

brillout 2843 days ago

Code-wise, it is fully automated, but it's slow and it doesn't scale: we have to spin up new servers manually. As OP said, we'll need to make changes for ghuser.io to be scalable. Ideally, GitHub would add an API that lists all of a user's contributions. That is (obviously) not in our hands, but we are talking with GitHub.


lourot 2843 days ago

It's cheaply automated to handle 10 profile requests per day, which is more than we received in the past few months. So now we're giving it some human help, and we'll have to rethink the system.

What is your username? I'll cancel your request, thanks!


social_quotient 2843 days ago

Seems like the perfect match for AWS Lambda. I'd consider putting the crawl tasks into SQS and then triggering Lambda functions that each perform a single crawl.
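A minimal sketch of the SQS-to-Lambda crawl pattern, in Python. It assumes each SQS message body is a JSON object like `{"username": "..."}`, and `crawl_user` is a hypothetical placeholder rather than real GitHub calls. Only the `event["Records"][i]["body"]` shape reflects how Lambda delivers SQS batches; everything else is illustrative.

```python
import json


def crawl_user(username: str) -> dict:
    """Hypothetical single-crawl function (placeholder, no network access).

    A real version would fetch the user's contributions from GitHub.
    """
    return {"username": username, "status": "crawled"}


def handler(event, context):
    """Lambda-style entry point for an SQS trigger.

    Lambda delivers a batch of SQS messages under event["Records"];
    each record carries the message text in record["body"].
    """
    results = []
    for record in event["Records"]:
        # Assumed message shape: {"username": "..."}
        task = json.loads(record["body"])
        results.append(crawl_user(task["username"]))
    return {"processed": len(results), "results": results}


if __name__ == "__main__":
    # Simulated SQS event with two crawl tasks, one per message.
    sample_event = {
        "Records": [
            {"body": json.dumps({"username": "delinka"})},
            {"body": json.dumps({"username": "lourot"})},
        ]
    }
    print(handler(sample_event, None))
```

Setting the event source mapping's batch size to 1 would make each invocation perform exactly one crawl, which matches the "single crawl functions" idea.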

For better control over throttling and concurrency, you can leverage DynamoDB... I love it for controlling Lambdas, but not for storage.
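One way to read the DynamoDB throttle suggestion is a shared counter of in-flight crawls that each worker must increment before crawling. The sketch below is a hedged, in-memory stand-in: `ConcurrencyTable` and `process_with_limit` are hypothetical names, and a lock plays the role of DynamoDB's atomic conditional update (an `UpdateItem` with a `ConditionExpression` that fails once the limit is reached).

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class ConcurrencyTable:
    """In-memory stand-in for a DynamoDB item used as a concurrency counter.

    In DynamoDB the check-and-increment would be one conditional UpdateItem
    call; here a lock provides the same atomicity.
    """

    def __init__(self, limit: int):
        self.limit = limit
        self.running = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self.running >= self.limit:
                return False  # condition failed: too many crawls in flight
            self.running += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.running = max(0, self.running - 1)


def process_with_limit(table: ConcurrencyTable, username: str,
                       crawl: Callable[[str], T]) -> T:
    """Run one crawl only if a concurrency slot is free."""
    if not table.try_acquire():
        # Failing the invocation leaves the SQS message on the queue; it becomes
        # visible again after the visibility timeout and is retried later.
        raise RuntimeError("concurrency limit reached; retry later")
    try:
        return crawl(username)
    finally:
        table.release()  # always free the slot, even if the crawl fails


if __name__ == "__main__":
    table = ConcurrencyTable(limit=2)
    print(process_with_limit(table, "lourot", lambda u: f"crawled {u}"))
```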

If you need more power than a Lambda function provides, you can do a similar process with EC2: populate the SQS queue, then trigger a Lambda that turns on an EC2 machine. Consider spot instances to save a ton of money.

If you need ideas, I'm sure HN readers would be glad to help brainstorm solutions from afar.
