by supersan 3585 days ago
Yes, that is very interesting to me as well. I think Amazon's hosted Elasticsearch could be used for the search side. GitHub would be the obvious place to crawl the source code from.

Of course, there are a lot of roadblocks, like how much of the source code GitHub allows you to crawl. Sometimes you can find huge dumps on legit torrent sites. There is also a site (I don't remember the name) that provides S3 buckets of crawled data, where the data is free but the requester pays for the bandwidth.

Then of course you will have to index the source code by language, project, author, date, etc. It's definitely not easy, which could be the reason why there is this huge gap.
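
To make the indexing step a bit more concrete, here is a rough sketch of what indexing by language, project, author and date could look like. It is a toy in-memory inverted index in Python, not Elasticsearch and not anyone's real implementation; `SourceFile`, `CodeIndex` and all field names are made up for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceFile:
    # Hypothetical record for one crawled file; field names are assumptions.
    project: str
    author: str
    language: str
    committed: date
    path: str
    text: str


class CodeIndex:
    """Toy inverted index: identifier -> files, plus metadata filters."""

    def __init__(self):
        self.docs = []
        self.by_token = defaultdict(set)  # lowercased identifier -> doc ids
        self.by_field = defaultdict(set)  # (field, value) -> doc ids

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        # Split the source text into identifier-like tokens.
        for token in set(re.findall(r"[A-Za-z_]\w*", doc.text)):
            self.by_token[token.lower()].add(doc_id)
        # Index the metadata fields used as exact-match filters.
        for field in ("project", "author", "language"):
            self.by_field[(field, getattr(doc, field))].add(doc_id)
        return doc_id

    def search(self, token, **filters):
        ids = set(self.by_token.get(token.lower(), set()))
        for field, value in filters.items():
            ids &= self.by_field.get((field, value), set())
        # Newest files first, so the date is used for ordering.
        return sorted((self.docs[i] for i in ids),
                      key=lambda d: d.committed, reverse=True)
```

Usage would be something like `index.search("parse_args", language="python")`. A real system would need much more than this (proper tokenizing per language, ranking, storage that doesn't fit in memory), which is probably part of why it's hard.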