|
|
|
|
|
by rw
3176 days ago
|
|
This only uses LDA on your starred repository descriptions, to find topic terms that describe your starred repositories. These topic terms are then used to query the GitHub search API to find matching repositories. The results are then sorted by star count. That is a clever way to make use of a search API like GitHub's. The principled way to do this, though, is to run LDA over all descriptions on GitHub, then use that similarity index to find similar repositories. You could run LDA over code, too. I'll note that there is a cold start problem with this implementation: using LDA on such a small set of short documents will often lead to uninformative topics with words that are too-specific. You need a big corpus to capture e.g. synonym relationships. |
|