Hacker News
by da39a3ee 1752 days ago
I was recently looking for a library that takes a few lines of source code as input and predicts the programming language as output.

That seems like a very tractable machine-learning problem, yet all I could find was a single Python library. It looks nice, but it doesn't have much adoption, and it requires installing the whole of TensorFlow even though users just want a trained model and a predict() function.

Why doesn't a popular library like this exist?
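To illustrate why this looks tractable without a heavy framework, here is a minimal sketch of a token-based multinomial naive Bayes classifier using only the Python standard library. The class name, the tokenizer regex, and the four training snippets are hypothetical toy choices, not a real corpus or any existing library's API. A usable model would need many samples per language.

```python
import math
import re
from collections import Counter, defaultdict

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


class NaiveBayesLanguageGuesser:  # hypothetical name, not an existing library
    """Multinomial naive Bayes over code tokens with add-one (Laplace) smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token -> count
        self.doc_counts = Counter()               # language -> number of training samples
        self.vocab = set()                        # every token seen in training

    def train(self, language, code):
        tokens = tokenize(code)
        self.token_counts[language].update(tokens)
        self.doc_counts[language] += 1
        self.vocab.update(tokens)

    def predict(self, code):
        tokens = tokenize(code)
        total_docs = sum(self.doc_counts.values())
        best_lang, best_score = None, float("-inf")
        for lang, counts in self.token_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # log P(lang) + sum of log P(token | lang), in log space to avoid underflow
            score = math.log(self.doc_counts[lang] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Hypothetical toy training data; a real model needs far more samples per language.
guesser = NaiveBayesLanguageGuesser()
guesser.train("Python", "def add(a, b):\n    return a + b\n")
guesser.train("Python", "import os\nprint(os.getcwd())\n")
guesser.train("Ruby", "def add(a, b)\n  a + b\nend\n")
guesser.train("Ruby", "require 'json'\nputs JSON.generate({})\n")

print(guesser.predict("import sys\nprint(sys.argv)"))  # Python
```

The "trained model" here is just a few dictionaries of counts, which is why shipping one plus a predict() function need not pull in a deep-learning framework. Add-one smoothing keeps unseen tokens (like `sys` above) from zeroing out a language's score.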

GitHub's Linguist library can identify the programming language of a single file (edit: or of a whole project): https://github.com/github/linguist#single-file
Thanks! My searches completely failed to find that. I can’t use it as a Ruby library, but perhaps I can pull out heuristics.yml and the naive Bayes classifier weights to use in another language.
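If the heuristics are pulled out into another language, applying them mostly comes down to ordered regex matching per ambiguous extension. Below is a minimal sketch in Python under a simplified rule structure. `HYPOTHETICAL_RULES` and `disambiguate` are illustrative names, the patterns are made-up examples rather than entries from heuristics.yml, and the real file's schema is not reproduced here.

```python
import re

# Hypothetical rules loosely modeled on the idea behind heuristics.yml:
# for an ambiguous extension, try each (language, regex) pair in order.
# These patterns are illustrative, NOT copied from heuristics.yml.
HYPOTHETICAL_RULES = {
    ".h": [
        ("Objective-C", re.compile(r"^\s*@(interface|protocol|end)\b", re.MULTILINE)),
        ("C++", re.compile(r"^\s*(template\s*<|namespace\s+\w+|class\s+\w+\s*[:{])", re.MULTILINE)),
    ],
}


def disambiguate(extension, source):
    """Return the first language whose pattern matches, or None if no rule fires."""
    for language, pattern in HYPOTHETICAL_RULES.get(extension, []):
        if pattern.search(source):
            return language
    return None  # undecided: leave it to another step, e.g. a statistical classifier


print(disambiguate(".h", "@interface Widget : NSObject\n@end\n"))  # Objective-C
print(disambiguate(".h", "namespace util {\nint clamp(int x);\n}\n"))  # C++
print(disambiguate(".h", "int clamp(int x);\n"))  # None
```

Rule order matters in this sketch: the first matching pattern wins, so more specific rules should come first.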