If I were building the software, a corpus of PR, licenses, etc. is the way I would go. But "they did it statistically" doesn't answer the question "what is the model?" There are many different statistical models one could use. My other post has a few things we've figured out.

But I'm starting to think a rule-based lexicon isn't out of the question, given these >1 scores on some texts.