by DallaRosa 5047 days ago
I'll stand by my statement: a model based on a corpus of PR, scholarly, license, and similar texts, if they are into real statistical NLP.

Or just esthetic rules + word dictionary.

1 comments

If I were to make the software, the corpus of PR, licenses, etc. would be the way I'd go. But "they did it statistically" doesn't answer the question "what is the model?" There are many different statistical models one could use. My other post has a few things we've figured out.

But I'm starting to think a rule-based lexicon isn't out of the question, given these >1 scores on some texts.
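
One reading of why >1 scores point that way: a probability from a statistical classifier can't exceed 1, but a weighted word-list score can. A minimal sketch of that idea, where the word list, the weights, and the function name are all invented for illustration and are not anything known about the actual software:

```python
import re

# Hypothetical lexicon: these words and weights are made up for illustration.
LEXICON = {
    "synergy": 3.0,
    "leverage": 2.5,
    "hereinafter": 3.0,
    "paradigm": 2.0,
}

def lexicon_score(text: str) -> float:
    """Sum the weights of matched words and divide by the word count.

    Because weights can be larger than 1, the result is not bounded by 1,
    unlike a probability.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = sum(LEXICON.get(word, 0.0) for word in words)
    return total / len(words)
```

A text made up mostly of heavily weighted words would score above 1 under this scheme, while ordinary prose with no matches would score 0. That only shows the >1 scores are compatible with a lexicon, not that a lexicon is what they used.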