by got2surf 3014 days ago
This looks really interesting! Two points:

1) A quick side-by-side of a sample Terms/Conditions versus Leaf's summarized version would be helpful. It would help me understand the product more before I install it.

2) What ML/NLP tools did you use for this? It looks like Sumy for Python summarization, along with a specific list of clause keywords (will, agree, must, etc.). When you get a chance, I'd be curious to know more about the technical process.

Also, I noticed that you are stemming words. You may also be interested in lemmatization, which is a slightly more involved way of converting words into their base forms (like running -> run or ran -> run). Unlike stemming, lemmatization also takes part-of-speech context into account. Given that legal documents are fairly grammatical (I'm assuming?), lemmatization should work well here. I've been fairly happy with spaCy's lemmatization results (https://spacy.io/).
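
To make the stemming vs. lemmatization difference concrete, here is a toy sketch in plain Python. It is not spaCy's or Sumy's implementation: `crude_stem`, `lemmatize`, and the `LEMMA_TABLE` entries are all hypothetical names and data made up for illustration. A real lemmatizer relies on a much larger dictionary plus a part-of-speech tagger.

```python
# Toy comparison of stemming and lemmatization.
# Everything here is illustrative; it does not mirror any library's internals.

def crude_stem(word: str) -> str:
    """Strip one common suffix without consulting grammar or a dictionary."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table: (surface form, part of speech) -> lemma.
LEMMA_TABLE = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("agreed", "VERB"): "agree",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

def lemmatize(word: str, pos: str) -> str:
    """Look up the base form for a word given its part of speech."""
    key = (word.lower(), pos)
    # Fall back to the lowercased word when the table has no entry.
    return LEMMA_TABLE.get(key, word.lower())

print(crude_stem("running"))          # runn  (suffix stripped, not a real word)
print(crude_stem("ran"))              # ran   (irregular form is missed)
print(lemmatize("running", "VERB"))   # run
print(lemmatize("ran", "VERB"))       # run
print(lemmatize("meeting", "NOUN"))   # meeting
print(lemmatize("meeting", "VERB"))   # meet
```

The stemmer leaves "ran" untouched and turns "running" into "runn", while the lookup keyed on part of speech maps both to "run" and keeps the noun "meeting" separate from the verb.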

1 comment

Great feedback!

1) That is an awesome idea, I hadn't thought of that. We'll put that together.

2) Right now it's sumy/regex/bs4 for our tech. As you can see, it's nothing complex, but we're hoping to add some real ML to warrant our use of buzzwords. The hardest technical challenge was actually working within the Chrome Extension framework; the actual summarization (currently) is fairly straightforward.

3) spaCy lemmatization looks like exactly the next step! Thank you for that.
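
For readers wondering what a regex pass over clause keywords could look like, here is a minimal sketch using only the standard library. It is a guess at the general approach, not Leaf's actual code: the keyword list is just the three words mentioned in the parent comment, and `split_sentences` and `obligation_sentences` are hypothetical helper names.

```python
import re
from typing import List

# Assumed keyword list, taken from the words guessed in the parent comment.
OBLIGATION_WORDS = ("will", "agree", "must")

# \b anchors match whole words only, so "Goodwill" does not count as "will".
OBLIGATION_RE = re.compile(
    r"\b(?:" + "|".join(OBLIGATION_WORDS) + r")\b", re.IGNORECASE
)

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def obligation_sentences(text: str) -> List[str]:
    """Keep only the sentences that contain one of the keywords."""
    return [s for s in split_sentences(text) if OBLIGATION_RE.search(s)]

sample = (
    "You agree to these terms. We like cookies. "
    "You must be 18 or older. Goodwill matters."
)
for sentence in obligation_sentences(sample):
    print(sentence)
# You agree to these terms.
# You must be 18 or older.
```

Because the pattern matches whole words, a sentence containing only "agrees" or "agreed" would be skipped, which is the gap that stemming or lemmatizing the text first is meant to close.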