Hacker News
by yannyu 4476 days ago
I'm not sure if this is what you're looking for, but classification and entity extraction are most often done with statistical models/machine learning.

You would take a corpus of documents, manually "tag" them with the entities you're looking for, and designate that as your training corpus. You would then run that corpus through a machine learning toolkit (such as Apache OpenNLP, https://opennlp.apache.org/) to train a model, and use the resulting model to process new text and identify the kinds of entities it was trained on.
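
To make the tag, train, apply loop concrete, here is a rough sketch in plain Python. It does not use OpenNLP or any other library: the tiny corpus, the PERSON/ORG/O tag set, the three features, and the function names are all made up for illustration, and the "model" is just a table of feature/tag counts that vote on each token. A real toolkit would use a proper learner (maximum entropy, CRF, and so on) and far more training data.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus: each sentence is a list of
# (token, tag) pairs. "O" marks tokens that are not part of any entity.
TRAINING_CORPUS = [
    [("Alice", "PERSON"), ("works", "O"), ("at", "O"), ("Acme", "ORG")],
    [("Bob", "PERSON"), ("joined", "O"), ("Initech", "ORG")],
    [("Carol", "PERSON"), ("works", "O"), ("at", "O"), ("Globex", "ORG")],
]


def features(tokens, i):
    """Describe token i with a few simple, hand-picked features."""
    prev_word = tokens[i - 1].lower() if i > 0 else "<start>"
    return [
        "word=" + tokens[i].lower(),
        "capitalized=" + str(tokens[i][0].isupper()),
        "prev=" + prev_word,
    ]


def train(corpus):
    """Count how often each feature co-occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = [tok for tok, _ in sentence]
        for i, (_, tag) in enumerate(sentence):
            for feat in features(tokens, i):
                counts[feat][tag] += 1
    return counts


def predict(model, tokens):
    """Give each token the tag that its features voted for most often."""
    tagged = []
    for i, tok in enumerate(tokens):
        votes = Counter()
        for feat in features(tokens, i):
            votes.update(model.get(feat, {}))
        # Fall back to "O" when no feature was seen during training.
        tag = votes.most_common(1)[0][0] if votes else "O"
        tagged.append((tok, tag))
    return tagged


model = train(TRAINING_CORPUS)
print(predict(model, ["Dave", "works", "at", "Hooli"]))
# [('Dave', 'PERSON'), ('works', 'O'), ('at', 'O'), ('Hooli', 'ORG')]
```

Neither "Dave" nor "Hooli" appears in the training data; they get tagged from context (capitalization, sentence start, following "at"), which is the point of learning from a tagged corpus instead of keeping a fixed list of names.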

1 comment

Yes, I expect to use statistical models/ML. I am more comfortable with Python than Java, so I will look into NLTK first. Thank you for your response!