HN Mirror



	by yannyu 4476 days ago
	I'm not sure if this is what you're looking for, but classification and entity extraction are most often done with statistical models/machine learning. You take a corpus of documents, manually "tag" them with the entities you're looking for, and designate that as your training corpus. You then run it through a machine learning toolkit (such as Apache OpenNLP, https://opennlp.apache.org/) to train a model, and use the resulting model to process new text and identify the entities it was trained on.
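	To make the tag/train/apply workflow concrete, here is a minimal sketch in plain Python. It is not how OpenNLP or NLTK do it; it swaps in a tiny per-token perceptron so the example runs with no dependencies. The corpus, the tag names (PERSON, ORG, O), the feature set, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical hand-tagged training corpus: one list of (token, tag) pairs per sentence.
TRAIN = [
    [("Alice", "PERSON"), ("works", "O"), ("at", "O"), ("Acme", "ORG")],
    [("Bob", "PERSON"), ("joined", "O"), ("Initech", "ORG")],
    [("Carol", "PERSON"), ("works", "O"), ("at", "O"), ("Globex", "ORG")],
]


def features(tokens, i):
    """Describe token i with a few simple, illustrative features."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "bias",
        "word=" + word.lower(),
        "prev=" + prev,
        "is_title=" + str(word.istitle()),
        "is_first=" + str(i == 0),
    ]


def predict_tag(weights, tags, feats):
    """Return the tag whose summed feature weights are highest."""
    return max(tags, key=lambda t: sum(weights.get((f, t), 0.0) for f in feats))


def train(corpus, epochs=5):
    """Multi-class perceptron: on each mistake, reward the gold tag and penalize the guess."""
    weights = defaultdict(float)  # (feature, tag) -> weight
    tags = sorted({tag for sent in corpus for _, tag in sent})
    for _ in range(epochs):
        for sent in corpus:
            tokens = [tok for tok, _ in sent]
            for i, (_, gold) in enumerate(sent):
                feats = features(tokens, i)
                guess = predict_tag(weights, tags, feats)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
    return weights, tags


def tag(model, tokens):
    """Apply a trained model to a list of tokens."""
    weights, tags = model
    return [(tok, predict_tag(weights, tags, features(tokens, i)))
            for i, tok in enumerate(tokens)]


model = train(TRAIN)
print(tag(model, ["Dave", "works", "at", "Hooli"]))
```

	With three training sentences this only picks up on capitalization and position, so treat it as a picture of the workflow rather than a usable extractor. A real setup would need a much larger tagged corpus and a proper toolkit.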

1 comment

hakann 4476 days ago

Yes, I expect to use statistical models/ML. I am more comfortable with Python than Java, so I will look into NLTK first. Thank you for your response!
