by ahljoh
3637 days ago
We would need more context about your specific objectives. Depending on what you are after, here are some options by task:

- document conversion (pdftotext, Apache PDFBox, Tabula, etc.)
- OCR (Tesseract, pypdfocr, etc.)
- named-entity recognition (NER), i.e. finding and classifying entities in text (DBpedia Spotlight, Stanford NER via NLTK, spaCy)
- coreference resolution and dependency parsing (spaCy, SyntaxNet)
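
As a rough illustration of the document-conversion route, here is a minimal Python sketch that shells out to pdftotext. It assumes the pdftotext binary (from poppler-utils or Xpdf) is on your PATH; the helper names `build_pdftotext_command` and `pdf_to_text` and the file name `report.pdf` are made up for this example, not part of any library.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=True):
    """Return the argv list for a pdftotext call that writes text to stdout."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")  # ask pdftotext to keep the physical page layout
    command += [pdf_path, "-"]  # "-" as the output file means stdout
    return command


def pdf_to_text(pdf_path, keep_layout=True):
    """Extract text from a PDF; return None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None  # assumption: the caller treats None as "tool missing"
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


if __name__ == "__main__":
    text = pdf_to_text("report.pdf")  # hypothetical input file
    print("pdftotext not found" if text is None else text[:500])
```

If the PDF is a scan with no text layer, this will most likely come back empty, which is the point where the OCR tools would come in. The extracted text can then be passed on to an NER or parsing library.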