Text extraction
1 point by theslay 4533 days ago
Hi, I'm working on plagiarism detection and need some help with text extraction from PDFs. I've tried PDFTextStream, which works really well for extracting the text. Now I need to get the extracted text into a structured format that I can query for things like the title, chapters, etc. I'd appreciate any pointers on how to achieve this. Thanks.
If you were to write a blog post about how to structure the extracted text, that's more the HN thing.