by rajaravivarma_r
780 days ago
Is it possible to extract different kinds of text from a PDF document separately, for example paragraphs, code blocks, and inline code within paragraphs? I tried tesseract, but it recognises code blocks as tables. There are also edge cases: paragraphs that start with an indent are hard to tell apart from paragraphs that don't. I'd appreciate any help.
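One common heuristic, sketched here under stated assumptions: if the PDF has a real text layer (not a scan), libraries such as pdfminer.six or PyMuPDF can report each text span's font name and position, which plain OCR output does not give you. A monospace font then hints at code, and for paragraph starts you can combine two signals: an indented first line, or a previous line that stopped short of the right margin. The `Span`, `Line`, `MONO_HINTS` and `classify` names below are hypothetical, the input is assumed to already be grouped into lines, and the `indent`/`slack` thresholds are guesses you would tune per document.

```python
from dataclasses import dataclass

# Hypothetical input record: one text span with its font and horizontal extent,
# as a PDF library with a text layer might report it (coordinates in points).
@dataclass
class Span:
    text: str
    font: str
    x0: float
    x1: float

@dataclass
class Line:
    spans: list

    @property
    def x0(self) -> float:
        return min(s.x0 for s in self.spans)

    @property
    def x1(self) -> float:
        return max(s.x1 for s in self.spans)

# Assumption: monospace fonts can be recognised by fragments of their names.
MONO_HINTS = ("mono", "courier", "consolas", "menlo")

def is_mono(span: Span) -> bool:
    name = span.font.lower()
    return any(hint in name for hint in MONO_HINTS)

def classify(lines, left_margin, right_margin, indent=10.0, slack=20.0):
    """Label each line 'code', 'para_start' or 'para_cont' and collect inline code."""
    results = []
    prev_label, prev_line = None, None
    for line in lines:
        if not line.spans:
            continue  # skip empty lines
        mono = [s for s in line.spans if is_mono(s)]
        if len(mono) == len(line.spans):
            label = "code"  # every span is monospace: treat as a code block line
        elif (
            prev_label in (None, "code")              # first line, or prose right after code
            or line.x0 - left_margin >= indent        # first line is indented
            or prev_line.x1 < right_margin - slack    # previous line ended short
        ):
            label = "para_start"
        else:
            label = "para_cont"
        # Monospace spans inside a prose line are treated as inline code.
        inline = [s.text for s in mono] if label != "code" else []
        results.append((label, inline))
        prev_label, prev_line = label, line
    return results
```

For example, a prose line that reaches the right margin followed by one that doesn't, then an unindented line mixing a serif font with `CourierNew`, would be labelled `para_start`, `para_cont`, `para_start` (with `foo()` as inline code). The "line stopped short" check is what catches paragraphs without an indent, though it will misfire on a paragraph whose last line happens to fill the width.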