by lovelearning
3620 days ago
The file format itself carries all the information needed to extract text from a rectangular area, and frameworks like PDFBox and iText have supported that for a long time. It's up to users to define what the rows and columns are. In most programmatically generated PDFs, this is easy. But in manually typeset PDFs, there are lots of edge cases: variable row heights or column widths, slanted table borders, and so on.
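
To make the "users define the rows and columns" part concrete, here is a rough sketch in plain Python. It is not PDFBox or iText code: `Fragment` and `extract_table` are hypothetical names, the positioned fragments stand in for whatever a PDF library reports, and the coordinates are assumed to have a top-left origin with y growing downward (native PDF coordinates start at the bottom-left, so a real pipeline would need to flip them).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Fragment:
    """A positioned piece of text (hypothetical stand-in for PDF library output)."""
    x: float
    y: float
    text: str


def extract_table(fragments, row_edges, col_edges):
    """Bucket fragments into cells using caller-supplied boundaries.

    row_edges and col_edges are sorted coordinates; N + 1 edges define
    N rows or columns. Fragments outside the outer rectangle are ignored.
    """
    n_rows, n_cols = len(row_edges) - 1, len(col_edges) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    # Sort top-to-bottom, then left-to-right, so text within a cell reads in order.
    for f in sorted(fragments, key=lambda f: (f.y, f.x)):
        r = bisect_right(row_edges, f.y) - 1
        c = bisect_right(col_edges, f.x) - 1
        if 0 <= r < n_rows and 0 <= c < n_cols:
            cells[r][c].append(f.text)
    return [[" ".join(cell) for cell in row] for row in cells]


# Made-up sample data: a 2x2 table plus a stray fragment outside the area.
fragments = [
    Fragment(10, 10, "Name"),
    Fragment(120, 12, "Qty"),
    Fragment(10, 40, "Widget"),
    Fragment(120, 41, "3"),
    Fragment(300, 40, "page 7"),
]
table = extract_table(fragments, row_edges=[0, 30, 60], col_edges=[0, 100, 200])
print(table)
```

With fixed edges like these, the generated-PDF case really is that simple. The edge cases mentioned above are exactly where this breaks down: variable row heights mean the edges have to be detected per table, and slanted borders mean a cell is no longer an axis-aligned rectangle.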