by hbcondo714 3261 days ago
Impressive! It seems you can't just use PDFBox out of the box (no pun intended) and instead need to write custom code specific to the PDF itself, per the chart-parser commits [1].

[1] https://github.com/robinhowlett/chart-parser/tree/master/src...

1 comment

Author here. PDFBox is good for simple text stripping: if I wanted to print all the text in the PDF, that would be very straightforward and not much code. However, the PDF chart here is in essence a representation of structured data. I wanted to extract the content in that structured form so that I could both serialize it to JSON and provide an SDK on top of it.
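
To make the difference between stripping text and recovering structure concrete, here is a minimal sketch. It is not chart-parser's actual code. It assumes you already have text fragments with page coordinates (with PDFBox, one plausible source is a PDFTextStripper subclass that reads the TextPosition values passed to writeString). The class ChartRowGrouper, its Fragment type, and the methods toRows and toJson are hypothetical names made up for illustration, and the grouping rule (same row if the y values are within a tolerance) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not chart-parser's API.
class ChartRowGrouper {

    /** One piece of text plus where it sat on the page (y assumed to grow downward). */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Groups fragments into rows by y (within yTolerance), then orders each row by x. */
    static List<List<String>> toRows(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        double rowY = 0.0;
        for (Fragment f : sorted) {
            // Start a new row when this fragment is too far below the row's first fragment.
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y;
            }
            rows.get(rows.size() - 1).add(f);
        }

        List<List<String>> result = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            result.add(cells);
        }
        return result;
    }

    /** Treats the first row as a header and emits every later row as a JSON object. */
    static String toJson(List<List<String>> rows) {
        if (rows.isEmpty()) {
            return "[]";
        }
        List<String> header = rows.get(0);
        StringBuilder sb = new StringBuilder("[");
        for (int r = 1; r < rows.size(); r++) {
            if (r > 1) sb.append(",");
            sb.append("{");
            List<String> row = rows.get(r);
            int n = Math.min(header.size(), row.size());
            for (int c = 0; c < n; c++) {
                if (c > 0) sb.append(",");
                sb.append(quote(header.get(c))).append(":").append(quote(row.get(c)));
            }
            sb.append("}");
        }
        return sb.append("]").toString();
    }

    /** Minimal escaping only (backslash and double quote); a real JSON library would be safer. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Plain text stripping would hand back those same strings as one flat stream. The extra work is deciding which fragments belong to the same row and column, and a real chart likely needs rules specific to its layout rather than one generic tolerance, which is presumably why the parsing code ends up tied to the PDF itself.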