Hacker News new | ask | show | jobs
by adelpozo 497 days ago
it does not have any dependency to a pdf parsing library, correct? That's a cool way to learn to file format and be able to work around weird pdf file. But what was the motivation to not use a library to do the pdf parsing work? is it the case that there is none available? Nice work!
1 comments

Correct, PDFSyntax implements everything at the lowest level. You can ignore the HTML visualization and use it as an API to access PDF objects. Why? Because I started a very small tool as a week-end project and I got hooked reading the PDF Specification so it is becoming a general purpose PDF library for Python. I am not familiar with other libraries but I have the impression that mine implements things that are often overlooked in others, like incremental updates.