Hacker News
by neilv 1308 days ago
Neat. Another use case you might want to consider a sample for is extracting data from filled-in PDF forms. (That use case is why I once had to write a PDF parser.)
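
For the extraction case, the values of a filled AcroForm live in each field dictionary's /T (name) and /V (value) entries. Below is a stdlib-only sketch of that idea. It is deliberately naive: it assumes an uncompressed PDF without object streams, literal-string (not hex) names and values, and no inheritance from parent fields. `naive_form_values` is a hypothetical name, not any library's API.

```python
import re

# Naive sketch: assumes an uncompressed PDF (no object streams) and
# literal-string field names; a real parser such as pypdf is far more robust.
LITERAL = rb"\((?:\\.|[^\\)])*\)"  # "( ... )" with backslash escapes; no nested parens
OBJ = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)  # one indirect object
FIELD_NAME = re.compile(rb"/T\s*(" + LITERAL + rb")")  # /T = partial field name
FIELD_VALUE = re.compile(rb"/V\s*(" + LITERAL + rb"|/[^\s/<>\[\]()]+)")  # /V = value
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape(literal: bytes) -> str:
    """Strip the parentheses and undo simple escapes (octal and UTF-16 not handled)."""
    body = re.sub(rb"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)),
                  literal[1:-1], flags=re.S)
    return body.decode("latin-1")


def naive_form_values(pdf: bytes) -> dict[str, str]:
    """Map field name -> value for every object that has both /T and /V."""
    values = {}
    for obj in OBJ.finditer(pdf):
        name = FIELD_NAME.search(obj.group(1))
        value = FIELD_VALUE.search(obj.group(1))
        if name and value:
            raw = value.group(1)
            # Text fields store strings; checkboxes and radios store names like /Yes.
            text = _unescape(raw) if raw.startswith(b"(") else raw[1:].decode("latin-1")
            values[_unescape(name.group(1))] = text
    return values
```

For real-world files, pypdf's `PdfReader.get_fields()` walks the actual field tree (including compressed objects and inherited values) and is the better starting point.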

Since you support both reading and writing, maybe also a sample for programmatically filling some fields in an editable PDF form: for example, pre-filling some of the fields for a particular website user in a dynamically modified PDF form they download. The source PDF form, though, would be hand-crafted and maintained separately, as people often want to do, rather than generated from scratch by your code.
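
Here is a minimal sketch of the pre-filling idea. It assumes the PDFtk command-line tool is installed and the form's field names are known in advance. The code only generates an XFDF data file; `pdftk form.pdf fill_form data.xfdf output filled.pdf` then merges it into the hand-maintained form. `build_xfdf`, `fill_form`, and any field names used with them are hypothetical.

```python
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

XFDF_NS = "http://ns.adobe.com/xfdf/"
ET.register_namespace("", XFDF_NS)  # serialize XFDF as the default namespace


def build_xfdf(values: dict[str, str]) -> bytes:
    """Build an XFDF document with one <field><value> pair per form field."""
    root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    fields = ET.SubElement(root, f"{{{XFDF_NS}}}fields")
    for name, value in values.items():
        field = ET.SubElement(fields, f"{{{XFDF_NS}}}field", name=name)
        ET.SubElement(field, f"{{{XFDF_NS}}}value").text = value
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def fill_form(template: Path, values: dict[str, str], out: Path, flatten: bool = False) -> None:
    """Merge values into a hand-maintained form using the pdftk command-line tool."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.xfdf"
        data.write_bytes(build_xfdf(values))
        cmd = ["pdftk", str(template), "fill_form", str(data), "output", str(out)]
        if flatten:
            cmd.append("flatten")  # bake the values into the page content
        subprocess.run(cmd, check=True)
```

For example, `fill_form(Path("intake.pdf"), {"PatientName": "Jane Doe"}, Path("out.pdf"))` would leave the template untouched and write a pre-filled copy. With `flatten=True`, the result is no longer editable.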

I recently tried pdfplumber [1] to extract some (relatively) difficult-to-parse tables from PDFs, and it was a great experience; I can recommend it. Before settling on pdfplumber, I tried at least three other PDF packages, and they did not work as easily or as well as I expected.

[1]: https://github.com/jsvine/pdfplumber

Fun project story: during the first COVID school shutdown, my son's daycare wanted parents to print a daily symptom-screening checklist, take a photo of it, and email it to them every morning. This was a tedious process that I automated with PyPDF2 + PDFtk + pypdftk. It's easy to generate your own PDFs, but it's harder to take an existing, outdated, non-editable PDF and automatically fill it out.

Eventually I turned it into a website, added AWS API Gateway + Lambda, and put the whole thing up for other daycare parents to use. Two weeks later, the daycare switched to Google Forms and my project was no longer useful.
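
For a non-editable PDF like that checklist, one common approach (not necessarily the one used here) is to draw the answers onto a separate one-page "overlay" PDF at fixed coordinates. That overlay is then merged onto the original, e.g. with `pdftk checklist.pdf stamp overlay.pdf output filled.pdf`; `multistamp` instead applies overlay pages one-to-one. Below is a stdlib-only sketch of building such an overlay. It assumes a US Letter page (612 x 792 points), Latin-1 text, and the standard Helvetica font. `overlay_pdf` and any coordinates passed to it are hypothetical.

```python
def _pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def overlay_pdf(items: list[tuple[float, float, str]], width: float = 612,
                height: float = 792, size: float = 11) -> bytes:
    """Build a one-page PDF that draws each (x, y, text) item in Helvetica.

    Coordinates are PDF points measured from the bottom-left corner of the page.
    """
    content = "\n".join(
        f"BT /F1 {size} Tf {x} {y} Td ({_pdf_string(text)}) Tj ET" for x, y, text in items
    ).encode("latin-1")  # assumes Latin-1 text
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>").encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # fixed 20-byte xref entries
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_at)
    return bytes(out)
```

Finding the coordinates usually means measuring the original form once. After that, each day's answers are just new (x, y, text) tuples.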

> it's harder to take an existing, outdated, non-editable PDF and automatically fill it out.

That has been on my wishlist for several years: build a "PDF annotation" service that takes in a PDF that is not an interactive (AcroForm) form (e.g. this random example: https://www.dentalworks.com/wp-content/uploads/2021/08/Patie... ) and replaces those _____ areas with actual PDF inputs. My handwriting is terrible, and it's a waste of human capital for some poor soul to try to decipher it only to (almost undoubtedly) re-type it into a computer on their end.
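
A first pass at such a service could find runs of underscores in the extracted text and turn each into a rectangle for a text field. This minimal sketch assumes words arrive as dicts with `text`, `x0`, `x1`, `top`, and `bottom` keys, with `top`/`bottom` measured from the top of the page. That is the shape pdfplumber's `Page.extract_words()` returns. Character widths inside a word are unknown, so the sketch assumes they are uniform. `blank_fields` is a hypothetical helper.

```python
import re


def blank_fields(words: list[dict], page_height: float,
                 min_run: int = 3) -> list[tuple[float, float, float, float]]:
    """Return (x0, y0, x1, y1) rectangles, in PDF coordinates, for runs of underscores."""
    run = re.compile("_{%d,}" % min_run)
    rects = []
    for word in words:
        text = word["text"]
        if not text:
            continue
        # Assumption: every character in the word has the same width.
        char_width = (word["x1"] - word["x0"]) / len(text)
        for m in run.finditer(text):
            x0 = word["x0"] + m.start() * char_width
            x1 = word["x0"] + m.end() * char_width
            # Flip y: "top"/"bottom" count down from the page top,
            # while PDF user space has its origin at the bottom-left.
            rects.append((x0, page_height - word["bottom"], x1, page_height - word["top"]))
    return rects
```

The word just before a blank (e.g. "Name:") is a natural source for the field's /T name. Each rectangle would then become a text-field widget (`/Subtype /Widget /FT /Tx /Rect [...]`) listed in the catalog's /AcroForm /Fields array.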

I am sure we ended up in this situation because people just do "File > Print to PDF" from Word or whatever, since knowing that PDF forms exist, and then how to use Adobe(R) whatever(tm) to make a real editable PDF, is "too much to ask."

I have had about a 10% success rate with Preview.app detecting the lines and letting me click on them and type, but having https://notstupidpdf.example.com/www.dentalworks.com/wp-cont... would be much better for humanity.
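
Blanks drawn as vector lines rather than typed underscores need a different pass, closer to what Preview.app seems to attempt. This minimal sketch assumes line segments arrive as dicts with `x0`, `x1`, `top`, and `bottom` keys, measured from the top of the page (the shape pdfplumber uses for `Page.lines`). `rule_fields` and its thresholds are hypothetical. The sketch keeps thin horizontal segments and joins pieces that continue each other on the same row. It then reserves a box just above each sufficiently wide rule.

```python
def rule_fields(lines: list[dict], page_height: float, min_width: float = 40.0,
                max_thickness: float = 2.0, gap: float = 2.0,
                box_height: float = 14.0) -> list[tuple[float, float, float, float]]:
    """Return (x0, y0, x1, y1) boxes, in PDF coordinates, sitting on horizontal rules."""
    # Keep only near-horizontal segments, ordered from the top of the page down.
    flat = sorted((s for s in lines if abs(s["bottom"] - s["top"]) <= max_thickness),
                  key=lambda s: s["top"])
    rows: list[list[dict]] = []  # segments at (nearly) the same height
    for seg in flat:
        if rows and seg["top"] - rows[-1][0]["top"] <= max_thickness:
            rows[-1].append(seg)
        else:
            rows.append([seg])
    rects = []
    for row in rows:
        row.sort(key=lambda s: s["x0"])
        spans = [[row[0]["x0"], row[0]["x1"]]]
        for seg in row[1:]:
            if seg["x0"] - spans[-1][1] <= gap:  # touching or overlapping: extend
                spans[-1][1] = max(spans[-1][1], seg["x1"])
            else:
                spans.append([seg["x0"], seg["x1"]])
        y = page_height - min(s["top"] for s in row)  # rule height, origin bottom-left
        rects += [(x0, y, x1, y + box_height) for x0, x1 in spans if x1 - x0 >= min_width]
    return rects
```

Short segments (checkbox edges, table borders) fall below `min_width` and are dropped. Tuning that threshold per form is likely where most of the guesswork lies.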