Hacker News new | ask | show | jobs
by dredmorbius 2297 days ago
I've discovered page-oriented processing in awk, which is a godsend for parsing PDFs.

See:

https://news.ycombinator.com/item?id=22156456

In the GNU Awk User's Guide:

https://www.gnu.org/software/gawk/manual/html_node/Multiple-...

Tracking column and field widths across page breaks is ... interesting, but more tractable.