
What would the expression language for that even look like, given that PDFs are basically "canvas as a service"?

I'm aware there are pdf2html toys, and sometimes they do something reasonable, but just like with web scraping, the markup of the target matters a lot, and so would the "markup" of the target PDF.
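To make the "canvas" point concrete, here's a rough sketch of what any extractor ends up doing: it gets a pile of "draw this string at (x, y)" operations and has to guess the lines back out of them. The `draw_ops` data, the `group_into_lines` name, and the `y_tolerance` value are all made up for illustration; real PDFs also throw fonts, rotation, and per-glyph positioning at you.

```python
# Hypothetical draw operations, roughly what a PDF text extractor might
# report: (x, y, text), with the origin at the bottom-left of the page.
# Note that they are NOT in reading order.
draw_ops = [
    (72.0, 700.0, "Invoice"),
    (300.0, 650.2, "Total"),
    (72.0, 650.0, "Widget"),
    (400.0, 649.9, "9.99"),
]


def group_into_lines(ops, y_tolerance=2.0):
    """Guess text lines from positioned strings.

    Walk the ops from the top of the page down, start a new line whenever
    the baseline moves by more than y_tolerance from the line's first op,
    then order each line left to right. y_tolerance is an assumed fudge
    factor, not something the PDF tells you.
    """
    lines = []  # list of (baseline_y, [(x, text), ...])
    for x, y, text in sorted(ops, key=lambda op: -op[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    return [" ".join(t for _, t in sorted(runs)) for _, runs in lines]


print(group_into_lines(draw_ops))
# → ['Invoice', 'Widget Total 9.99']
```

Whether "Widget Total 9.99" is one line or three table cells is exactly the kind of thing the PDF doesn't say, which is why any selector language would be guessing at structure.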

Further, just as it is often better to go after the underlying XHR instead of trying to de-React the HTML, I'll offer that when possible it would be far better to identify the upstream source of the information in the PDF than to reverse engineer a PostScript VM.