by IngoBlechschmid 3600 days ago
Sure, the Linux tool "pdftotext" works just fine for this. Two small caveats: ligatures come out as Unicode ligature characters rather than their ASCII equivalents (which one might want or expect), and, of course, complex mathematical formulas are rendered badly.
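If the ligature caveat is a problem, one possible workaround is to post-process the extracted text. The following is only a rough sketch, not part of pdftotext: the function name expand_ligatures is made up for illustration, and it assumes the only ligatures of interest are the Latin ones at U+FB00 to U+FB06.

```python
# Hypothetical helper (not part of pdftotext): expand Latin ligature
# characters back into plain ASCII letter sequences.
# Assumption: only the code points U+FB00..U+FB06 need handling.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature, mapped to plain "st"
    "\ufb06": "st",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Return text with the ligatures listed above replaced by ASCII letters."""
    return text.translate(_TABLE)
```

An explicit table is used here instead of a blanket unicodedata.normalize("NFKC", ...) call, since compatibility normalization would also rewrite other characters (superscripts, full-width forms and so on) that one may want to keep.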

I've tried both pdftotext and pdf2txt, and I remember not being satisfied with either. Neither seems to handle non-ASCII characters very well, but I'll take another look soon.