by giovannibonetti 2303 days ago
Oh, yeah, pdf2json returns font sizes as well. I forgot to mention that.
1 comment

The font names pdf2json reports can sometimes be incorrect, since it only extracts them from a preset collection of fonts. I suggest using this fork, which fixes that:

https://github.com/AXATechLab/pdf2json

Bounding boxes can also be off with pdf2json. PDF.js does a better job but tends to mishandle some ligatures/glyphs, sometimes turning a word like "finish" into "f nish" (eating the "i" in this case). pdfminer (Python) is the best solution so far, but it is a thousand times slower...
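
On the ligature point, a rough sketch of a post-processing step, in case it helps. It assumes the extractor emits the ligature as a single Unicode code point (e.g. U+FB01 for "fi") rather than dropping it; if the glyph is eaten entirely, as in the "f nish" case, this will not recover it. The helper name `expand_ligatures` is made up for illustration and is not part of pdf2json, PDF.js, or pdfminer.

```python
import unicodedata

# Latin ligature code points U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, long-s t, st).
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points with their plain-letter equivalents.

    Only the ligature block is touched, so other characters that NFKC would
    also rewrite (fullwidth forms, superscripts, ...) are left alone.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if ch in LIGATURES else ch
        for ch in text
    )


# "\ufb01" is the single-code-point "fi" ligature.
print(expand_ligatures("\ufb01nish"))  # prints "finish"
```

Restricting the mapping to that one block is a deliberate choice: running NFKC over the whole string would also work for ligatures but changes other characters you may want to keep as extracted.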