| HN Mirror


by rovr138 114 days ago

Everything has trouble reading the content of PDFs natively. It's a format for displaying/rendering, not for storing content in a way that makes the text inside easy to parse.

Is this one storing text or storing coordinates for where to draw a line for the letter 'l'? Is that an 'l' or a line?
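To make the 'l' question concrete, here is a rough sketch that counts text-showing operators (`Tj`, `TJ`, `'`, `"`) against path operators (`m`, `l`, `S`) in an already-decoded PDF content stream. The function name `count_ops` and the two sample streams are made up for illustration, and the tokenizer is deliberately naive (no nested parentheses, no inline images, no compressed streams), so treat it as a demonstration of the ambiguity rather than a parser.

```python
import re

# Operators that show text vs. operators that build and stroke a path.
TEXT_OPS = {"Tj", "TJ", "'", '"'}
PATH_OPS = {"m", "l", "S"}  # moveto, lineto, stroke

# Simplification: matches non-nested (...) string literals only.
STRING_LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)")


def count_ops(stream: str) -> tuple[int, int]:
    """Return (text_op_count, path_op_count) for a decoded content stream."""
    # Drop string literals first so an 'l' inside "(l)" is not mistaken
    # for the lineto operator, then separate array brackets from tokens.
    stripped = STRING_LITERAL.sub(" ", stream)
    stripped = re.sub(r"[\[\]]", " ", stripped)
    tokens = stripped.split()
    text_count = sum(1 for t in tokens if t in TEXT_OPS)
    path_count = sum(1 for t in tokens if t in PATH_OPS)
    return text_count, path_count


# Hypothetical streams: the same glyph shape stored two different ways.
letter_as_text = "BT /F1 12 Tf 72 720 Td (l) Tj ET"
letter_as_line = "72 720 m 72 732 l S"

print(count_ops(letter_as_text))
print(count_ops(letter_as_line))
```

The first stream carries an actual character code, so a text extractor can recover it. The second is just a vertical stroke, and nothing in the file says it was meant to be a letter.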

The best way to do this is to render it to an image and work from the image, either with models that accept images directly or by OCR'ing it.
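A minimal sketch of the render-then-OCR route, assuming the third-party packages `pdf2image` (which needs Poppler installed) and `pytesseract` (which needs the Tesseract binary). Those are my picks for illustration, not something named in this thread, and the function name `ocr_pdf` and the 300 dpi setting are arbitrary. The renderer and OCR step are passed in as parameters so either one can be swapped for something else, such as a vision model.

```python
from typing import Any, Callable, Iterable, Optional


def ocr_pdf(
    pdf_path: str,
    render: Optional[Callable[[str], Iterable[Any]]] = None,
    ocr: Optional[Callable[[Any], str]] = None,
) -> str:
    """Render each page of a PDF to an image, OCR it, and join the page texts."""
    if render is None:
        # Assumed default: pdf2image rasterizes pages into PIL images.
        from pdf2image import convert_from_path

        def render(path: str) -> Iterable[Any]:
            return convert_from_path(path, dpi=300)

    if ocr is None:
        # Assumed default: Tesseract via pytesseract.
        import pytesseract

        ocr = pytesseract.image_to_string

    # One OCR call per rendered page; pages are separated by a blank line.
    return "\n\n".join(ocr(page).strip() for page in render(pdf_path))
```

Calling `ocr_pdf("scan.pdf")` with a hypothetical file would use the defaults. Passing your own `render` or `ocr` callable skips the corresponding import entirely.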

1 comment

jbdamask 114 days ago

Agree. Curious if you’ve played with landing.ai?
