by samarthr1 203 days ago
When building a mini corporate filings digest generator, I very quickly switched to running Tesseract OCR on the rendered pages instead of reading the selectable text layer in the PDF.

Unfortunately, OCR is the most reliable way to get readable text out...

Also, it does guard against prompt injection via white text, eh? OCR only reads what is visibly rendered on the page.
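
For anyone curious, here is a rough sketch of what that kind of pipeline can look like. It is not my actual code, just an illustration: it assumes the pdf2image and pytesseract packages (plus the poppler and tesseract binaries) are installed, and the function names and the "filing.pdf" path are made up for the example.

```python
from typing import Callable, Iterable, Optional


def _tesseract_ocr(image) -> str:
    # Imported lazily so the module still loads when pytesseract is absent.
    import pytesseract

    return pytesseract.image_to_string(image)


def render_pages(pdf_path: str, dpi: int = 300) -> list:
    # Rasterize every page to a PIL image; the PDF text layer is never read.
    from pdf2image import convert_from_path  # assumption: poppler is installed

    return convert_from_path(pdf_path, dpi=dpi)


def ocr_pages(pages: Iterable, ocr: Optional[Callable] = None) -> str:
    # Run OCR on each rendered page and join the non-empty results.
    # `ocr` is a hypothetical hook so the OCR engine can be swapped or faked.
    ocr = ocr or _tesseract_ocr
    texts = [ocr(page).strip() for page in pages]
    return "\n\n".join(t for t in texts if t)


if __name__ == "__main__":
    # "filing.pdf" is a placeholder path, not a real file.
    print(ocr_pages(render_pages("filing.pdf")))
```

Since only the rasterized pixels go through OCR, text that does not show up visually (for example white text on a white background) should not end up in the output, which is the point about prompt injection. The trade-off is speed and the occasional OCR misread.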