HN Mirror



	by nl 792 days ago
	The "Using python to dump the PDF to text" step dramatically underestimates how hard this is. Tables and especially multi-column PDFs often need one-off handling and, worse, you don't know one is being misparsed until you start getting weird search results. At that point you need to debug your entire search pipeline, which isn't fun!
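	One way to catch bad parses before they reach the index is a cheap sanity check on the extracted text. Here is a rough sketch, not a tested recipe: the function names, the two heuristics, and the 0.5 threshold are all my own assumptions, and it takes already-extracted page strings from whatever PDF library you use.

```python
import re

def suspicious_page_score(text: str) -> float:
    """Return a 0..1 score; higher means the extracted text looks more like
    a misparsed table or interleaved columns. Heuristic only (assumption)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 1.0  # an empty extraction is itself worth a look

    tokens = text.split()
    # Table cells often come out one per line, so count very short lines.
    short_lines = sum(1 for ln in lines if len(ln.split()) <= 2) / len(lines)
    # Count tokens with no run of two or more letters (numbers, punctuation).
    non_word = sum(
        1 for t in tokens if not re.search(r"[A-Za-z]{2,}", t)
    ) / len(tokens)
    return max(short_lines, non_word)

def flag_pages(pages, threshold=0.5):
    """Return indexes of pages whose score meets the (arbitrary) threshold."""
    return [i for i, text in enumerate(pages)
            if suspicious_page_score(text) >= threshold]
```

	This will not tell you how to fix a page, and it will miss interleaved columns that still read like prose. It only gives you a list of pages to eyeball before you end up debugging the search results instead.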