| HN Mirror



	by huytersd 971 days ago
	What is this? The ability for ChatGPT to parse text in PDFs? Couldn’t it already do OCR on text in images?

3 comments

jamesdwilson 971 days ago

It's way bigger than that! The new features will basically try to write Python on the fly to do anything you want with several types of supported media. Example: OCR all the frames in an uploaded movie.


kolinko 971 days ago

Depending on how well it works, it could be much better, e.g. by accepting longer contexts.


BoorishBears 971 days ago

Very recently it got vision, which can be used for OCR on a single image, but that's massively inefficient and limited compared to what this is likely doing for longer documents.


lhuser123 971 days ago

I tried using the vision feature for OCR & it was worse than Tesseract, at least for financial documents where you need exact amounts. Will the new PDF feature be better? I’m not so hopeful.


BoorishBears 970 days ago

Using the vision feature for OCR is like using an LLM for math: it might work, but we already have a lot of tools that are hyper-optimized for the task.

There is practically no chance the new feature uses vision because that'd be _insanely_ slow and expensive for any reasonably sized document. They're likely using Azure's LayoutLM-derived tech to extract the text, then using embeddings to answer questions.
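
To make the extract-then-embed idea concrete, here is a minimal sketch of the retrieval step, assuming the document text has already been extracted. The bag-of-words `embed` function is only a toy stand-in for a real embedding model, and every name here (`chunk_text`, `embed`, `cosine`, `top_chunks`) is hypothetical. Nothing below reflects how the actual feature is built.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split extracted text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_chunks(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Assumed sample data standing in for text pulled out of a PDF.
    sample_chunks = [
        "Total revenue for 2022 was 4.2 million dollars.",
        "The office moved to Berlin in March.",
        "Employees receive 25 vacation days.",
    ]
    print(top_chunks("What was the total revenue?", sample_chunks))
```

In a real system the selected chunks would then be passed to the LLM as context for the answer, which is why this approach is far cheaper than running vision over every page.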


lhuser123 969 days ago

Will it be better than Tesseract & other OCR tools?
