| HN Mirror



	by sankyo 2371 days ago
	Does it work only on books and magazines, or would it work on a driver's license or ID card as well?

2 comments

crazygringo 2371 days ago

Before OCR'ing, it converts the image to black-and-white using a brightness threshold. Keeping an evenly lit background is particularly important, because otherwise a shadowed area can easily fall below the threshold and come out all black.
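
To make the shadow problem concrete, here is a minimal pure-Python sketch of global brightness thresholding. The pixel values, the threshold of 128, and the `binarize` helper are all assumptions chosen for illustration, not what the tool actually uses internally.

```python
# Minimal sketch of global brightness thresholding (pure Python, no libraries).
# Pixels range from 0 (black) to 255 (white); the threshold of 128 is an assumption.

def binarize(gray, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]

# One row of a hypothetical page: paper, ink, paper, paper, ink, paper.
evenly_lit = [[230, 40, 230, 230, 40, 230]]

# The same row with the right half in shadow (each value 120 lower, clamped at 0).
shadowed = [[230, 40, 230, 110, 0, 110]]

clean = binarize(evenly_lit)    # ink stays distinct from paper everywhere
damaged = binarize(shadowed)    # shadowed paper drops below the threshold

print(clean)
print(damaged)
```

In the shadowed row the paper pixels (110) land below the threshold, so the whole right half turns black and the ink is no longer distinguishable from the background.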

A license or ID will almost certainly have medium-contrast elements in the background that will show up as dark. But if you were able to manipulate the contrast/brightness appropriately in advance, you could probably get it to work.
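
As a rough illustration of adjusting contrast/brightness in advance, the sketch below applies a simple levels adjustment before thresholding. The `adjust_levels` helper, the black/white points (40 and 100), and the pixel values are hypothetical; real cards would need values tuned per image.

```python
# Sketch: stretch contrast before thresholding so a medium-dark background
# pattern ends up white while the text stays black. All values are assumptions.

def binarize(gray, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]

def adjust_levels(gray, black_point, white_point):
    """Linearly map [black_point, white_point] to [0, 255], clipping outside."""
    scale = 255 / (white_point - black_point)
    return [
        [min(255, max(0, round((px - black_point) * scale))) for px in row]
        for row in gray
    ]

# One row of a hypothetical ID card: plain area, pattern, text, pattern, plain area.
card_row = [[200, 110, 30, 110, 200]]

naive = binarize(card_row)                            # pattern merges with the text
prepared = binarize(adjust_levels(card_row, 40, 100)) # pattern pushed to white first

print(naive)
print(prepared)
```

Without the adjustment, the pattern pixels (110) fall below the threshold and blend into the text; after it, only the text pixel remains black.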


bhanhfo 2371 days ago

Tesseract is optimized for images with white backgrounds. ID cards or movie screenshots do not work well.


Certified 2371 days ago

I have used Tesseract OCR combined with ImageMagick and ffmpeg to great success for video text extraction.
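
The actual script is not given in this thread, so the following is only a hypothetical sketch of how such a pipeline could be wired together: ffmpeg dumps one frame per second, ImageMagick converts a frame to grayscale and thresholds it, and Tesseract reads the result. The `build_commands` helper, the file names, the `fps=1` rate, and the 50% threshold are all assumptions. It only builds the command lines; nothing is executed.

```python
# Hypothetical pipeline sketch: ffmpeg -> ImageMagick -> tesseract.
# This is NOT the commenter's script; names and settings are assumptions.
from pathlib import Path

def build_commands(video, workdir, frame_name="frame_0001.png"):
    """Return the three command lines (as argument lists) for one frame."""
    workdir = Path(workdir)
    raw_pattern = workdir / "frame_%04d.png"      # ffmpeg numbered output pattern
    frame = workdir / frame_name                  # one extracted frame
    cleaned = workdir / ("clean_" + frame_name)   # preprocessed copy for OCR
    return [
        # Extract one frame per second of video as PNG files.
        ["ffmpeg", "-i", str(video), "-vf", "fps=1", str(raw_pattern)],
        # Grayscale, then binarize at 50% brightness
        # (ImageMagick 7 installs this tool as `magick`).
        ["convert", str(frame), "-colorspace", "Gray",
         "-threshold", "50%", str(cleaned)],
        # OCR the cleaned frame and write the text to standard output.
        ["tesseract", str(cleaned), "stdout"],
    ]

cmds = build_commands("clip.mp4", "frames")
for cmd in cmds:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the three tools are installed and the working directory exists. The threshold would likely need tuning per video.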


beagle3 2370 days ago

Can you list your script/pipeline? I haven't had much success (though, I only ran ffmpeg's internal tesseract OCR[0], no imagemagick processing or any other processing in between)

[0] https://ffmpeg.org/ffmpeg-all.html#ocr
